Monday, March 12, 2012

I've been going through the compiled code looking for more opportunities for optimization. I found this sequence, which looks interesting:

(int y) = ((int)(char x)) << 8
sra r2, 8       12+2*8+4 = 32
sla r2, 8       12+2*8+4 = 32
total: 64 clocks, 4 bytes

This could be replaced with:
andi r2, >FF00  14+8+4 = 26
total: 26 clocks, 4 bytes
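
A quick way to sanity-check this replacement is to model the three instructions on a 16-bit word and compare both sequences for every possible register value. This is only a sketch: the helper names (`sra`, `sla`, `andi`, `original`, `replacement`) are made up for illustration and model just the result value, not status flags or timing.

```python
MASK16 = 0xFFFF

def sra(value, count):
    """Arithmetic right shift of a 16-bit word (the sign bit is replicated)."""
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> count) & MASK16

def sla(value, count):
    """Left shift of a 16-bit word, zero-filled and truncated to 16 bits."""
    return (value << count) & MASK16

def andi(value, immediate):
    """Bitwise AND with a 16-bit immediate."""
    return value & immediate & MASK16

def original(r2):
    # sra r2, 8 followed by sla r2, 8
    return sla(sra(r2, 8), 8)

def replacement(r2):
    # andi r2, >FF00
    return andi(r2, 0xFF00)

# Compare both sequences for every 16-bit register value.
mismatches = [r2 for r2 in range(0x10000) if original(r2) != replacement(r2)]
print(len(mismatches))
```

If the model is right, the mismatch list is empty: shifting the char down and back up leaves the high byte untouched and clears the low byte, which is what the mask does in one instruction.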

More generally:
(int y) = ((int)(char x)) << N
sra r2, 8       12+2*8+4 = 32
sla r2, N       12+2*N+4 = 16+2*N
total: 48+2N clocks, 4 bytes

This could be replaced with:
sla r2, N-8         12+2*N-2*8+4 = 2*N
andi r2, 0xFFFF<<N  14+8+4 = 26
total: 26+2N clocks, 6 bytes
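
The general form can be checked the same way. The sketch below takes the mask to be 0xFFFF << N truncated to 16 bits, and assumes that for N below 8 the single shift becomes an arithmetic right shift by 8-N, as the truth table further down suggests. A shift count of zero is modeled as no shift at all. The function names are illustrative only.

```python
MASK16 = 0xFFFF

def sra(value, count):
    """Arithmetic right shift of a 16-bit word (the sign bit is replicated)."""
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> count) & MASK16

def sla(value, count):
    """Left shift of a 16-bit word, zero-filled and truncated to 16 bits."""
    return (value << count) & MASK16

def original(r2, n):
    # sra r2, 8 followed by sla r2, N
    return sla(sra(r2, 8), n)

def optimized(r2, n):
    # One shift by N-8 (a right shift when N < 8), then clear the low N bits.
    if n >= 8:
        shifted = sla(r2, n - 8)
    else:
        shifted = sra(r2, 8 - n)
    return shifted & (MASK16 << n) & MASK16

def first_mismatch():
    """Return the first (N, r2) pair where the sequences differ, or None."""
    for n in range(16):
        for r2 in range(0x10000):
            if original(r2, n) != optimized(r2, n):
                return (n, r2)
    return None

result = first_mismatch()
print(result)
```

The mask matters because the low byte of r2 holds unrelated data: after the single shift, some of those bits sit in the low N positions, where the original sequence would have produced zeros.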

This looks pretty darn good, about 33% faster on average.
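
To put numbers on that, here is a small sketch that plugs each N into the two totals above. It takes the closed forms 48+2N and 26+2N at face value for every N, including N below 8 where the single shift would run the other way, so treat the output as a rough estimate rather than a measurement. The function names are made up for this example.

```python
ANDI_CLOCKS = 14 + 8 + 4  # 26 clocks, as given above

def original_clocks(n):
    # sra r2, 8 (32 clocks) plus sla r2, N (16 + 2N clocks)
    return 32 + (16 + 2 * n)

def optimized_clocks(n):
    # Shift by N-8 (2N clocks by the formula above) plus the andi
    return 2 * n + ANDI_CLOCKS

# Fraction of clocks saved for each shift amount N = 0..15.
savings = [1 - optimized_clocks(n) / original_clocks(n) for n in range(16)]
average_saving = sum(savings) / len(savings)

for n, saving in enumerate(savings):
    print(f"N={n:2d}  {original_clocks(n):3d} -> {optimized_clocks(n):3d} clocks  "
          f"({saving:.0%} fewer)")
print(f"average: {average_saving:.0%} fewer clocks")
```

Under these formulas the saving is a constant 22 clocks, so the relative gain shrinks as N grows. Averaged over N = 0..15 it comes out to roughly a third, in line with the estimate above.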
Truth table below:

N  Original pattern     Result             Optimization
-  -----------------    -----------------  ------------
0  01234567.xxxxxxxx -> xxxxxxxx.01234567  swpb
1  01234567.xxxxxxxx -> xxxxxxx0.1234567x  >>7
2  01234567.xxxxxxxx -> xxxxxx01.234567xx  >>6
3  01234567.xxxxxxxx -> xxxxx012.34567xxx  >>5
4  01234567.xxxxxxxx -> xxxx0123.4567xxxx  >>4
5  01234567.xxxxxxxx -> xxx01234.567xxxxx  >>3
6  01234567.xxxxxxxx -> xx012345.67xxxxxx  >>2
7  01234567.xxxxxxxx -> x0123456.7xxxxxxx  >>1
8  01234567.xxxxxxxx -> 01234567.xxxxxxxx  nop
9  01234567.xxxxxxxx -> 1234567x.xxxxxxxx  <<1
A  01234567.xxxxxxxx -> 234567xx.xxxxxxxx  <<2
B  01234567.xxxxxxxx -> 34567xxx.xxxxxxxx  <<3
C  01234567.xxxxxxxx -> 4567xxxx.xxxxxxxx  <<4
D  01234567.xxxxxxxx -> 567xxxxx.xxxxxxxx  <<5
E  01234567.xxxxxxxx -> 67xxxxxx.xxxxxxxx  <<6
F  01234567.xxxxxxxx -> 7xxxxxxx.xxxxxxxx  <<7
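
The table follows a simple rule: the data bits start in the high byte, the C expression wants them shifted left by N from the low byte, so the net movement is N-8 positions. A sketch that regenerates the rows symbolically is below. The function names are invented for this example, 'x' stands for any bit that is not one of the eight data bits, and the "swpb" label for N = 0 is copied from the table rather than derived.

```python
def result_pattern(n):
    """Bit layout after (int)(char x) << N, with 'x' marking non-data bits."""
    # Start from the char in the low byte, then shift left by N symbolically.
    bits = ("x" * 8 + "01234567" + "x" * n)[n:n + 16]
    return bits[:8] + "." + bits[8:]

def net_shift(n):
    """Label for the single shift that moves the high byte into place."""
    if n == 0:
        return "swpb"          # label taken from the table above
    if n < 8:
        return f">>{8 - n}"
    if n == 8:
        return "nop"
    return f"<<{n - 8}"

def table_rows():
    source = "01234567.xxxxxxxx"  # char in the high byte, other data in the low byte
    return [f"{n:X}  {source} -> {result_pattern(n)}  {net_shift(n)}"
            for n in range(16)]

for row in table_rows():
    print(row)
```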
