Insomnia Labs

Sunday, April 1, 2012

I found this sequence which is being used to clear a byte of memory:

clr r2 # clocks: 10 bytes: 2
movb r2, *r1 # clocks: 4+14+6 = 24 bytes: 2
total: 34 clocks, 4 bytes

For indexed memory locations:

clr r2 # clocks: 10 bytes: 2
movb r2, @1(r1) # clocks: 4+14+8 = 26 bytes: 4
total: 36 clocks, 6 bytes

I can do better than that using subtract instructions:

sb *r1, *r1 # clocks: 4+14+6+6 = 30 bytes: 2
total: 30 clocks, 2 bytes

sb @1(r1), @1(r1) # clocks: 4+14+8+8 = 34 bytes: 6
total: 34 clocks, 6 bytes

The first form is about 12% faster and two bytes shorter; the second is about 6% faster at the same size. This isn't a huge improvement, but it's still better. This is probably best added as a peephole.

And now it's in there.
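
As a quick check of the figures above, here is a minimal C sketch that recomputes the clock totals and the percentage saved. The clock counts are copied straight from the listings; the helper name `percent_saved` and the enum names are made up for this illustration and are not part of the compiler.

```c
#include <assert.h>

/* Clock totals copied from the listings above. */
enum {
    CLR_MOVB_INDIRECT = 10 + (4 + 14 + 6),  /* clr r2 ; movb r2, *r1    */
    CLR_MOVB_INDEXED  = 10 + (4 + 14 + 8),  /* clr r2 ; movb r2, @1(r1) */
    SB_INDIRECT       = 4 + 14 + 6 + 6,     /* sb *r1, *r1              */
    SB_INDEXED        = 4 + 14 + 8 + 8      /* sb @1(r1), @1(r1)        */
};

/* Hypothetical helper: clocks saved as a percentage of `before`,
   rounded to the nearest whole percent with integer arithmetic. */
static int percent_saved(int before, int after)
{
    return (100 * (before - after) + before / 2) / before;
}
```

Calling `percent_saved(CLR_MOVB_INDIRECT, SB_INDIRECT)` and `percent_saved(CLR_MOVB_INDEXED, SB_INDEXED)` reproduces the two percentages quoted above.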

There's also this sequence, which I'm not happy about. It's the result of:

unsigned char a = (((unsigned char)val & (char)0x0F) + (char)'0');

mov r2, r5
swpb r5
srl r5, 8
andi r5, >F
ai r5, >30
swpb r5

I think this would be better:

mov r2, r5
andi r5, >F
ai r5, >30
swpb r5
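
To convince myself the shorter sequence gives the same answer, the two listings can be modelled in C on a 16-bit value. This is only a sketch: it assumes the registers behave as plain 16-bit words, ignores the status flags, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Model of swpb: exchange the high and low bytes of a 16-bit word. */
static uint16_t swpb(uint16_t r)
{
    return (uint16_t)((r << 8) | (r >> 8));
}

/* The sequence currently generated. */
static uint16_t digit_current(uint16_t r2)
{
    uint16_t r5 = r2;               /* mov  r2, r5  */
    r5 = swpb(r5);                  /* swpb r5      */
    r5 = (uint16_t)(r5 >> 8);       /* srl  r5, 8   */
    r5 &= 0x000F;                   /* andi r5, >F  */
    r5 = (uint16_t)(r5 + 0x30);     /* ai   r5, >30 */
    return swpb(r5);                /* swpb r5      */
}

/* The shorter sequence proposed above. */
static uint16_t digit_proposed(uint16_t r2)
{
    uint16_t r5 = r2;               /* mov  r2, r5  */
    r5 &= 0x000F;                   /* andi r5, >F  */
    r5 = (uint16_t)(r5 + 0x30);     /* ai   r5, >30 */
    return swpb(r5);                /* swpb r5      */
}

/* Returns 1 if both models agree for every 16-bit input, else 0. */
static int sequences_agree(void)
{
    uint32_t v;
    for (v = 0; v <= 0xFFFF; v++) {
        if (digit_current((uint16_t)v) != digit_proposed((uint16_t)v))
            return 0;
    }
    return 1;
}
```

The swap and shift only isolate the low byte, and the following mask with >F discards everything above bit 3 anyway, which is why dropping them changes nothing in this model.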

So I've added an optimization for (int)X = (unsigned char)((int)X). This replaces:
mov r2, r5 # clocks: 14
swpb r5 # clocks: 10
srl r5, 8 # clocks: 12+16
# total=52 clocks

with:
mov r2, r5 # clocks: 14
andi r5, >00FF # clocks: 14+4
# total=32 clocks

At 52 versus 32 clocks this is about 1.6 times as fast, and in the case where no MOV is needed (38 versus 18 clocks) it is more than twice as fast. This makes me happy again.
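
The replacement can be sanity-checked the same way. The sketch below assumes the shift is the logical one (srl, as in the earlier listing), models registers as plain 16-bit words, and ignores status flags. All names are hypothetical, and the clock counts are simply the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Clock counts copied from the listings above. */
enum {
    CLK_MOV    = 14,
    CLK_SWPB   = 10,
    CLK_SHIFT8 = 12 + 16,
    CLK_ANDI   = 14 + 4
};

/* mov ; swpb ; logical shift right by 8 */
static uint16_t zext_by_shift(uint16_t r2)
{
    uint16_t r5 = r2;                           /* mov  r2, r5 */
    r5 = (uint16_t)((r5 << 8) | (r5 >> 8));     /* swpb r5     */
    r5 = (uint16_t)(r5 >> 8);                   /* srl  r5, 8  */
    return r5;
}

/* mov ; andi >00FF */
static uint16_t zext_by_mask(uint16_t r2)
{
    return (uint16_t)(r2 & 0x00FF);
}

/* Returns 1 if both forms agree for every 16-bit input, else 0. */
static int zext_forms_agree(void)
{
    uint32_t v;
    for (v = 0; v <= 0xFFFF; v++) {
        if (zext_by_shift((uint16_t)v) != zext_by_mask((uint16_t)v))
            return 0;
    }
    return 1;
}

/* Speedup ratio times 100, rounded to nearest, in integer arithmetic. */
static int speedup_x100(int before, int after)
{
    return (100 * before + after / 2) / after;
}
```

With the MOV included the totals are `CLK_MOV + CLK_SWPB + CLK_SHIFT8` against `CLK_MOV + CLK_ANDI`; without it, `CLK_SWPB + CLK_SHIFT8` against `CLK_ANDI` alone.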
