Saturday, November 6, 2010

After spelunking for months trying to get REG_DEAD notes into the compiled RTL, it turns out they aren't necessary anymore. Apparently this changed somewhere in the 3.x versions of GCC (I want to say 3.5, but I'm not sure about that. I read about this at work earlier, and I don't remember the details right now. Not really important now.)

I read a lot of posts from the GCC developers, and apparently I shouldn't need to modify anything beyond the machine-dependent code to achieve everything I'm looking for. This is really good to know, since it should cut down the time spent researching the GCC front end. Although I'm kinda glad I did that work now.

So I'm going to implement the optimizations listed in September as peepholes. Should be pretty straightforward, really; there's a sketch of what one might look like after the lists below.

Repeating the optimization list from above:

Baseline (MOV sets the status bits, so moving a register onto itself is the standard way to test it against zero):
mov Rx, Rx (14 cycles)

These all assume the compared register will be dead afterward:
Compare to 2: dect G (10 cycles)
Compare to 1: dec G (10 cycles)
Compare to -1: inc G (10 cycles)
Compare to -2: inct G (10 cycles)

These use INV (one's complement) and NEG (two's complement), which set the status bits on their result, so the test flips as shown (e.g. A = -5 = >FFFB; inv gives >0004, which is >= 0, matching A < 0):
A<0 -> inv A; A>=0 (10 cycles) lt
A<=0 -> neg A; A>=0 (12 cycles) le
A==0 -> neg A; A==0 (12 cycles) eq x
A!=0 -> neg A; A!=0 (12 cycles) ne x
A>0 -> neg A; A<0 (12 cycles) gt
A>=0 -> inv A; A<0 (10 cycles) ge

For reference, the RTL comparison codes:
lt (<)
le (<=)
eq (==)
ne (!=)
gt (>)
ge (>=)
ltu (< unsigned)
leu (<= unsigned)
gtu (> unsigned)
geu (>= unsigned)
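
Here's a rough sketch of what the "compare to 2" case might look like as a define_peephole2. This is not from the actual port: CC_REGNUM and the exact compare pattern are assumptions about how the status register gets modeled, and the replacement only works if a matching define_insn exists. The nice part is peep2_reg_dead_p, which is the peephole2-era replacement for digging REG_DEAD notes out of the RTL.

;; Sketch only: fold "compare G against 2" into "dect G" when G dies.
;; dect subtracts two and sets the status bits on the result, which
;; is why the register has to be dead: its value actually changes.
(define_peephole2
  [(set (reg:CC CC_REGNUM)
        (compare:CC (match_operand:HI 0 "register_operand" "")
                    (const_int 2)))]
  "peep2_reg_dead_p (1, operands[0])"
  [(parallel
     [(set (reg:CC CC_REGNUM)
           (compare:CC (plus:HI (match_dup 0) (const_int -2))
                       (const_int 0)))
      (set (match_dup 0)
           (plus:HI (match_dup 0) (const_int -2)))])])

The other constants (dec, inc, inct) would be the same pattern with the constant and the adjustment swapped out.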

I might not use the C instruction pattern, though...
Assume the instructions are in slow memory and the registers (the workspace) are in fast memory:

inct r1; inct r2 (4+10+1 + 4+10+1 = 30 cycles) 100%
inc r1; inc r2 (4+10+1 + 4+10+1 = 30 cycles) 100%

c *r1+, *r2+ (4+14 + 8+4 + 8+4 = 42 cycles) 140%
cb *r1+, *r2+ (4+14 + 6+4 + 6+4 = 38 cycles) 126%

So this form saves two bytes, but it's 27-40% slower and difficult to induce. I think I'll pass on this.
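
For what it's worth, the kind of C that would have to generate that autoincrement form is a pointer-walking compare like this (a hypothetical example, nothing from the port), and even then the compiler has to decide to use *R+ addressing for both operands:

/* Compare two word arrays; the *p++ != *q++ test is what would
   have to map onto "c *r1+, *r2+". */
int words_equal(const int *p, const int *q, int n)
{
    while (n-- > 0)
        if (*p++ != *q++)
            return 0;
    return 1;
}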
