Thursday, March 22, 2012

Here's a simplified version of the instructions up to step 182r.peephole2 for the "long" type.

inputs: r1,r2=val, r3=a_base (unused)
insn 24: r3=0 \_ Set temp 32-bit value to zero
insn 25: r4=r3 /
insn 28: r4=r2 <- Copy low word into temp value
insn 12: r4&=0x0F <- Apply mask
insn 13: r3(32-bit)+=48 <- '0'=48 <- Adding constant to low word
insn 14: r1=(char)r4
insn 15: number_buf[0]=r1
insn 33: return

GCC is claiming that R3 is unused (this is true), and so any manipulation of R3 can be safely removed. GCC also seems to be extending this to R4, since R4 is part of the temporary 32-bit value defined in insn 24 and 25.

That results in these instructions being deleted in step 182:

insn 13
insn 12
insn 28
insn 25
insn 24

We are left with R4, with a note that it's in SImode (a 32-bit value). Somehow the notion that R4 is the low word is lost, and R5 is selected as the low word of the value instead. This results in the final code:

print_hex
mov r5, r1
swpb r1
movb r1, @number_buf
b *r11

Similar behavior was seen with this code as well:
char test(long a) {return(a+'0');}

test
mov r5, r1
swpb r1
b *r11

Testing will continue with this example, since it's much shorter.
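For reference, here is a minimal host-side sketch of what test() ought to return. It is not part of the compiler or the port. The name test_reference is made up for illustration, and unsigned fixed-width types are used so the truncation is well defined on any host. The point is that only the low byte of the 32-bit input can affect the result, so the generated code has to read the low word of the argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference for test(): add '0' (48) to a 32-bit value and
   keep only the lowest byte, which is what the char return amounts to. */
static uint8_t test_reference(uint32_t a) {
    return (uint8_t)(a + 48u);
}

/* Self-check: the high word of the input never changes the result. */
static void check_reference(void) {
    assert(test_reference(5u) == '5');
    assert(test_reference(0x12340005u) == test_reference(5u));
}
```

Any generated code that takes its result from a register other than the one holding the low word of the sum cannot match this.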

I found that the problem happens during register allocation. In all prior steps, the RTX looks good, but at 172r.ira, the wrong register is chosen, which makes a mess of everything from that point on.

168r.asmcoms
insn 3: (HI)((SI)r24)[0] = r1
insn 4: (HI)((SI)r24)[1] = r2
insn 9: (SI)r26=(SI)r24+48
insn 10: (QI)r27=(QI)((SI)r26)[3]
insn 16: (QI)r1=(QI)r27

172r.ira
insn 3: r3=r1
insn 4: r4=r2
insn 9: (SI)r3+=48
insn 10: (QI)r1=(SI)r4 <- this is wrong, should be (SI)r3
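
To make the effect of the wrong base register concrete, here is a small model of the 172r.ira sequence. It is only a sketch under assumptions read off the dumps above: a 32-bit value based at register n keeps its high word in r[n] and its low word in r[n+1], and byte [3] is the least significant byte. The names regfile, read_si, write_si, si_byte and run_test are invented for this example, and 0xBEEF is an arbitrary marker standing in for whatever stale value happens to be in r5.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sixteen 16-bit registers, r0..r15. */
typedef struct { uint16_t r[16]; } regfile;

/* Assumed layout: SI value based at n = high word in r[n], low word in r[n+1]. */
static uint32_t read_si(const regfile *f, int n) {
    return ((uint32_t)f->r[n] << 16) | f->r[n + 1];
}

static void write_si(regfile *f, int n, uint32_t v) {
    f->r[n]     = (uint16_t)(v >> 16);
    f->r[n + 1] = (uint16_t)(v & 0xFFFFu);
}

/* Byte k of the SI value based at n (0 = most significant, 3 = least). */
static uint8_t si_byte(const regfile *f, int n, int k) {
    return (uint8_t)(read_si(f, n) >> (8 * (3 - k)));
}

/* Runs insn 3, 4, 9 and 10; result_base is the register pair insn 10 reads. */
static uint8_t run_test(uint32_t a, int result_base) {
    regfile f = {{0}};
    f.r[5] = 0xBEEF;                       /* stale marker, never written below */
    f.r[1] = (uint16_t)(a >> 16);          /* argument, high word */
    f.r[2] = (uint16_t)(a & 0xFFFFu);      /* argument, low word */
    f.r[3] = f.r[1];                       /* insn 3 */
    f.r[4] = f.r[2];                       /* insn 4 */
    write_si(&f, 3, read_si(&f, 3) + 48u); /* insn 9 */
    return si_byte(&f, result_base, 3);    /* insn 10 */
}
```

With these assumptions, run_test(5, 3) gives 0x35 ('5'), while run_test(5, 4) gives 0xEF, the low byte of the stale r5 marker. That matches the final assembly, where the result is taken from r5.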
