Sunday, January 9, 2011

So, I'm trying to get better truncations. Let's see what GCC is doing right now.

I'm looking at this line:
unsigned char a = (val & 0x0F) + '0';

Which eventually turns into this: (R1=val, R13=a)

mov r1, r3
andi r3, >F
ai r3, >30
mov r3, @14(r10)
movb @15(r10), r13
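
To make the round trip through memory easier to follow, here is a rough C model of that sequence. It is only a sketch: the helper names (store_word, load_byte, truncate_via_memory) are made up for illustration, and it assumes that memory words are big-endian and that movb writes only the high byte of the destination register.

```c
#include <stdint.h>

/* Byte instructions on this CPU see the HIGH byte of a 16-bit register. */
static uint8_t reg_byte(uint16_t reg) { return (uint8_t)(reg >> 8); }

/* Model of "mov r3, @14(r10)": store a word, high byte first (big-endian). */
static void store_word(uint8_t *slot, uint16_t value) {
    slot[0] = (uint8_t)(value >> 8);   /* offset 14 in the listing */
    slot[1] = (uint8_t)(value & 0xFF); /* offset 15 in the listing */
}

/* Model of "movb @15(r10), r13": put one byte into the high byte of a
   register, leaving the low byte alone (assumed behavior). */
static uint16_t load_byte(uint16_t reg, uint8_t byte) {
    return (uint16_t)(((unsigned)byte << 8) | (reg & 0x00FFu));
}

/* The whole sequence: compute (val & 0x0F) + '0' as a word, spill it,
   then reload only the low byte so it lands in the high byte of R13. */
static uint16_t truncate_via_memory(uint16_t val) {
    uint8_t slot[2];
    uint16_t r3 = (uint16_t)((val & 0x0F) + 0x30); /* andi, ai */
    store_word(slot, r3);                          /* mov  */
    return load_byte(0, slot[1]);                  /* movb, R13 starts at 0 here */
}
```

With val = 0x1237 the model leaves 0x3700 in R13, and reg_byte() of that is '7'. So the store and reload is really just a slow way of moving the low byte of one register into the high byte of another.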

This is the RTL for the final "mov-movb" bit of this sequence.

128r.expand
(insn 38 37 0 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) -1 (nil))

133r.vregs
(insn 38 37 39 4 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (nil))

150r.bypass
(insn 38 37 39 3 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (expr_list:REG_DEAD (reg:HI 51)
(nil)))

This is unchanged till 171r.subregs_of_mode_init, which makes sense I suppose. At that point this gets split into the two move instructions. Unfortunately, there is nothing obvious in here which would prevent using memory for transfers.

Just for kicks, I removed the CANNOT_CHANGE_MODE_CLASS macro. Let's see what happens. The fact that the TMS9900 stores an 8-bit quantity in the high byte will be lost, but maybe I can trace what happens.

128r.expand
(insn 38 37 0 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) -1 (nil))

168r.asmcons
(insn 38 37 39 3 printf.c:13 (set (reg/v:QI 45 [ a ])
(subreg:QI (reg:HI 51) 1)) 68 {movqi} (expr_list:REG_DEAD (reg:HI 51)
(nil)))

172r.ira
(insn 38 37 81 3 printf.c:13 (set (reg/v:QI 4 r4 [orig:45 a ] [45])
(reg:QI 4 r4 [orig:51+1 ] [51])) 68 {movqi} (nil))

Once we get to initial register allocation, this instruction has been transformed into a NOP, since GCC thinks the byte quantity will be left in the low byte of R4.
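
Here is a small C sketch of why that NOP is wrong for this machine. The function names are hypothetical, and the model assumes (as above) that byte instructions read the high byte of a register.

```c
#include <stdint.h>

/* Byte as seen by the CPU's byte instructions: the high byte. */
static uint8_t byte_view(uint16_t reg) { return (uint8_t)(reg >> 8); }

/* What GCC did once the macro was gone: treat the low byte of the HI
   register as the QI value, so the move becomes a no-op. */
static uint16_t subreg_as_nop(uint16_t hi_reg) { return hi_reg; }

/* What this target actually needs: the low byte of the word has to end
   up in the high byte before any byte instruction looks at it. */
static uint16_t subreg_with_shift(uint16_t hi_reg) {
    return (uint16_t)((unsigned)hi_reg << 8);
}
```

Starting with 0x0037 in R4, byte_view(subreg_as_nop(0x0037)) gives 0x00, while byte_view(subreg_with_shift(0x0037)) gives 0x37. The no-op version silently reads the wrong half of the register.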

So after a lot of testing and trial and error, I think I need to admit defeat on getting int-to-char conversion working properly for 16-bit hard registers. GCC cannot be convinced to handle QI mode differently for the subregs. Depending on how I set up the test code, the point at which the hard subreg gets lost changes. This implies that the low-byte assumption is built into a lot of places. I might be able to optimize away conversions through the mov-movb sequence, but I need to do testing to make sure the stack usage will be suitably reduced, and that GCC doesn't try to reuse the value in memory later. Remember, there are no REG_DEAD flags for memory.

It's been a long day and I'm frustrated. No more for today.
