Sunday, January 13, 2013

So, I'm back to debugging what went wrong with Rush Hour. It turns out that in the show_wall function, R11 is being used without being saved first. When the function returns, execution goes off into the woods and we're hosed.

000007f0 :
     7f0:       02 01 46 00     li r1, >4600
     7f4:       d8 01 8c 02     movb r1, @>8C02
     7f8:       02 0b 40 b0     li r11, >40B0
# R11 used before being saved
     7fc:       d8 0b 8c 02     movb r11, @>8C02
     800:      06 cb               swpb r11
     802:      d8 0b 8c 00     movb r11, @>8C00
     806:      02 01 47 00     li r1, >4700

The problem seems to be that for some reason df_regs_ever_live_p() does not reflect the usage of R11 so we don't know that we should save it.

The sequence which uses R11 was added as part of a peephole replacement, and all other registers allocated this way are not marked as used either. Apparently, GCC tries to allocate an unused register in preference to reusing one previously allocated. Interesting.

I started looking into the peephole handling code, but realized that R11 was being allocated due being designated as volatile, and all other volatile register were already used. After thinking about this a bit, I can fix this much more easily by just changing R11 to be preserved across function calls.

Ugh, still have problems. So now GCC thinks R11 is preserved, and tries to put values there to be preserved across a function call. Obviously, those values are destroyed after the return. So R11 has instead been marked as a special-purpose register and will not be considered for allocation. I don't like this, since R11 could legitimately be used if only we knew when to preserve it. At least everything works now.

I also looked into the reported stack size problem, where stack values were overwriting memory used to save register values. The problem there was that there were several places in tms9900.c which calculated which registers were to be saved, and they did not all agree. This discrepancy resulted in space for R11 being included for the prologue and epilogue, but not included for the stack variables. The register value stored in the overlap region got destroyed as a result.

The work that was done for optimizing the prologue and epilogue also included merging all the work to determine which registers to save into one function. This improved code clarity, and removed the ambiguity which lead to this problem.

As a side note, there is a libiberty library in binutils, but it's not especially useful. In the bundled README file, it is stated that this is a collection of random useful functions. It also states that these routines are highly system-dependant. It's also filled with a bunch of routines which don't seem to be very useful on a target system. I guess I still need to finish my libc implementation.

So at this point, all the bug reports have been addressed, and all my milestone goals have been reached. I suppose this is a good time to put together another release.

No comments:

Post a Comment