I think I figured out the word-to-byte conversion. Basically, I'll be lying to GCC: instead of 16 16-bit registers, I'm telling it we have 32 8-bit registers. This allows the truncate formats to work as expected. It turns out GCC doesn't know how to truncate a hard register, and apparently assumes the low-order bits are always in the low byte, regardless of the mode that register is used with. That results in the wrong instructions being emitted.
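To make the problem concrete, here is a rough C model of one 16-bit register viewed as two 8-bit halves. It is only a sketch, not code from the GCC port: the names (reg_pair, load_word, naive_truncate, correct_truncate, check_model) are made up for illustration, and it assumes that byte-sized instructions operate on the high-order half of a register, which is my reading of the behavior described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 16-bit register seen as two 8-bit halves.
   Assumption: byte-sized instructions use the high (most significant) half. */
typedef struct {
    uint8_t hi;  /* half a byte-sized instruction would touch (assumed) */
    uint8_t lo;  /* half that holds the low-order bits of a word */
} reg_pair;

/* Load a 16-bit word into the register model. */
static reg_pair load_word(uint16_t w)
{
    reg_pair r;
    r.hi = (uint8_t)(w >> 8);
    r.lo = (uint8_t)(w & 0xFF);
    return r;
}

/* What a byte instruction would read if applied directly to the register. */
static uint8_t naive_truncate(reg_pair r)
{
    return r.hi;
}

/* With the halves exposed as separate registers, truncation picks the low half. */
static uint8_t correct_truncate(reg_pair r)
{
    return r.lo;
}

/* Self-check of the model, using an arbitrary sample value. */
static void check_model(void)
{
    reg_pair r = load_word(0x1234);
    assert(naive_truncate(r) == 0x12);   /* not the low-order bits of 0x1234 */
    assert(correct_truncate(r) == 0x34); /* matches (uint8_t)0x1234 */
}
```

Calling check_model() from a test program exercises both cases. The point of splitting the register into two halves is that a truncation can then name the half it wants explicitly, instead of relying on where the compiler assumes the low-order bits live.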
There is another problem with optimization, though. The following program is reduced to a NOP when using -O2:
char func(int a)
Looking at the debug output, it seems that the instructions have all been removed by the time the *.159r.combine file is generated. I'll need to look into this later.
I've been lurking on the AtariAge forums for a while now and recently found the Editor/Assembler manual there. I wish I'd had this earlier; it would have made things a lot easier. All the scraps of information I had to reverse-engineer or infer from a bunch of places are listed in detail and in a pretty clean format. The only drawback is that it's a scanned copy, not OCR, so no text searches. Oh well.