Tuesday, August 26, 2014

Apparently, division has problems too. Here are the failed scenarios:

X/10    -> Results in -1 for all negative X
X/(-10) -> Results in -1 for all negative X, 0 for all positive X
N/X     -> Internal compiler error for all N

The compiler error looks like it will be a pain to track down, so I'm looking at the incorrect math first.

Here's a simple function I'm using for testing, assume a=-100, b=10:

# int divide(int a, int b) {return a/b;}
divide:               # R1=0xFF9B R2=0xA
        mov  r1, r4   # R4=0xFF9B
        seto r3       # R3=0xFFFF
        jlt  $+>4
        clr  r3      
        div  r2, r3   # [R3,R4]/R2 = 0xFFFFFF9B/0xA
        mov  r3, r1   # R3 holds quotient
        b    *r11

If DIV does signed division, we should see a result of 0xFFF5 (-10)
If DIV only does unsigned division, we should have a result of 0x998F. For some reason we see 0xFFFF, weird.

Nope, not weird, I just missed a really important note in the data sheet: "... if the divisor is less that the data contained in Rd ... Rd and Rd+1 are unchanged." I think this is the second time I've really messed up division. Anyway, DIV apparently only does unsigned division, so I need to think about this a bit.

div_unsigned:         # return A/B, R1=A, R2=B
        clr  r0       # Clear high word of [R0,R1]
        div  r2, r0   # R0 = [R0,R1]/R2
        mov  r0, r1   # Move result to return position
        b    *r11     # Return result

Probably the best solution here is to find the sign of the quotient, convert both operands to their absolute value, divide, then set the sign of the result. No other approach I've looked at would be smaller or faster than this.

div_signed:           # return A/B, R1=A, R2=B
        mov r1, r3
        xor r2, r3    # Calculate sign of result
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R0 = [R0,R1]/R2
        inv r3
        jlt posval
        neg r0        # Negate result if necessary
posval: mov r0, r1    # Move result to return position
        b   *r11      # Return result
  22 bytes
       
OK, what about modulo? As of the C 1999 and C++ 2011 standards, the sign of the modulo must match that of the dividend.

Examples:
 103% 10 =  3
 103%-10 =  3
-103% 10 = -3
-103%-10 = -3

For The unsigned case, we can easily modify the division function by returning the modulo stored in R1. For the signed case, it gets trickier. We need to save the sign of the dividend somewhere and use it after the calculation is complete.

mod_unsigned:         # return A%B, R1=A, R2=B
        clr  r0       # Clear high word of [R0,R1]
        div  r2, r0   # R1 = [R0,R1]%R2, return value already in position
        b    *r11     # Return result

mod_signed:           # return A/B, R1=A, R2=B
        mov r1, r3    # Save unmodified dividend
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R1 = [R0,R1]%R2
        inv r3
        jlt posval
        neg r1        # Negate result if needed, already in position
posval: b   *r11      # Return result
  18 bytes

Any way to do a signed divmod, combining these two? Without lots of extra work?

divmod_signed:        # return A/B and A%B, R1=A, R2=B
        mov r1, r3
        xor r2, r3    # Calculate sign of quotient
        mov r1, r4    # Save unmodified dividend
        abs r1        # Make arguments positive
        abs r2
        clr r0        # Clear high word
        div r2, r0    # R0 = [R0,R1]/R2
        inv r3
        jlt posdiv
        neg r0        # Negate quotient if needed
posdiv: inv r4
        jlt posmod
        neg r1        # Negate modulus if needed
posmod: mov r0, r1    # Move result to return position
        b   *r11      # Return result
  30 bytes, 5 registers

This is really bulky and ugly, but ultimately better than two separate calls. About 25% smaller and maybe 75% faster. It might make sense to move this into a library call because if there is a lot of math, the code could get really big. The overhead for the function call seems fairly small too. need to do the math and get real numbers though.

No comments:

Post a Comment