Optimal code sequences for signed int div/rem by powers of 2

S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
Wed, 21 Apr 1993 13:49:45 GMT

          From comp.compilers

Related articles
Optimal code sequences for signed int div/rem by powers of 2 S_JUFFA@IRAV1.ira.uka.de (1993-04-21)
Re: Optimal code sequences for signed int div/rem by powers of 2 markt@harlqn.co.uk (1993-04-23)
| List of all articles for this month |

Newsgroups: comp.arch,comp.compilers
From: S_JUFFA@IRAV1.ira.uka.de (|S| Norbert Juffa)
Keywords: arithmetic, optimize
Organization: University of Karlsruhe, FRG
Date: Wed, 21 Apr 1993 13:49:45 GMT

This article is somewhat of a followup to the recent discussion about
integer division here comp.arch. I am also cross-posting to comp.compilers
because this stuff may be relevant for people involved with compiler code
generation.


Many modern RISC CPUs do not have an integer division instruction (SPARC
V7, Alpha), have only a division step instruction (AMD 29K), or have a
rather slow HW-division (microSPARC). Even if there is a division
instruction, it may fail to produce a remainder (SPARC V8).


Therefore, people are looking for alternatives to using division. Fast
alternatives are especially feasible if the divisor is known at compile
time (e.g. multiplication by reciprocal of divisor). Quite a few posts in
the recent discussion were involved with speeding up division by powers of
two known at compile time. This is really easy when dealing with unsigned
integers (-> shift right), but requires some correction steps for signed
integers.


Checking compiler output for 80x86 produced by Microsoft C 7.0 and for
SPARC by gcc 2.2.3 and the Solaris cc compiler, I find they don't use all
the optimizations possible for signed integers when the divisor is a power
of two. Especially, when the divisor is a negated power of two (i.e.
-2^i), all of these compilers will resort to divide and remainder
instructions or subroutines, although the shifting and masking approach
used for unsigned integers could be used with a few correction steps. The
Microsoft C 7.0 compiler doesn't optimize any division/remainder by 2^i or
-2^i for signed integers.


I wonder what the fastest instruction sequences for signed integer
division by +/- 2^n are for different machines. I know from the discussion
there are even machines that have signed integer division by powers of two
in HW (i960).


I am including the routines I came up with for 80x86 and SPARC for doing
signed integer division/remainder by +/- 2^i.


Norbert Juffa (s_juffa@iravcl.ira.uka.de)


CODE FOR Intel 80x86 (can easily be changed to 32-bit version for >= 386)
=========================================================================


/ 2: CMP AX, 8000h ; CY = 1, if dividend >= 0
                                  SBB AX, -1 ; inc AX, if dividend < 0
                                  SAR AX, 1 ; now do right shift


/ 2^n: CWD ; DX = FFFFh if dividend < 0
                                  AND DX, 2^n-1 ; mask correction
                                  ADD AX, DX ; apply correction if necessary
                                  SAR AX, n ; now do right shift


/ -2: CMP AX, 8000h ; CY = 1, if dividend >= 0
                                  SBB AX, -1 ; inc AX, if dividend < 0
                                  SAR AX, 1 ; now do right shift
                                  NEG AX ; use (x div -2) = - (x div 2)


/ -2^n: CWD ; DX = FFFFh if dividend < 0
                                  AND DX, 2^n-1 ; mask correction
                                  ADD AX, DX ; apply correction if necessary
                                  SAR AX, n ; now do right shift
                                  NEG AX ; use (x div -2^n) = - (x div 2^n)


% 2, % -2: CWD ; generate flag, FFFFh id divd < 0, else 0
                                  AND AX, 1 ; mask out remainder
                                  XOR AX, DX ; negate
                                  SUB AX, DX ; remainder if dividend < 0


% 2^n, % -2^n: CWD ; generate mask, FFFFh if divd < 0, else 0
                                  AND DX, 2^n-1 ; mask correction
                                  ADD AX, DX ; apply pre-correction if necessary
                                  AND AX, 2^n-1 ; mask out remainder
                                  SUB AX, DX ; apply post-correction if necessary




CODE FOR SPARC
==============


Dividing a signed integer by a positive power of two (1<<n) known at
compile time.


1) for i/2: addcc %o1,%o1,%g0 ! carry if %o1 < 0
                                      addx %o1,%g0,%o1 ! inc %o1, if %o1 < 0
                                      sra %o1,1,%o1 ! do the shift


      Although this code sequence uses three instructions, just like what is
      produced by gcc now, it has the advantage that it doesn't need an
      additional register. It has the disadvantage of destroying the
      condition codes.




2) for i/(1<<n): sra %o1,31,%o2 ! %o2 = 0xffffffff if %o1 < 0
                                      srl %o2,32-n,%o2 ! (1<<n)-1, if %o1<0, else 0
                                      add %o1,%o2,%o1 ! apply correction
                                      sra %o1,n,%o1 ! do the shift


      The advantage of this sequence is that it doesn't use a branch and has
      no problem with n>=13 since it doesn't use immediate constants for the
      correction step, also it doesn't destroy the condition codes.


Dividing a signed integer by a negated positive power of two (-(1<<n))
should make use of the identity x / (-(1<<n)) = -(x / (1<<n)) and then
apply the code given above. Currently, gcc 2.2.3 generates calls to .div


1) for i/-2: addcc %o1,%o1,%g0 ! carry if %o1 < 0
                                      addx %o1,%g0,%o1 ! inc %o1, if %o1 < 0
                                      sra %o1,1,%o1 ! i / 2
                                      neg %o1 ! -(i / 2)


2) for i/(-(1<<n)):sra %o1,31,%o2 ! %o2 = 0xffffffff if %o1 < 0
                                      srl %o2,32-n,%o2 ! (1<<n)-1, if %o1<0, else 0
                                      add %o1,%o2,%o1 ! apply correction
                                      sra %o1,n,%o1 ! i / (1<<n)
                                      neg %o1 ! -(i / (1<<n))




Computing the remainder (% operator) of a division of a signed integer by
a positive power of two (1<<n) known at compile time. Currently, gcc uses
calls to .rem.


1) for i%2, i%(/-2):
                                      sra %o1,31,%o2 ! 0xFFFFFFFF, if %o < 0, else 0
                                      and %o1,1,%o1 ! mask out remainder
                                      xor %o1,%o2,%o1 ! negate remainder if quotient
                                      sub %o1,%o2,%o1 ! negative (sign(quot)=sign(rem)!)




2) for i%(1<<n), i%(-(1<<n)):
                                      sub %g0,1,%o2 ! 0xffffffff
                                      srl %o2,32-n,%o2 ! (1<<n)-1
                                      sra %o1,31,%o3 ! 0xffffffff, if %o1 < 0, else 0
                                      and %o3,%o2,%o3 ! (1<<n)-1, if %o1 < 0, else 0
                                      add %o1,%o3,%o1 ! apply correction if necessary
                                      and %o1,%o2,%o1 ! mask out remainder bits
                                      sub %o1,%o3,%o1 ! apply correction if necessary


      This instruction sequence is branch free, doesn't destroy the condition
      codes and has no problem with n>=13, since it doesn't use immediate
      constants in the correction step.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.