Thu, 6 Jul 1995 16:04:49 GMT

Related articles |
---|

integer mod with constant modulus. rocky@panix.com (R. Bernstein) (1995-06-24) |

Re: integer mod with constant modulus. mernst@theory.lcs.mit.edu (1995-06-29) |

HAKMEM #169 chase@centerline.com (1995-07-06) |

Re: HAKMEM #169 hbaker@netcom.com (1995-07-11) |

Re: HAKMEM #169 Tommy.Thorn@irisa.fr (1995-07-15) |

Newsgroups: | comp.compilers |

From: | chase@centerline.com (David Chase) |

Keywords: | arithmetic, comment |

Organization: | CenterLine Software |

References: | 95-06-047 95-07-028 |

Date: | Thu, 6 Jul 1995 16:04:49 GMT |

mernst@theory.lcs.mit.edu (Michael Ernst) writes:

*> This is a really neat technique which was described in HAKMEM and is used*

*> in the X11 sources to count the number of bits in a word (which is actually*

*> a modulus operation). It doesn't require any branches. On some*

*> architectures the final modulus may be expensive; it can be replaced by a*

*> few more shifts, masks, and adds in the obvious way.*

Actually, I'd be interested if you could name an architecture for

which the final modulus is not expensive. If I just code up

population count in the ordinary way (see below), I find it is faster

on all the different machine-compiler combiations that I have the

patience to try.

*> /**

*> * This is HAKMEM 169, as used in X11 sources.*

*> */*

*> static int*

*> t0_hackmemmod(register unsigned long n)*

*> {*

*> register unsigned long tmp;*

*>*

*> tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);*

*> return ((tmp + (tmp >> 3)) & 030707070707) % 63;*

*> }*

time in seconds

hakmem hakmem'*4 naive compiler & flags

HP-UX A.09.05 C 9000/710 19.6 8.3 6.9 c89 +O3

IBM Power something 8.2 5.1 4.0 xlc -O3

MicroSparc (50 Mhz) 18.6 10.8 8.8 acc 3.0.1 -fast -cg92 *2

MicroSparc (50 Mhz) 30.7 9.5 9.0 acc 3.0.1 -fast -cg89 *3

MicroSparc (50 Mhz) 29.9 8.9 8.0 gcc 2.5.8 -O3

SuperSparc (50 Mhz) 8.5 5.7 5.2 cc 3.0.1 -fast -xcg92

SuperSparc (50 Mhz) 21.1 5.7 5.4 cc 3.0.1 -fast -xcg89

Sparc ELC (Cypress, 33 Mhz) *1 16.6 15.6 cc 3.0.1 -fast -xcg92

Sparc ELC (Cypress, 33 Mhz) 52.6 16.7 15.6 cc 3.0.1 -fast -xcg89

Sparc ELC (Cypress, 33 Mhz) 47.7 10.8 8.3 gcc 2.6.3 -O3

Sparc ELC (Cypress, 33 Mhz) 49.5 13.4 12.3 cc 3.0.1 -fast -xO4 -xcg89

*1 I went to the bathroom and made tea, came back, and it still wasn't done.

The missing instruction is emulated by a trap to the OS.

*2 "-cg89" indicates scheduling and instruction selection tuned to run well

on the SparcStation 2. In particular, no hardware division.

*3 "-cg92" indicates sched. and inst. selection tuned to run well on the

SuperSparc and MicroSparc chips. In particular, support for division

in hardware.

*2 & *3 -- as a former Sun employee and compiler hacker, I am very familiar

with the compiler flags and versions. I am not so familiar with the ins and

outs of IBM and HP compilers. I also tried various GCC versions to test the

hypothesis that X11 and GCC were tuned for one another -- that is, GCC might

be loaded with optimizations tuned for code like what is in hakmem (and, in

fact, gcc optimizes the mask construction better than cc).

*4 For hakmem-prime, I performed the mod-63 operation with hand-written shifts,

masks, and additions. I may not have written it optimally, but it appears

to be correct (I tested it separately on a wide range of inputs, and of

course I worked it out first carefully by hand).

So, what do we learn from this:

- if this is really used in X11, it is a mistake. Why make use of obscure,

slow code when "straightforward" code is ALWAYS faster? (Yet more fodder

for the X11-haters.)

- it looks like it really would make sense to perform some constant remainders

with a sequence of shifts, masks, and additions. Hakmem-prime consistently

beat hakmem by a wide margin, and the only difference between the two was the

optimized constant-remainder.

- be wary of untested "great hacks", even when they come from a renowned

Institute of Technology.

Here's the test code:

--------------------------------------------------

#include <stdio.h>

#ifdef NAIVE

int

popcount(unsigned long n)

{

unsigned long x, y;

x = n & 0x55555555;

y = (n >> 1) & 0x55555555;

n = x + y;

x = n & 0x33333333;

y = (n >> 2) & 0x33333333;

n = x + y;

x = n & 0x0f0f0f0f;

y = (n >> 4) & 0x0f0f0f0f;

n = x + y;

x = n & 0x00ff00ff;

y = (n >> 8) & 0x00ff00ff;

n = x + y;

x = n & 0x0000ffff;

y = (n >> 16);

n = x + y;

return n;

*}*

#endif

#ifdef HAKMEM

int

popcount(unsigned long n)

{

unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);

return ((tmp + (tmp >> 3)) & 030707070707) % 63;

*}*

#endif

#ifdef HAKMEM_P

int

popcount(unsigned long n)

{

unsigned long tmp, x, y;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);

tmp = ((tmp + (tmp >> 3)) & 030707070707);

/* Now take remainder mod 63 */

x = tmp & 07700770077;

y = (tmp >> 6) & 07700770077;

tmp = x + y; /* tmp contains 3 sums */

tmp = ((tmp >> 12) + (tmp >> 24) + tmp) & 0777;

tmp = (tmp >> 6) + (tmp & 077);

tmp = tmp + 1;

tmp = (tmp >> 6) + (tmp & 077) - 1;

return tmp;

*}*

#endif

main() {

int i;

int j = 0;

for (i = 0; i < 10000000; i++) {

j = j + popcount(i);

}

printf("j = %d\n", j);

*}*

--------------------------------------------------

speaking for myself,

David Chase

[I'm pretty sure it was a win on the PDP-6. -John]

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.