Size of hash tables was Re: Masters course with compiler specialization

Trevor.Jenkins@suneidesis.com (Trevor Jenkins)
22 Dec 2002 10:40:18 -0500

          From comp.compilers

Related articles
Masters course with compiler specialization jeremy.wright@microfocus.com (Jeremy Wright) (2002-11-12)
Re: Masters course with compiler specialization Trevor.Jenkins@suneidesis.com (2002-12-11)
Re: Masters course with compiler specialization etechweb@yahoo.com (2002-12-19)
Size of hash tables was Re: Masters course with compiler specializat Trevor.Jenkins@suneidesis.com (2002-12-22)
Re: Size of hash tables was Re: Masters course ... joachim_d@gmx.de (Joachim Durchholz) (2002-12-30)
Re: Size of hash tables was Re: Masters course ... neeri@iis.ee.ethz.ch (Matthias Neeracher) (2003-01-04)
Re: Size of hash tables was Re: Masters course ... bonzini@gnu.org (2003-01-04)
Re: Size of hash tables was Re: Masters course ... stephan@stack.nl (Stephan Eggermont) (2003-01-07)
| List of all articles for this month |

From: Trevor.Jenkins@suneidesis.com (Trevor Jenkins)
Newsgroups: comp.compilers
Date: 22 Dec 2002 10:40:18 -0500
Organization: suneidesis
References: 02-11-060 02-12-056 02-12-092
Keywords: symbols, practice
Posted-Date: 22 Dec 2002 10:40:18 EST

On 19 Dec 2002 12:41:25 -0500, Sebastiano Pilla <etechweb@yahoo.com> wrote:
> Trevor Jenkins <Trevor.Jenkins@suneidesis.com> wrote:
>
> > If Hopgood continues to teach this module then you'll at least come
> > out with the correct view that hash-tables based on the powers of 2
> > are better than Maurer's gospel that such tables can only ever be
> > absed on prime numbers.
>
> Could you please expand on that? I don't think I'll have the opportunity
> to attend Hopgood's course.


Since the publication of Maurer's paper "An improved hash code for
scatter storage" in the Comm of the ACM (vol 11, Jan 1968, pp 35--38)
it is taken as gospel that hash tables only work if the size is a
prime number. Ignoring for the moment any attempt to change the size
of a hash table the MODULO will require a divide operation. Divides
are (at least were) expensive operations to perform in hardware. For
the not so subtle influenceof this aper check out the Dragon books
where Aho/Ullman and Aho/Sethi/Ullman only mention the use of prime
numbers for the size of a hash table.


An alternative to prime numbers is to use powers of 2, which if I
remember Hopgood's lectures correctly were used in the Fortran II
compiler. Already this has become effecient. Why? The divide operation
can be replaced by the faster boolean AND operation. Plus there is no
serious penalty for increaing the size of the hash table. To increase
the size of a hash table using powers of 2 requires setting the next
bit along whereas with the Maurer doctrine you'll have to compute the
next prime number in sequence.


There are other benefits of using powers of two. Carefully planning
can make sure that the hash table starts on a page bondary and there
after an integral number of pages can be processed. Rehashing the
tabel will be faster for the same reason that inserting an individual
entry is faster.


For more information on Hopgood's approach see:


"The quadratic hash method when the table size is a power of 2"
Hopgood and Davenport, Computer Journal, vol 15, nov 1972, pp314--315


I do remember Hopgood saying that he wrtote the editor of Comm of the
ACM about Maurer's paper challenging the validity of the content. A
letter that so far remains unprinted --- even when Maurer's was
reprinted in a annivarsary issue of Comm of the ACM.


  Regards, Trevor


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.