
Related articles:
  Is Minimal Perfect Hashing the wrong algorithm? cef@geodesic.com (Charles Fiterman) (1996-04-02)
  Re: Is Minimal Perfect Hashing the wrong algorithm chase@centerline.com (1996-04-08)
  Re: Is Minimal Perfect Hashing the wrong algorithm bobduff@world.std.com (1996-04-08)
  Re: Is Minimal Perfect Hashing the wrong algorithm? dave@occl-cam.demon.co.uk (Dave Lloyd) (1996-04-08)
  Minimal Perfect Hashing and Cichelli's algorithm CSPT@giraffe.ru.ac.za (Pat Terry) (1996-04-10)

From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Date: 8 Apr 1996 15:47:26 -0400
Organization: CenterLine Software
References: 96-04-012
Keywords: theory, symbols

Charles Fiterman <cef@geodesic.com> writes:

> Many compiler writers use Minimal Perfect Hashing to generate some
> hash tables such as a key word table.
>
> But this algorithm minimizes table size. Shouldn't it minimize lookup
> time with a cap on table size? That is shouldn't it look at the time
> required to create a hash from an entry?

Nowadays, whenever I need a hash table, any hash table, I use the one

that I wrote back in 1986. The reason for this is that there is no

hash table out there whose marginal benefit (in terms of speed,

assurance that it will work) exceeds that of this table. This is both

because I took the time to build it right in the beginning (lots of

profiling to figure out where the time went) and because it has been

so thoroughly used over the years.

Each hash table entry has a list of buckets hanging off of it. The

"hash" function is supposed to return a 32-bit number, which is also

stored in the bucket. (The hash table is parameterized by hash

function, compare function, copy function, and freeing function --

I've used it for everything from strings, to trees, to sets of trees,

to stack backtraces, to pointers.) The table index is derived from

this number, but it is also used for a rapid inequality check when

scanning the list of entries, and to allow more-rapid table

rearrangement if it must grow (not a problem for compiler keywords, of

course). Whenever there's a hit, it also does a swap-to-front on the

bucket list. (Resizing is automatic; the table monitors its

probe-to-call ratio, and if it goes bad, it resizes. The resizing

heuristic grows the table in one of two ways -- if the table is very

full and the probe ratio is bad, then it grows it a lot, but if the

table is not full and the probe ratio is bad, then it grows it just a

little, on the assumption that this will redistribute the buckets in

the offending busy table entries).
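The design described above -- chained buckets that carry the full
32-bit hash, a table parameterized by function pointers, and
swap-to-front on a hit -- might be sketched in C roughly as follows.
The names and the string-specific helpers are illustrative, not taken
from the original code, and the copy/free hooks and the automatic
resizing driven by the probe-to-call ratio are omitted:

```c
#include <stdlib.h>
#include <string.h>

/* One bucket per entry.  The full 32-bit hash is stored alongside the
 * key so a chain scan can reject mismatches with one integer compare
 * before calling the (possibly expensive) user compare function. */
typedef struct Bucket {
    unsigned long  hash;
    void          *key;
    struct Bucket *next;
} Bucket;

/* The table is parameterized by hash and compare functions, so it can
 * hold strings, trees, sets of trees, pointers, and so on. */
typedef struct {
    Bucket       **slots;
    size_t         nslots;
    unsigned long (*hashfn)(const void *);
    int           (*cmpfn)(const void *, const void *);  /* 0 == equal */
} HashTable;

HashTable *ht_new(size_t nslots,
                  unsigned long (*hashfn)(const void *),
                  int (*cmpfn)(const void *, const void *)) {
    HashTable *t = malloc(sizeof *t);
    t->slots  = calloc(nslots, sizeof *t->slots);
    t->nslots = nslots;
    t->hashfn = hashfn;
    t->cmpfn  = cmpfn;
    return t;
}

void ht_insert(HashTable *t, void *key) {
    unsigned long h = t->hashfn(key);
    Bucket *b = malloc(sizeof *b);
    b->hash = h;
    b->key  = key;
    b->next = t->slots[h % t->nslots];   /* table index derived from hash */
    t->slots[h % t->nslots] = b;
}

/* Lookup: derive the slot from the hash, scan the chain using the
 * stored hash as a rapid inequality check, and swap a hit to the
 * front of its chain (temporal locality of reference). */
void *ht_lookup(HashTable *t, const void *key) {
    unsigned long h = t->hashfn(key);
    size_t i = h % t->nslots;
    Bucket **link = &t->slots[i];
    for (Bucket *b = *link; b != NULL; link = &b->next, b = *link) {
        if (b->hash == h && t->cmpfn(b->key, key) == 0) {
            *link = b->next;          /* unlink ...            */
            b->next = t->slots[i];    /* ... and swap to front */
            t->slots[i] = b;
            return b->key;
        }
    }
    return NULL;
}

/* String-specific helpers for a keyword-table use of the sketch. */
unsigned long str_hash(const void *p) {
    const unsigned char *s = p;
    unsigned long h = 5381;            /* djb2-style mix; any good hash works */
    while (*s) h = h * 33 + *s++;
    return h & 0xFFFFFFFFul;
}
int str_cmp(const void *a, const void *b) {
    return strcmp(a, b);
}
```

Used for compiler keywords, you would insert each keyword once and
look up every identifier; the stored-hash check means most chain
probes cost a single integer compare rather than a string compare.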

The rapid inequality check is an enormous win, at least judging from

the profiling results when I originally put this together (most times,

a lookup costs you one "hash", one "compare", a remainder operation, plus a

little bit of pointer-chasing). Swap-to-front turns out to be pretty

useful for many problems (temporal locality of reference) and not very

expensive for the rest. The one thing to beware of is failure to

provide a good hash function -- the design here assumes that the hash

function provided will have good statistical properties -- it should

act like a random number generator. The fast-inequality-check and the

table growth heuristics both depend on this.
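The post does not say which hash function fed the table, but as one
concrete example of a string hash whose output bits are well mixed --
the property that both the slot-index derivation and the inequality
check rely on -- here is 32-bit FNV-1a (my choice of illustration,
not necessarily what was used):

```c
#include <stdint.h>

/* 32-bit FNV-1a: xor in each byte, then multiply by the FNV prime.
 * The multiply diffuses each input byte across the word, so the
 * result behaves enough like a random number for both the slot
 * index (hash mod table size) and the rapid inequality check. */
uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```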

I noodled around a little bit with perfect hashing once, but the

benefits didn't seem to be worth the effort.

speaking for myself,

David Chase
