Re: perfect hashing


From: Marc Tardif <intmktg@Gloria.CAM.ORG>
Newsgroups: comp.compilers
Date: 13 Aug 2000 19:21:37 -0400
Organization: Compilers Central
References: 00-07-064 00-08-022 00-08-026 00-08-031 00-08-059
Keywords: symbols, comment

> >We could avoid the string comparison in many cases. We calculate an
> >additional hash value and store it in the perfect hash table as well
> >as the string itself. Then we need to compare the strings only if the
> >hash value matches.
>
> Another approach for detecting keywords which can be even more
> efficient is to just write a big multi-level switch statement
> where you test one character at a time. For example, if you have
> three keywords "the", "this", and "foo", you can use the following
> code:
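
(The example code itself was trimmed from the quote above; here is a
minimal sketch of what such a character-at-a-time switch might look
like for those three keywords, my reconstruction rather than the
original:)

/* Token codes for the three example keywords. */
enum { KW_NONE = 0, KW_THE, KW_THIS, KW_FOO };

/* Recognize "the", "this" and "foo" one character at a time.
 * Returns KW_NONE if s is not a keyword; the '\0' tests ensure
 * we never read past the end of the string. */
int keyword(const char *s)
{
    switch (s[0]) {
    case 't':
        if (s[1] == 'h') {
            if (s[2] == 'e' && s[3] == '\0') return KW_THE;
            if (s[2] == 'i' && s[3] == 's' && s[4] == '\0') return KW_THIS;
        }
        break;
    case 'f':
        if (s[1] == 'o' && s[2] == 'o' && s[3] == '\0') return KW_FOO;
        break;
    }
    return KW_NONE;
}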


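For the scheme in the first quoted paragraph, a minimal toy along
these lines (the hash functions and table layout are illustrative
guesses, not from the original post; (len + first char) % 4 happens
to be collision-free for the same three keywords, so it can stand in
for a real perfect hash):

#include <string.h>

/* One slot per possible perfect-hash value: the keyword string
 * plus its precomputed secondary hash, used as a cheap filter. */
static struct { const char *kw; unsigned check; } table[4];

static unsigned hash2(const char *s)      /* secondary hash */
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h;
}

static unsigned phash(const char *s)      /* toy perfect hash */
{
    return (unsigned)((strlen(s) + (unsigned char)s[0]) % 4);
}

static void init_table(void)
{
    static const char *kws[] = { "the", "this", "foo" };
    int i;
    for (i = 0; i < 3; i++) {
        unsigned slot = phash(kws[i]);
        table[slot].kw = kws[i];
        table[slot].check = hash2(kws[i]);
    }
}

/* strcmp() runs only when the secondary hash matches, which is
 * the point of storing the extra hash value in the table. */
int is_keyword(const char *s)
{
    unsigned slot = phash(s);
    return table[slot].kw != NULL
        && hash2(s) == table[slot].check
        && strcmp(s, table[slot].kw) == 0;
}
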
Of course, there is always the storage/performance trade-off to
consider: the more storage you are willing to spend, the more
performance you can potentially gain. Considering the suggested
alternatives for building a static symbol table, which only changes
with new releases, the performance seems to be, in increasing order:
(slowest) perfect hashing < trie search < switch statements (quickest)
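
For comparison, by "trie search" I mean walking one node per input
character, along the lines of this minimal sketch (illustrative, not
from the thread):

/* Each node has one child per possible character, plus a token
 * code if a keyword ends at this node (0 otherwise). */
struct trie {
    struct trie *next[256];   /* indexed by unsigned char */
    int token;
};

int trie_lookup(const struct trie *t, const char *s)
{
    for (; t != NULL && *s != '\0'; s++)
        t = t->next[(unsigned char)*s];
    return t != NULL ? t->token : 0;
}

The 256-way nodes already hint at the storage question below: a
naive trie buys its speed with a lot of space unless the nodes are
compressed down to their live edges.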


What about the storage requirements? Are they directly proportional
in all three cases, i.e.: (leanest) perfect hashing < trie search <
switch statements (most bloated)?


How can the storage requirements for each of the alternatives be
approximated?
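
One rough way to put numbers on it, for N keywords of average length
L (the formulas below are back-of-envelope guesses for the naive
version of each structure, nothing more):

#include <stdio.h>

int main(void)
{
    int N = 50, L = 6;   /* e.g., a C-like keyword set */

    /* perfect hash: N slots (pointer + check word) plus strings */
    int hash_bytes   = N * (8 + 4) + N * (L + 1);

    /* naive 256-way trie: about one node per keyword character,
     * each node holding 256 pointers of 8 bytes */
    int trie_bytes   = N * L * 256 * 8;

    /* switch: a handful of instruction bytes per character of
     * each distinct prefix of the keyword set */
    int switch_bytes = N * L * 8;

    printf("perfect hash ~ %d bytes\n", hash_bytes);
    printf("naive trie   ~ %d bytes\n", trie_bytes);
    printf("switch code  ~ %d bytes\n", switch_bytes);
    return 0;
}

(On those guesses the naive trie, not the switch, is the bloated
one; compressed trie nodes would change the picture completely.)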
[I would be surprised if it made any practical difference in a real
compiler. In most cases, the largest chunk of time is spent breaking
the input up into tokens. Symbol lookup is generally down in the
noise unless you do something really dumb like a broken hash that
puts everything into one bucket. -John]

