Re: perfect hashing

"Tzvetan Mikov" <ceco@jupiter.com>
5 Aug 2000 21:28:59 -0400

          From comp.compilers

Related articles
Hashtable alternatives gwynfa@paradise.net.nz (Gwynfa) (2000-07-27)
Re: Hashtable alternatives rkrayhawk@aol.com (2000-08-04)
perfect hashing preston@tera.com (Preston Briggs) (2000-08-04)
Re: perfect hashing ceco@jupiter.com (Tzvetan Mikov) (2000-08-05)
Re: perfect hashing jsgray@acm.org (Jan Gray) (2000-08-09)
Re: perfect hashing jmochel@foliage.com (2000-08-10)
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10)
Re: perfect hashing parz@home.com (Parzival) (2000-08-10)
Re: perfect hashing rick@home.com (2000-08-13)
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13)
[4 later articles]
| List of all articles for this month |

From: "Tzvetan Mikov" <ceco@jupiter.com>
Newsgroups: comp.compilers
Date: 5 Aug 2000 21:28:59 -0400
Organization: Compilers Central
References: 00-07-064 00-08-022 00-08-026
Keywords: symbols

Preston Briggs <preston@tera.com> wrote in message
> The moderator wrote:
> [Perfect hashing is sometimes useful for a fixed table of keywords, but
> you can't use it for a normal symbol table. -John]
>
> Indeed, I wouldn't use it for keywords.
> Consider the typical scanner.
>
> a) Recognize that we have some sort of keyword or identifier
> b) See if it's a keyword
> c) If not, look it up in the symbol table
>
> If we use perfect hashing for step b, then we'll _always_ find
> some entry and have to finish with a string comparison.


We could avoid the string comparison in many cases. We calculate an
additional hash value and store it in the perfect hash table as well
as the string itself. Then we need to compare the strings only if the
hash value matches. Since the additional hash function is different
from the perfect hashing function we won't get many false positives
and the string comparison will just confirm the match.


Calculating the additional hash value is essentially free if the
symbol table for the identifiers uses the same hash function - it has
to be calculated anyway.


> [...]


> [I agree. I've fooled around with perfect hash generators, but in
practice
> it makes more sense to load the keywords into the same symbol table as the
> regular symbols so you can look up every text token in one try. -John]


I guess that would be faster, but it seems "cleaner" not to mix the
reserved words and the symbol table together. Going down this road in
a C compiler we could even use the same symbol table for preprocessor
defines as well ( has that been done?). Is that a good idea?


Tzvetan
[Seems pretty clear to put all the token info in one table. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.