Re: perfect hashing

jmochel@foliage.com (Jim Jackl-Mochel)
10 Aug 2000 00:09:31 -0400

          From comp.compilers

Related articles
Hashtable alternatives gwynfa@paradise.net.nz (Gwynfa) (2000-07-27)
Re: Hashtable alternatives rkrayhawk@aol.com (2000-08-04)
perfect hashing preston@tera.com (Preston Briggs) (2000-08-04)
Re: perfect hashing ceco@jupiter.com (Tzvetan Mikov) (2000-08-05)
Re: perfect hashing jsgray@acm.org (Jan Gray) (2000-08-09)
Re: perfect hashing jmochel@foliage.com (2000-08-10)
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10)
Re: perfect hashing parz@home.com (Parzival) (2000-08-10)
Re: perfect hashing rick@home.com (2000-08-13)
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13)
Re: perfect hashing pmk@cray.com (2000-08-13)
Re: perfect hashing tej@melbpc.org.au (Tim Josling) (2000-08-13)
[2 later articles]
| List of all articles for this month |

From: jmochel@foliage.com (Jim Jackl-Mochel)
Newsgroups: comp.compilers
Date: 10 Aug 2000 00:09:31 -0400
Organization: Foliage Software Systems, Inc.
References: 00-07-064 00-08-022 00-08-026 00-08-031
Keywords: symbols

Preston Briggs <preston@tera.com> wrote in message
>> The moderator wrote:
>> [Perfect hashing is sometimes useful for a fixed table of keywords, but
>> you can't use it for a normal symbol table. -John]
>>
>> Indeed, I wouldn't use it for keywords.
>> Consider the typical scanner.
>>
>> a) Recognize that we have some sort of keyword or identifier
>> b) See if it's a keyword
>> c) If not, look it up in the symbol table
>>
>> If we use perfect hashing for step b, then we'll _always_ find
>> some entry and have to finish with a string comparison.


>, "Tzvetan Mikov" <ceco@jupiter.com> wrote:
>We could avoid the string comparison in many cases. We calculate an
>additional hash value and store it in the perfect hash table as well
>as the string itself. Then we need to compare the strings only if the
>hash value matches. Since the additional hash function is different
>from the perfect hashing function we won't get many false positives
>and the string comparison will just confirm the match.
>
>Calculating the additional hash value is essentially free if the
>symbol table for the identifiers uses the same hash function - it has
>to be calculated anyway.
>
>> [...]
>
>> [I agree. I've fooled around with perfect hash generators, but in
>practice
>> it makes more sense to load the keywords into the same symbol table as the
>> regular symbols so you can look up every text token in one try. -John]
>
>I guess that would be faster, but it seems "cleaner" not to mix the
>reserved words and the symbol table together. Going down this road in
>a C compiler we could even use the same symbol table for preprocessor
>defines as well ( has that been done?). Is that a good idea?
>
>Tzvetan
>[Seems pretty clear to put all the token info in one table. -John]


One thing you might want a look at is a data structure called a trie.
Tries have one property that distinguishes them from most other
retrieval data structures: they let you know sooner when something is
not a match.


So, for key-word table look-up, they return sooner when a the key word
is not matched.


They are a bit fat, so you may want a look at a paper called
"Multi-character Tries for Text Searching," (with L. K. D. Cooper),
Information Processing & Management, Vol. 29, No. 2, 1993, pp.197-207.


It documents a more compact representation.


Good Luck,
                Jim JM


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.