Re: Looking for symbol table hashing algorithm

From: Paul Dietz <dietz@interaccess.com>
Newsgroups: comp.compilers
Date: 11 Jul 1998 23:40:49 -0400
Organization: The World's Usenet -- http://www.Supernews.com
References: 98-07-030 98-07-045 98-07-084
Keywords: symbols

David Chase wrote:


> What this means is that the bucket chains must get really long
> before chaining is costly, and with a grade A hash function, the
> odds of that happening are very low indeed. I lack the skill-time
> product to generally compute odds, but if you've got 16 bucket
> heads, the odds against adding 16 entries and landing them all in
> the same bucket are (I think) 2**60 to 1.




To make chaining have good expected time on any input set, it suffices
to choose a random hash function such that the probability that two
different keys hash to the same location is <= 1/(table size). That's
because the time to insert n keys is proportional to the expectation
of the sum of the n^2 random variables C(i,j), where C(i,j) = 1 if
keys i and j collide and 0 if they do not. The expectation of a sum of
random variables is the sum of their expectations, even if the
variables are not independent, so with a table whose size is
proportional to n the expected total work is O(n).


Universal hash functions are covered in Cormen, Leiserson and Rivest.
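
For concreteness, a rough sketch in C of one such family, the
((a*x + b) mod p) mod m construction from that book; all the names
here are invented, keys are assumed to be integers already below the
prime, and rand() stands in for a real source of random bits:

#include <stdint.h>
#include <stdlib.h>

#define UHASH_P 2147483647ULL        /* prime 2^31 - 1; must exceed any key */

struct uhash {
    uint64_t a;                      /* random in [1, p-1] */
    uint64_t b;                      /* random in [0, p-1] */
    uint64_t m;                      /* number of buckets  */
};

static uint64_t rand_bits(void)      /* crude stand-in for a real RNG */
{
    return ((uint64_t)rand() << 31) ^ (uint64_t)rand();
}

/* Pick one member of the family h(x) = ((a*x + b) mod p) mod m at random.
   For any fixed pair of distinct keys x != y (both < p), the probability,
   over this choice of (a, b), that they land in the same bucket is at
   most 1/m. */
static struct uhash uhash_pick(uint64_t nbuckets)
{
    struct uhash h;
    h.a = 1 + rand_bits() % (UHASH_P - 1);
    h.b = rand_bits() % UHASH_P;
    h.m = nbuckets;
    return h;
}

static uint64_t uhash_bucket(const struct uhash *h, uint32_t key)
{
    /* caller guarantees key < UHASH_P; the product fits in 64 bits */
    return ((h->a * (uint64_t)key + h->b) % UHASH_P) % h->m;
}

A symbol name would first have to be reduced to such an integer key
(in practice by an ordinary string hash, which weakens the guarantee a
little); picking a fresh (a, b) pair whenever the table is built or
resized is what makes the expected-time argument above apply to any
input set.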






> For scoped symbol tables, there's a stunt, I think due to Paul Dietz,
> that is useful in some applications.
[...]
> Rather than modify the symbol table to reflect the changing scopes
> as compilation moves through the program, you simply note which
> scope you are in; a reference to a symbol X takes the form "get
> scopes from symbol table for X, find current scope in set of
> scopes". Numbering the scopes with in and out numbers in a standard
> tree walk generates bracketing ranges that allow you to find the
> nearest enclosing scope in log N time (binary search).
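
For illustration, a rough sketch in C of that numbering and lookup;
the names and layout are invented, and where the description above
says binary search suffices, the sketch falls back to a short backward
scan to step over definitions made in earlier, already-closed sibling
scopes:

#include <stddef.h>

struct scope {
    int in, out;                 /* entry/exit numbers from one DFS counter */
    struct scope *child;         /* first child  */
    struct scope *sibling;       /* next sibling */
};

/* Number the scope tree in a single walk: each scope gets the counter
   value on entry (in) and again on exit (out), so the [in, out]
   brackets of nested scopes nest properly. */
static void number_scopes(struct scope *s, int *counter)
{
    s->in = (*counter)++;
    for (struct scope *c = s->child; c != NULL; c = c->sibling)
        number_scopes(c, counter);
    s->out = (*counter)++;
}

/* d encloses (or equals) s exactly when d's bracket contains s's. */
static int encloses(const struct scope *d, const struct scope *s)
{
    return d->in <= s->in && s->out <= d->out;
}

/* Per-symbol record: the scopes in which the symbol is defined, kept
   sorted by in-number (appending definitions in walk order does this). */
struct sym_defs {
    const struct scope **scopes;
    size_t ndefs;
};

/* Find the innermost definition visible from scope cur: binary search
   for the last defining scope entered no later than cur, then scan
   outward until one actually encloses cur. */
static const struct scope *visible_def(const struct sym_defs *d,
                                       const struct scope *cur)
{
    size_t lo = 0, hi = d->ndefs;    /* find first entry with in > cur->in */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (d->scopes[mid]->in <= cur->in)
            lo = mid + 1;
        else
            hi = mid;
    }
    while (lo-- > 0)
        if (encloses(d->scopes[lo], cur))
            return d->scopes[lo];
    return NULL;                     /* no definition of the symbol is visible */
}

With this arrangement nothing in the table changes as scopes open and
close during compilation; only the pointer to the current scope does.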




What I did was solve this problem in the case of a tree that grows by
addition of individual nodes, so one can't use a static numbering. The
original paper had an extra log* n factor, which Tarjan immediately
(I mean, seconds after the presentation) pointed out how to avoid (see
also the paper by Tsakalidis).


You can do the lookup in O(log log n) time on a RAM with O(log n) word
size using the van Emde Boas integer set data structure. Probably not
terribly useful in practice.


Paul
--

