Re: symbol tables and search tries

Clint Olsen <clint@0lsen.net>
31 Jan 2006 00:35:34 -0500

          From comp.compilers

Related articles
symbol tables and search tries mr.waverlye@verizon.net (Mr.E) (2006-01-28)
Re: symbol tables and search tries haberg@math.su.se (2006-01-28)
Re: symbol tables and search tries gah@ugcs.caltech.edu (glen herrmannsfeldt) (2006-01-30)
Re: symbol tables and search tries haberg@math.su.se (2006-01-30)
Re: symbol tables and search tries anton@mips.complang.tuwien.ac.at (2006-01-31)
Re: symbol tables and search tries RLake@oxfam.org.uk (2006-01-31)
Re: symbol tables and search tries vmakarov@redhat.com (Vladimir Makarov) (2006-01-31)
Re: symbol tables and search tries clint@0lsen.net (Clint Olsen) (2006-01-31)
Re: symbol tables and search tries haberg@math.su.se (2006-01-31)
Re: symbol tables and search tries henry@spsystems.net (2006-01-31)
Re: symbol tables and search tries find@my.address.elsewhere (Matthias Blume) (2006-01-31)
Re: symbol tables and search tries danwang74@gmail.com (Daniel C. Wang) (2006-01-31)
Re: symbol tables and search tries mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-01-31)
Re: symbol tables and search tries DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-01-31)
[13 later articles]

From: Clint Olsen <clint@0lsen.net>
Newsgroups: comp.compilers
Date: 31 Jan 2006 00:35:34 -0500
Organization: Compilers Central
References: 06-01-085 06-01-111 06-01-117
Keywords: symbols
Posted-Date: 31 Jan 2006 00:35:34 EST

On 2006-01-30, Hans Aberg <haberg@math.su.se> wrote:
> The hash tables I know of use a linked list on each hash value, and if
> there are k hash values (= number of linked list), the average search
> time is O(N/k) = O(N). By choosing the value k suitably relative to N,
> one can achieve fast searches by keeping the overhead down. If k is
> chosen poorly, the hash table must be rebuilt, in order to achieve this
> efficiency. Or are you using another type of hash tables? And for
> balanced trees, the insert time is logarithmic. If k is doubled when too
> low, perhaps insert time is logarithmic for hash tables too.


The average probe length for a hash table using chaining with N
elements distributed over M chains (buckets) is N/M. It is not O(N)
unless you do something naive, like using a poor hash function or
failing to maintain the load factor (for chains, allow the load
factor N/M to vary between 0.5 and 2). Once performance decays to
O(N), you are essentially using a linked list.


You don't need to choose M ahead of time. It's completely reasonable
to do a resize on the fly. This happens so infrequently that it's in
the noise. Doubling the number of chains on every resize is a
reasonable policy. This has been done in numerous free libraries and
works just fine (see Kaz's Kazlib or Vo's CDT from AT&T for complete
working examples).


To respond to the original poster's question, tries are only beneficial
if the keys are extremely long, so that the hash function would dominate
the execution profile. Traditionally, symbols in compilers aren't all
that long (what was the original UNIX linker limit? John? [Eight]).
Tries also don't work well when the symbols share the same or similar
prefixes. You can fight this by changing the criterion for key
insertion into the trie: use the suffix of the key when the prefixes
are the same or similar, or alternate between characters from the
prefix and the suffix. Like a hash function, a proper trie key
function is important, and you have to consider the keys you're
inserting in order to obtain good performance.


Anyway, this is more of a data structure argument and has little to do
with compilers. See comp.programming for more discussion on this
topic.


-Clint

