Re: Hash specifics

bart@cs.uoregon.edu (Barton Christopher Massey)
Thu, 13 Dec 90 20:12:11 GMT

          From comp.compilers

Related articles
[3 earlier articles]
Re: Hash specifics pardo@cs.washington.edu (1990-12-11)
Re: Hash specifics henry@zoo.toronto.edu (1990-12-12)
Re: Hash specifics baxter@zola.ICS.UCI.EDU (Ira Baxter) (1990-12-13)
Re: Hash specifics lupine!rfg@uunet.UU.NET (1990-12-13)
Re: Hash specifics rfg@ncd.com (1990-12-13)
Re: Hash specifics rfg@ncd.com (1990-12-13)
Re: Hash specifics bart@cs.uoregon.edu (1990-12-13)
Re: Hash specifics roth@resi.rice.edu (1990-12-14)
Re: Hash specifics oz@nexus.yorku.ca (1990-12-14)
Re: Hash specifics plx!plxsun!evan@Sun.COM (1990-12-15)
Re: Hash specifics vern@horse.ee.lbl.gov (1990-12-16)
Re: Hash specifics schmidt@oberkampf.ICS.UCI.EDU (Doug Schmidt) (1990-12-17)
Re: Hash specifics preston@ariel.rice.edu (1990-12-17)
[2 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: bart@cs.uoregon.edu (Barton Christopher Massey)
Keywords: design
Organization: Department of Computer Science, University of Oregon
References: <1990Dec12.221425.569@zoo.toronto.edu> <9012122107.aa06454@ICS.UCI.EDU>
Date: Thu, 13 Dec 90 20:12:11 GMT

In article <9012122107.aa06454@ICS.UCI.EDU> Ira Baxter <baxter@zola.ICS.UCI.EDU> writes:
> If you have a lexer independent of the parsing mechanism, so that it
> hasn't got any idea whether the next string is a keyword or not
> (assemblers scanning the opcode field violate this requirement), then
> you'll have to do a symbol/string table probe anyway


You can very efficiently decide whether the next input belongs to a given
set of keywords or is an identifier as you read it in using a DFA or some
similar recognizer. Given the wide availability of FLEX, this is the
approach I almost always use -- dodges both the problems of efficient
keyword hashing and separating identifiers from keywords.


I tend to agree with Henry Spencer that on modern machines, "imperfect
hashing" is likely to be good enough -- after all, how many keywords are we
talking about, anyway? A couple hundred, max? Let's just increase the
table size until some really cheap hash function doesn't generate any
collisions. How big a table will we end up with? A couple thousand
entries? That's "only" 8K of pointers on a 32-bit box... An interesting
exercise would be a program which, for a given set of keywords, and a given
table size limit, tried more and more expensive hash functions until the
keywords hashed without collisions into the table and then emitted the
function. Kind of like gperf with less lofty ambitions :-).


Bart Massey
bart@cs.uoregon.edu
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.