Re: what scanner scheme is efficient?

Ram Bhamidipaty <ramb@primenet.com>
16 Oct 1996 17:57:50 -0400

          From comp.compilers

Related articles
what scanner scheme is efficient? lloix@star.spb.ru (1996-10-12)
Re: what scanner scheme is efficient? peter@bj-ig.de (1996-10-15)
Re: what scanner scheme is efficient? ramb@primenet.com (Ram Bhamidipaty) (1996-10-16)
Re: what scanner scheme is efficient? peter@peter.bj-ig.de (Peter Brueckner) (1996-10-18)
Re: what scanner scheme is efficient? roth@noel.cs.rice.edu (1996-10-20)
Re: what scanner scheme is efficient? jsgray@acm.org (Jan Gray) (1996-10-20)
Re: what scanner scheme is efficient? clark@quarry.zk3.dec.com (1996-10-24)
Re: what scanner scheme is efficient? james@wgold.demon.co.uk (James Mansion) (1996-10-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-10-30)
[6 later articles]
| List of all articles for this month |

From: Ram Bhamidipaty <ramb@primenet.com>
Newsgroups: comp.compilers
Date: 16 Oct 1996 17:57:50 -0400
Organization: Compilers Central
References: 96-10-041 96-10-051
Keywords: lex, performance

>Thus the question is simple, how do professionals design their scanners
>and tables, what do they do with keywords, and what is the more effective
>approach?


  > The best approch is to have an rule for 'IDENT' in the lexical
  > scanner and an stage IDENT->RESERVED_WORD after the call to the
  > scanner. This has a few advantages:
  > ...
  > The resword table is an simple hash (O(1) like the scanner


Perhaps someone could give more information on this point.


I recently implemented a scanner for netlist files.


For those who dont work in the VLSI/EDA field netlists are (very
large) files that describe the connectivity of a chip of some sort. It
may be that netlists have a different ratio of keywords to identifiers
from programming languages that would make my data less useful.


I tried the approach of using a single pattern to match both IDENT's
and all keywords followed by a hash table to perform the IDENT-->
keyword transformation. I found this to be _slower_ than just having
all the keywords recognized by the scanner. Note I did not make any
effort to chose some "perfect" hash function that would eliminate
collisions. My hash table had around 25 keywords and I used a table
size of 256 or so.


Is scanning/parsing _speed_ ever reason to go toward using a hash
table for identifying keywords? I would venture to say no: If one's
scanner has explicit patterns for each keyword then the "minimal"
amount of work is done to recognize the keyword itself, whereas with
the hash table approach, not only must the IDENT be recognized, but
now a hash value must be computed for _every_ word and then the hash
table probed, probably this last step is very cheap. I think the loss
in speed comes from having to compute a hash value even when the
scanner could know, if it had looked at the characters in the
identifier itself, that the word in question cannot be a keyword.


Does this line of reasoning make sense?


To me then one should use a hash table for kewords when there is some
other benefit that outweighs the loss in speed from the scanner.


-Ram
ram@epic.com
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.