Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?

Roger L Costello <costello@mitre.org>
Sat, 25 Jun 2022 12:58:25 +0000

          From comp.compilers

Related articles
One lexer rule for keywords and identifiers christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-06-26)
| List of all articles for this month |

From: Roger L Costello <costello@mitre.org>
Newsgroups: comp.compilers
Date: Sat, 25 Jun 2022 12:58:25 +0000
Organization: Compilers Central
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78285"; mail-complaints-to="abuse@iecc.com"
Keywords: history, practice, comment
Posted-Date: 25 Jun 2022 12:41:29 EDT
Thread-Topic: Why don't compiler writers adhere to the dragon book recommendation of one lexer rule for keywords and identifiers?
Thread-Index: AdiIkmeY/cQZAc/NR0C+NL7PLIp4Bw==

Hi Folks,


Page 101-102 of the dragon book recommends having one lexer rule for both
keyword and identifiers (symtab = symbol table):


[a-zA-Z][a-zA-Z0-9]* { Action: check symtab for the lexeme.
If the symtab says the lexeme
is a keyword, return the keyword.
Else, if the symtab says the lexeme
is an existing identifier, return ID
and a pointer to the symtab entry.
Else, add the identifier to the symtab
and return ID and a pointer to the
new symtab entry. }


The book says the advantages of this approach are:


1. Easy to add new keywords. Simply initialize the symtab with the new
keywords. No changes to the lexer rules.


2. Smaller transition diagram. Instead of several hundred states, there are
less than a hundred states in the transition diagram.


Those seem like pretty compelling reasons for having one lexer rule for both
keywords and identifiers.


So why don't compiler (lexer) writers follow that recommendation?


Consider:


Page 86-90 of the book "flex & bison" shows the lexer for MySQL. The lexer has
a rule for every keyword.


Pag 228 of "Introduction to Compiling Techniques" show a lexer for VSL (Very
Simple Language). Again, the lexer has a rule for every keyword.


/Roger
[I think the answer to a lot of these questions contains the phrase "64K PDP-11." In
the 1970s a thousand state DFA took a lot of memory. Today it's barely a rounding error.
In flex&bison the main reason I did it the way I did is that it's a pedagogical example
but it also meant the symbol table didn't need special cases for keywords. There is also
the PL/I situation where keywords are only keywords in certain contexts so you can say
  IF IF = GOTO; THEN ELSE = BEGIN; ELSE END = IF;
-John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.