Pondering the future of lexical analysis

"Clint Olsen" <clint@0lsen.net>
18 Oct 2002 23:41:03 -0400

          From comp.compilers

Related articles
Pondering the future of lexical analysis clint@0lsen.net (Clint Olsen) (2002-10-18)
Re: Pondering the future of lexical analysis jmcenerney@austin.rr.com (John McEnerney) (2002-10-20)
Re: Pondering the future of lexical analysis snicol@apk.net (Scott Nicol) (2002-10-20)
Re: Pondering the future of lexical analysis whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Pondering the future of lexical analysis arnold@skeeve.com (Aharon Robbins) (2002-10-20)

From: "Clint Olsen" <clint@0lsen.net>
Newsgroups: comp.compilers
Date: 18 Oct 2002 23:41:03 -0400
Organization: AT&T Broadband
Keywords: lex
Posted-Date: 18 Oct 2002 23:41:03 EDT

I've been reading the Dragon Book lately about lexing, and after some
discussion with folks on the Flex team, it sounds like the big hurdle
in the future will be support for Unicode, primarily due to the size
of the transition tables.


The Dragon Book mentions that transitions should be defined for the
entire alphabet for each state, but this doesn't jibe with what I've
seen in the diagrams. It seems like you only need to store _valid_
transitions in your tables, and even then you could store those as
ranges, just as they were written in your lexer specification. The
absence of a valid transition while in a non-accepting state just
means you have no match, right?
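
To make the question concrete, here is a minimal sketch in Python of a DFA
that stores only valid transitions, as sorted code-point ranges per state.
The table contents, the names TRANSITIONS, ACCEPTING, step and longest_match,
and the choice of the Greek block U+0370..U+03FF as "extra letters" are all
assumptions made for illustration; this is not how Flex or any other
generator actually lays out its tables.

```python
from bisect import bisect_right

# Hypothetical identifier DFA. Each state maps to a sorted list of
# (lo, hi, next_state) code-point ranges; anything not listed is invalid.
TRANSITIONS = {
    0: [(0x41, 0x5A, 1), (0x5F, 0x5F, 1), (0x61, 0x7A, 1), (0x370, 0x3FF, 1)],
    1: [(0x30, 0x39, 1), (0x41, 0x5A, 1), (0x5F, 0x5F, 1),
        (0x61, 0x7A, 1), (0x370, 0x3FF, 1)],
}
ACCEPTING = {1}

# Range start points per state, precomputed for binary search.
STARTS = {s: [lo for lo, _, _ in ranges] for s, ranges in TRANSITIONS.items()}


def step(state, ch):
    """Return the next state, or None when no valid transition exists."""
    cp = ord(ch)
    ranges = TRANSITIONS.get(state, [])
    i = bisect_right(STARTS.get(state, []), cp) - 1  # last range with lo <= cp
    if i >= 0 and cp <= ranges[i][1]:
        return ranges[i][2]
    return None


def longest_match(text, start_state=0):
    """Return the longest accepted prefix of text, or None for no match."""
    state = start_state
    last_accept = None
    for i, ch in enumerate(text):
        nxt = step(state, ch)
        if nxt is None:  # missing transition: stop scanning
            break
        state = nxt
        if state in ACCEPTING:
            last_accept = i + 1
    return None if last_accept is None else text[:last_accept]
```

With these tables, longest_match("9abc") finds no transition out of the
non-accepting start state and reports no match, while a missing transition
out of an accepting state simply ends the token. Each state costs a handful
of ranges rather than one entry per code point.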


Thanks,
-Clint
[I know there are Unicode versions of lex, such as the one from Plan
9. And yes, you only need to store valid transitions. One technique
is to store the highest and lowest valid characters in each state and
a vector of transitions indexed over [lowest,highest]. -John]
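
One possible reading of the moderator's technique, sketched in Python: each
state keeps its lowest and highest valid code point plus a dense vector
covering only that window. The State class, the NO_TRANSITION sentinel, and
the dict-based constructor are hypothetical names chosen for this example,
not taken from any real lex implementation.

```python
NO_TRANSITION = -1  # sentinel for "no valid transition"


class State:
    """One DFA state with a dense table over [lowest, highest] only."""

    def __init__(self, transitions, accepting=False):
        # transitions: dict mapping code point -> next state number
        self.accepting = accepting
        if transitions:
            self.lowest = min(transitions)
            self.highest = max(transitions)
            size = self.highest - self.lowest + 1
            self.table = [NO_TRANSITION] * size
            for cp, target in transitions.items():
                self.table[cp - self.lowest] = target
        else:
            # Empty window: every lookup falls outside it.
            self.lowest, self.highest, self.table = 1, 0, []

    def next(self, ch):
        """Return the next state number, or NO_TRANSITION."""
        cp = ord(ch)
        if cp < self.lowest or cp > self.highest:
            return NO_TRANSITION  # outside the stored window
        return self.table[cp - self.lowest]  # gaps inside hold the sentinel


# A state that accepts only ASCII digits needs 10 entries,
# not one per Unicode code point.
digits = State({ord(c): 1 for c in "0123456789"}, accepting=True)
```

Lookup is a bounds check and one array index, so it stays as fast as a full
table. The cost is that the window grows with the distance between the
lowest and highest valid characters: a state that accepts both an ASCII
letter and a CJK character would need a vector spanning everything in
between, which is where a sorted list of ranges tends to be more compact.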


