Re: Pondering the future of lexical analysis

"Scott Nicol" <snicol@apk.net>
20 Oct 2002 22:50:28 -0400

          From comp.compilers

Related articles
Pondering the future of lexical analysis clint@0lsen.net (Clint Olsen) (2002-10-18)
Re: Pondering the future of lexical analysis jmcenerney@austin.rr.com (John McEnerney) (2002-10-20)
Re: Pondering the future of lexical analysis snicol@apk.net (Scott Nicol) (2002-10-20)
Re: Pondering the future of lexical analysis whopkins@alpha2.csd.uwm.edu (Mark) (2002-10-20)
Re: Pondering the future of lexical analysis arnold@skeeve.com (Aharon Robbins) (2002-10-20)

From: "Scott Nicol" <snicol@apk.net>
Newsgroups: comp.compilers
Date: 20 Oct 2002 22:50:28 -0400
Organization: APK Net
References: 02-10-068
Keywords: lex
Posted-Date: 20 Oct 2002 22:50:28 EDT

> [I know there are Unicode versions of lex, such as the one from plan
> 9.


Not according to its man page: http://www.cs.bell-labs.com/magic/man2html/1/lex
(see the BUGS section).


> And yes, you only need to store valid transitions. One technique
> is to store the highest and lowest valid tokens in each state and a
> vector of transitions [lowest,highest]. -John]
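The quoted range technique might look like this in C. This is a minimal sketch, not code from any lex implementation. The names `state_row`, `row_next_state`, `digit_state` and `NO_TRANSITION` are hypothetical, and input symbols are assumed to be plain unsigned integers. Gaps inside [lowest, highest] still cost one slot each. Symbols outside that range cost nothing.

```c
#include <assert.h>
#include <stddef.h>

#define NO_TRANSITION (-1)   /* hypothetical "no valid move" marker */

/* Hypothetical compressed DFA row: only transitions for input symbols
   in [lowest, highest] are stored. Symbols outside that range have no
   valid transition; gaps inside the range hold NO_TRANSITION. */
struct state_row {
    unsigned lowest;   /* smallest symbol with a valid transition */
    unsigned highest;  /* largest symbol with a valid transition  */
    const int *next;   /* next[c - lowest] for lowest <= c <= highest */
};

/* Return the next state for symbol c, or NO_TRANSITION. */
static int row_next_state(const struct state_row *row, unsigned c)
{
    assert(row->next != NULL && row->lowest <= row->highest);
    if (c < row->lowest || c > row->highest)
        return NO_TRANSITION;        /* outside the stored range */
    return row->next[c - row->lowest];
}

/* Example row: a state that accepts only the digits '0'..'9',
   all of which lead to state 1. Ten slots instead of a full row. */
static const int digit_next[10] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
static const struct state_row digit_state = { '0', '9', digit_next };
```

A lookup is one range check plus one array index. For example, `row_next_state(&digit_state, '5')` yields state 1, and any non-digit yields `NO_TRANSITION`.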


Another technique, similar to the above, would be to use a 2-level table:
index first on the high-order 8 bits of the character, then on the low-order
8 bits. If all the transitions (valid or not) within a given high-order block
are the same, store that transition directly in the first-level entry. If
they differ, have the entry refer to a second-level table indexed by the
low-order 8 bits. This has the advantage of quick lookup (at most 2 array
dereferences), but the disadvantage of not extending beyond 16 bits, which is
where Unicode is heading (its code space already runs to U+10FFFF).
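Here is a rough C sketch of the 2-level table for a single DFA state. It is only an illustration under stated assumptions: symbols are 16-bit values, `NO_TRANSITION` marks an invalid move, and the names `hi_entry`, `two_level_row`, `two_level_next` and `build_ident_row` are hypothetical. The example state is also made up. It sends ASCII letters and the range U+4E00..U+9FA5 to state 1, so that some high-order blocks are uniform and some are mixed.

```c
#include <assert.h>
#include <stddef.h>

#define NO_TRANSITION (-1)   /* hypothetical "no valid move" marker */

/* One first-level slot per high-order byte. If every transition in the
   256-symbol block is the same, it is stored directly in `state` and
   `low` is NULL; otherwise `low` points at a 256-entry table indexed
   by the low-order byte. */
struct hi_entry {
    const int *low;
    int state;
};

/* Transitions for a single DFA state over 16-bit input symbols. */
struct two_level_row {
    struct hi_entry hi[256];
};

/* At most two array dereferences per lookup. */
static int two_level_next(const struct two_level_row *row, unsigned c)
{
    assert(c <= 0xFFFF);              /* the scheme covers 16 bits only */
    const struct hi_entry *e = &row->hi[c >> 8];
    if (e->low == NULL)
        return e->state;              /* uniform block */
    return e->low[c & 0xFF];          /* mixed block */
}

static int ascii_low[256];            /* block 0x00: mixed */
static int cjk_tail_low[256];         /* block 0x9F: mixed */
static struct two_level_row ident_row;

/* Hypothetical identifier state: ASCII letters and U+4E00..U+9FA5 go
   to state 1; every other symbol has no transition. */
static void build_ident_row(void)
{
    for (int i = 0; i < 256; i++) {
        ident_row.hi[i].low = NULL;
        ident_row.hi[i].state = NO_TRANSITION;
        ascii_low[i] = ((i >= 'A' && i <= 'Z') || (i >= 'a' && i <= 'z'))
                           ? 1 : NO_TRANSITION;
        cjk_tail_low[i] = (i <= 0xA5) ? 1 : NO_TRANSITION;
    }
    ident_row.hi[0x00].low = ascii_low;
    for (int h = 0x4E; h <= 0x9E; h++)
        ident_row.hi[h].state = 1;    /* whole block uniform: no low table */
    ident_row.hi[0x9F].low = cjk_tail_low;
}
```

In this example only two second-level tables are needed. Blocks 0x4E through 0x9E are uniform and are coded directly in the first level, so the row takes 256 first-level entries plus 512 second-level entries instead of 65536.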


--
Scott Nicol
snicol@apk.net

