Re: lexing backwards

Marat Boshernitsan <maratb@cs.berkeley.edu>
7 Apr 2003 00:26:02 -0400

          From comp.compilers

Related articles
lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-05)
Re: lexing backwards haberg@math.su.se (2003-04-07)
Re: lexing backwards cfc@world.std.com (Chris F Clark) (2003-04-07)
Re: lexing backwards maratb@cs.berkeley.edu (Marat Boshernitsan) (2003-04-07)
Re: lexing backwards stan@zaborowski.org (Stan Zaborowski) (2003-04-13)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-04-13)
Re: lexing backwards monnier+comp.compilers/news/@rum.cs.yale.edu (Stefan Monnier) (2003-04-15)
Re: lexing backwards cfc@TheWorld.com (Chris F Clark) (2003-04-15)
Re: lexing backwards genew@mail.ocis.net (2003-05-06)
Re: lexing backwards Ron@Profit-Master.com (Ron Pinkas) (2003-05-14)
[4 later articles]
| List of all articles for this month |

From: Marat Boshernitsan <maratb@cs.berkeley.edu>
Newsgroups: comp.compilers
Date: 7 Apr 2003 00:26:02 -0400
Organization: Compilers Central
References: 03-04-015
Keywords: lex
Posted-Date: 07 Apr 2003 00:26:02 EDT

>Could anyone point me to work and experience on lexing text locally
>and backwards ?
>
>Traditionally lexing is done left-to-right with longest-match regexps.
>But such a lexing system obviously means that if you're in the middle
>of a file and want to know what is the previous lexing token, you end
>up (in the general case) having to lex from the beginning of the file.


Tim A. Wagner had done some work on optimal incremental lexing, which
permits you to start lexing from anywhere in the buffer and re-lexes
as few tokens as possible. His work is described in Chapter 5 of his
dissertation "Practical Algorithms for Incremental Software
Development Environments" [1] and also in an unpublished paper
"General Incremental Lexical Analysis" [2].


Wagner's algorithms were designed for use in an interactive program
development environments (Ensemble/Harmonia [3]) and were also used in
the CodeProcessor editor [4] for per-keystroke lexical analysis.
(This should serve as a testament for their applicability for program
editors.)


Space overhead of Wagner's algorithm is 3 integer fields per lexical
token (flex condition state, lookahead, and lookback). This overhead
could be reduced (or even eliminated) in most programming languages by
noticing that for majority of tokens these fields will have default
values. I have some ideas on how this can be implemented; feel free
to write me for more informations.


Marat.


[1]
http://www.cs.berkeley.edu/Research/Projects/harmonia/papers/twagner-thesis.pdf
[2]
http://www.cs.berkeley.edu/Research/Projects/harmonia/papers/twagner-lexing.pdf
[3]
http://www.cs.berkeley.edu/Research/Projects/harmonia/harmonia/index.html
[4] http://www.cs.berkeley.edu/~maratb/pubs/editor.pdf


--
Marat Boshernitsan
Ph.D. Candidate
Computer Science Division, EECS
University of California, Berkeley


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.