|Syntax Highlighting and Lexical Analysis Dominic@tootedom.freeserve.co.uk (Dominic Tootell) (1999-09-11)|
|Re: Syntax Highlighting and Lexical Analysis email@example.com (jacob.navia) (1999-09-16)|
|Re: Syntax Highlighting and Lexical Analysis firstname.lastname@example.org (Armel) (1999-09-16)|
|Re: Syntax Highlighting and Lexical Analysis Marko.Makela@HUT.FI (Marko Mäkelä) (1999-09-20)|
|Re: Syntax Highlighting and Lexical Analysis maratb@CS.Berkeley.EDU (Marat Boshernitsan) (1999-09-20)|
|Re: Syntax Highlighting and Lexical Analysis email@example.com (Quinn Tyler Jackson) (1999-10-04)|
|From:||Marat Boshernitsan <maratb@CS.Berkeley.EDU>|
|Date:||20 Sep 1999 11:58:16 -0400|
|Organization:||University of California at Berkeley|
"Dominic Tootell" <Dominic@tootedom.freeserve.co.uk> writes:
> I'm trying to build my own editor.
> I have never before done any lexical analysis type work, and I was wondering
> if anyone could point me in the correct direction. I know that you read the
> file in and produce a parse tree built on tokens (using a variation of the
> red black tree). The problem is how do I go about parsing the file,
> especially when commands can span lines, e.g. curly brackets and the like.
> If anyone can help me, or provide me with any information I will be most
> grateful. The kind of syntax highlighting I am looking for is the type that
> is done in emacs for C code or Java. I know that emacs uses an internal
> Lisp engine to read the code depending on a .el configuration file, but the
> thought of having to program a Lisp engine is scary, and I have never
> before had any interaction with Lisp.
The "really right" (and really general) way to do this is to use an
incremental lexer and relex at each keystroke. This lets you maintain
precise lexical information at all times and handle all possible cases
without having to craft complicated regexes.
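A rough sketch of the relex-on-each-keystroke idea, under simplifying
assumptions: the buffer is relexed a line at a time, each line remembers
the lexer state at its end, and relexing after an edit stops as soon as
a line's end state comes out the same as it was before. The names
(lex_line, IncrementalLexer, replace_line) and the tiny token set are
made up for illustration, and only in-place edits of one line are
handled (no line insertion or deletion).

```python
import re

# Hypothetical token rules for a tiny C-like language (illustrative only).
TOKEN_RE = re.compile(r"""
    (?P<keyword>\b(?:int|return|if|else|while)\b)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<space>\s+)
  | (?P<punct>.)
""", re.VERBOSE)

NORMAL, IN_COMMENT = "normal", "in_comment"


def lex_line(text, state):
    """Batch-lex one line from a given start state.

    Returns (tokens, end_state); tokens are (kind, lexeme) pairs.
    """
    tokens, pos = [], 0
    while pos < len(text):
        if state == IN_COMMENT or text.startswith("/*", pos):
            # When the comment opens here, skip "/*" before looking for "*/".
            search_from = pos if state == IN_COMMENT else pos + 2
            end = text.find("*/", search_from)
            stop = len(text) if end < 0 else end + 2
            tokens.append(("comment", text[pos:stop]))
            state = IN_COMMENT if end < 0 else NORMAL
            pos = stop
            continue
        m = TOKEN_RE.match(text, pos)
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, state


class IncrementalLexer:
    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [None] * len(self.lines)
        self.end_state = [None] * len(self.lines)
        self.relex_from(0)

    def relex_from(self, row):
        """Relex from `row` until a line's end state is unchanged.

        Returns the number of lines that had to be relexed.
        """
        state = self.end_state[row - 1] if row > 0 else NORMAL
        count = 0
        for i in range(row, len(self.lines)):
            toks, state = lex_line(self.lines[i], state)
            unchanged = (state == self.end_state[i])
            self.tokens[i], self.end_state[i] = toks, state
            count += 1
            if unchanged:
                break  # later lines start in the same state as before
        return count

    def replace_line(self, row, text):
        """Apply an in-place edit of one line and relex what it affects."""
        self.lines[row] = text
        return self.relex_from(row)


demo = IncrementalLexer(["int x;", "/* a funky comment", "int y */", "int z;"])
# Deleting the "/*" changes the state at the end of line 1, so line 2
# must be relexed as well; line 3 is left alone.
print(demo.replace_line(1, "a funky comment"))
```

An edit that does not change the state at the end of its line costs one
line of relexing; an edit that opens or closes a comment propagates only
as far as the state actually differs.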
One way to build an incremental lexer by simply driving a (possibly
flex-generated) batch lexer is described in one of the chapters of Tim
Wagner's thesis:
Tim A. Wagner. Practical Algorithms for Incremental Software Development
Environments. Ph.D. Dissertation, Report No. UCB//CSD-97-946.
(The thesis also talks about how to build incremental LALR(1) and GLR
parsers.)
This will handle any language you can ever describe with something
like flex (and it makes it easy to support many languages in one
editor); however, if your language's lexical structure is simple, then
this is probably overkill and regexes are the way to go.
 You can imagine that correctly highlighting something
like this would be rather difficult with regexes:
/* a funky comment
int y */
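To see where regexes run into trouble, here is a small sketch of a
naive highlighter that applies one regex to each line independently.
The pattern and the function name naive_highlight are hypothetical;
the point is only that a per-line match carries no state across the
line break.

```python
import re

# Hypothetical per-line rules: a comment must open and close on one line.
LINE_RE = re.compile(
    r"(?P<comment>/\*.*?\*/)|(?P<keyword>\b(?:int|return)\b)"
)


def naive_highlight(line):
    """Classify matches in one line with no knowledge of earlier lines."""
    return [(m.lastgroup, m.group()) for m in LINE_RE.finditer(line)]


source = ["/* a funky comment", "int y */"]
for line in source:
    print(naive_highlight(line))
```

The first line yields nothing, because the comment never closes on that
line, and the second line reports "int" as a keyword even though it
sits inside the comment.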