Re: LEX behaviour when given "large" automata.

chris@mimsy.UUCP (Chris Torek)
20 Mar 88 05:30:17 GMT

          From comp.compilers

Related articles
LEX behaviour when given "large" automata. phs@lifia.imag.fr (1988-03-03)
Re: LEX behaviour when given "large" automata. rsalz@BBN.COM (Rich Salz) (1988-03-18)
Re: LEX behaviour when given "large" automata. chris@mimsy.UUCP (1988-03-20)
Re: LEX behaviour when given "large" automata. lbl-helios!vern@lbl-rtsg.arpa (Vern Paxson) (1988-03-18)
Re: LEX behaviour when given "large" automata. sargas.usc.edu!tli@oberon.usc.edu (1988-03-20)
| List of all articles for this month |

From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.compilers,comp.lang.c,comp.unix.questions
Date: 20 Mar 88 05:30:17 GMT
References: <911@ima.ISC.COM> <914@ima.ISC.COM>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742

Several people suggested replacing lex description such as


if return (IF);
then return (THEN);
else return (ELSE);
{id} { symenter(yytext); return (ID); }


with


{id} { /* first try it as a keyword */
k = keyword_index(yytext); if (k >= 0) return (k);
/* not a keyword, must be an identifier */
symenter(yytext); return (ID); }


This may (probably does) help in lex, but it is clearly the wrong way
around in almost all cases. The lexer has to traverse a DFA to decide
that something is an ID. While it is at it it could easily tell
whether it is a keyword instead. Obviously this makes the DFA table
larger, but the result should run faster.


It turns out (as Van Jacobson discovered) that lex uses a long
subroutine---on the order of 200 lines---to do what is essentially
described by the loop


/* follow the DFA */
for (state = begin; state[c].v == c; c = *in++)
state += state[c].n;


(best case) or


/*
* Pick the right start state depending on whether we are at
* the beginning of a line.
*/
state = newline ? hat_begin : begin;
while (state[c].v == c) {
/* follow the DFA */
state += state[c].n;
c = *in++;
/* remember action state (for right context) */
if (state->v) {
a_s = state;
a_c = in;
}
}
in = a_c;
if (a_s->n) {
in -= a_s->n; /* back up over right context */
c = in[-1];
}
newline = in[-2]=='\n'; /* remember whether we are now at ^ */
in[-1] = '\0'; /* kill residual */


(worst case). This pretty much handles everything except YYREJECT.
I do not know whether the LBL `flex' (Fast LEX) handles YYREJECT now.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.