Re: Speeding up LEX scanning times


Related articles
Speeding up LEX scanning times pahint@eunet.be (1995-02-02)
Re: Speeding up LEX scanning times dimock@das.harvard.edu (1995-02-02)
Re: Speeding up LEX scanning times c1veeru@WATSON.IBM.COM (Virendra K. Mehta) (1995-02-02)
Re: Speeding up LEX scanning times monnier@di.epfl.ch (Stefan Monnier) (1995-02-03)
Re: Speeding up LEX scanning times mercier@hollywood.cinenet.net (1995-02-03)
Re: Speeding up LEX scanning times vern@daffy.ee.lbl.gov (1995-02-04)
Re: Speeding up LEX scanning times eifrig@beanworld.cs.jhu.edu (1995-02-07)

Newsgroups: comp.compilers
From: eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig)
Keywords: parse, lex
Organization: The Johns Hopkins University CS Department
References: 95-02-041 95-02-057
Date: Tue, 7 Feb 1995 16:49:19 GMT

Virendra K. Mehta <c1veeru@WATSON.IBM.COM> wrote:
> COBOL lexical analyzers can be pretty slow because of the very complicated
> regular expressions involved.... These generate a sizable dfa and can slow
> the scanner down a lot.


Vern Paxson <vern@daffy.ee.lbl.gov> wrote:
>This is a common misconception (because it's so intuitive) regarding
>regular expression matching when using DFA's. A key point when using DFA's
>is that the run time is *independent* of the complexity of the RE. It's
>instead just linear in the number of characters scanned, since the DFA
>makes exactly one state transition per character, and figuring out which
>transition to make takes constant time.


Well, this is a good example of Moving Theory Into Practice
Axiom Number 1: "Not all constants are created equal."


The problem is that traversing a given arc in a DFA a *second* time
can be significantly faster than the first time, due to caching effects:
reads that hit the cache are typically 2-3 times faster than reads from
main memory. DFA acceptors are usually coded up something like this:


state = START_STATE;
while (!final_states[state]) {          /* run until an accepting state   */
    token = input();                    /* grab the next input character  */
    state = next_state[state][token];   /* one table lookup per character */
}


(In practice, the "next_state" array is usually distributed as the jump
tables resulting from a bunch of "switch" statements, one for each state,
but the effect is the same.)
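

For concreteness, here is a minimal sketch of that "one switch per
state" style; the two-state word-counting automaton, the state labels,
and the use of getc() are made up for illustration and are not the
output of any particular scanner generator:

#include <stdio.h>

/* Each DFA state is a label with its own switch on the next input
 * character; the compiler turns each switch into a jump table. */
static long count_words(FILE *fp)
{
    long words = 0;
    int c;

s_skip_blanks:                              /* state 0: between tokens */
    c = getc(fp);
    switch (c) {
    case EOF:                        return words;
    case ' ': case '\t': case '\n':  goto s_skip_blanks;
    default:              words++;   goto s_in_word;
    }

s_in_word:                                  /* state 1: inside a token */
    c = getc(fp);
    switch (c) {
    case EOF:                        return words;
    case ' ': case '\t': case '\n':  goto s_skip_blanks;
    default:                         goto s_in_word;
    }
}

int main(void)
{
    printf("%ld words\n", count_words(stdin));
    return 0;
}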


Reading from the "next_state" array accounts for a significant share
of the time spent on each pass through the loop, so when the rest of the
loop is fast, cache misses can have a dramatic effect.
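

To put rough numbers on that (the state count, entry size, and cache
size below are assumed figures for illustration, not measurements of
any real scanner):

#include <stdio.h>

/* Back-of-the-envelope only: every constant here is an assumption. */
int main(void)
{
    long states     = 400;          /* assumed number of DFA states     */
    long alphabet   = 256;          /* 8-bit input characters           */
    long entry_size = 2;            /* bytes per next-state entry       */
    long cache_size = 16 * 1024;    /* a plausible on-chip cache, bytes */

    long table = states * alphabet * entry_size;    /* 200 KB */
    printf("transition table: %ld KB, cache: %ld KB\n",
           table / 1024, cache_size / 1024);
    /* The table is far bigger than the cache, so the first pass over a
     * new mix of states and characters is dominated by cache misses.  */
    return 0;
}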


>So basically do as much scanning as possible using RE's: with a proper
>DFA implementation they're very fast.


But sometimes other techniques are faster. :-)


Jack Eifrig (eifrig@cs.jhu.edu) The Johns Hopkins University, C.S. Dept.
--

