|Speeding up LEX scanning times email@example.com (1995-02-02)|
|Re: Speeding up LEX scanning times firstname.lastname@example.org (1995-02-02)|
|Re: Speeding up LEX scanning times c1veeru@WATSON.IBM.COM (Virendra K. Mehta) (1995-02-02)|
|Re: Speeding up LEX scanning times email@example.com (Stefan Monnier) (1995-02-03)|
|Re: Speeding up LEX scanning times firstname.lastname@example.org (1995-02-03)|
|Re: Speeding up LEX scanning times email@example.com (1995-02-04)|
|Re: Speeding up LEX scanning times firstname.lastname@example.org (1995-02-07)|
|From:||email@example.com (Vern Paxson)|
|Keywords:||Cobol, parse, lex|
|Organization:||Lawrence Berkeley Laboratory, Berkeley CA|
|Date:||Sat, 4 Feb 1995 05:25:25 GMT|
Virendra K. Mehta <c1veeru@WATSON.IBM.COM> wrote:
> COBOL lexical analyzers can be pretty slow because of the very complicated
> regular expressions involved.... These generate a sizable dfa and can slow
> the scanner down a lot.
This is a common misconception (because it's so intuitive) regarding
regular expression matching when using DFA's. A key point when using DFA's
is that the run time is *independent* of the complexity of the RE. It's
instead just linear in the number of characters scanned, since the DFA
makes exactly one state transition per character, and figuring out which
transition to make takes constant time.
A couple of fine points: the more complex the RE, the more complex the
DFA (generally), so for really hairy RE's scanning will be a bit slower
due to paging in the DFA tables, cache misses, and the like. Also, it's
possible to have lex rules that require the scanner to scan further than
the end of the token, in order to ensure that the input doesn't match a
longer token. These sorts of rules result in scanning portions of the
input more than once, which slows down the scanner accordingly. They can
occur with both simple and complicated RE's. Flex has an option (-b) to
report when your rules have this property (they usually do) and provide
some guidance in modifying your rules to eliminate the need to back up.
Ironically, the way you eliminate the back up, and thus speed up your
scanner, is by adding more rules!
So basically do as much scanning as possible using RE's: with a proper
DFA implementation they're very fast.
Return to the
Search the comp.compilers archives again.