|regular expressions (bug-report) email@example.com (1993-03-14)|
|Re: regular expressions (bug-report) Paul.Klint@cwi.nl (1993-03-15)|
|Re: regular expressions (bug-report) firstname.lastname@example.org (1993-03-15)|
|regular expressions (bug-report) email@example.com (Jos Horsmeier) (1993-03-16)|
|Re: regular expressions (bug-report) firstname.lastname@example.org (1993-03-19)|
|Re: regular expressions wendt@CS.ColoState.EDU (1993-03-22)|
|Re: regular expressions (bug-report) email@example.com (1993-03-25)|
|Re: regular expressions (bug-report) firstname.lastname@example.org (1993-03-26)|
|From:||email@example.com (David Keppel)|
|Organization:||Computer Science & Engineering, U. of Washington, Seattle|
|Date:||Fri, 19 Mar 1993 19:09:38 GMT|
>firstname.lastname@example.org (Dave Jones) writes:
>[This textbook regexp matcher has a bug...]
email@example.com (Mark A Biggar) writes:
>[Most Unix-based RE pkgs interpretive backtracking.]
An NFA package can, in principal, have at most as many states in the queue
as there are total states in the automaton. For an NFA, this number is
"reasonable" (e.g., linear in the size of the regular expression).
A DFA RE algorithm can represent one DFA state as a list of the
correspnding NFA states. A straightforward construction of the DFA from
the NFA, however, yields a huge DFA (the number of DFA states is, I think,
exponential in the number of NFA states).
`egrep', (and, in principle, `gnu?egrep') use a DFA-based mechanism. Al
Aho made several important observations. One is that you only *need* to
keep one DFA state at a time. On a transition, the next state can be
computed dynamically from the current state and a simulation of the
correspnding NFA states running on the NFA automaton. A second
observation is that the DFA can keep a cache of recently used DFA states
so that new states only need to be generated "rarely". A third
observation is that checking for a cache hit can be made cheap.
In principal, the running time of Aho's algorithm can be bad, since every
transition can require construction of a new DFA state. In practice, the
DFA traverses a limited number of states, and a small cache does a good
job. Combined with a Boyer-Moore prematcher, the cache mostly suffers
capacity misses (for common search patterns).
A related tool is `agrep', Udi Mamber's approximate string matching
algorithm. I don't know the details of the algorithm, but it does several
clever things very well. Source via anonymous ftp from sites that carry
linux, also (according to `archie') in
and, probably, lots of other places.
;-D on ( Unleaded expressions ) Pardo
Return to the
Search the comp.compilers archives again.