|^/$ in lexical analyzers firstname.lastname@example.org (1993-06-27)|
|Re: ^/$ in lexical analyzers email@example.com (Tom Lord) (1993-06-28)|
|Magic (was Re: ^/$ in lexical analyzers) firstname.lastname@example.org (1993-06-30)|
|From:||email@example.com (Richard L. Goerwitz)|
|Organization:||University of Chicago|
|Date:||Wed, 30 Jun 1993 13:45:13 GMT|
> [How are ^ and $ handled in regexp matchers?]
>There is a subtlety to this. In the dfa conversion, all the possible
>paths through the regexp nfa are explored in parallel -- breadth first
>fasion. But when characters like '^' and '$' are added it becomes
>possible that some paths will require checks for `^' while others don't.
The problems become particularly severe when you get a pattern like
^ab|ac. If you treat ^ as a special character of some kind (but a char-
acter nonetheless), then the DFA will have transitions out of the start
state on ^ and a. If we follow ^, though, on a pattern like "ac," we
will have to backtrack. Poof. There goes determinism.
There is a way to maintain the determinism. Just check, when creating
the DFA's start state (e-closure(NFA.start)), whether there is a tran-
sition on ^. If there is, then save the just-generated start state, and
then create another start state where ^ is equivalent to an epsilon
transition. I.e. take the e/^-closure of the NFA's start state. Use
this automaton when at the beginning of a line, and use the old "canoni-
cal" DFA start state when the start position is past the line's begin
point. This is very easy to implement, and doesn't require prepending
".*" to any portion of the regexp. It also doesn't require insertion of
any dummy newlines. The extra cost is that the canonical DFA start
state has to be constructed, even if it is not used. A single extra
state isn't too bad, especially since one could generate the new start
state for the beginning of the line by simply doing a union of ^-closure(
NFA.start) with the canonical start state.
>Watch for GNU rx, a regexp library based on this approach, to be released
This will be delightful. I'm a philologist by trade, and I find the stuff
you computer people do to be just short of magic :-).
-Richard L. Goerwitz firstname.lastname@example.org
Return to the
Search the comp.compilers archives again.