NFA -> DFA on-the-fly determinization

"Russ Cox" <rsc@swtch.com>
31 Jan 2006 21:19:27 -0500

          From comp.compilers

Related articles
NFA -> DFA on-the-fly determinization rsc@swtch.com (Russ Cox) (2006-01-31)
Re: NFA -> DFA on-the-fly determinization d148f3wg02@sneakemail.com (Karsten Nyblad) (2006-02-02)
Re: NFA -> DFA on-the-fly determinization rsc@swtch.com (Russ Cox) (2006-02-03)
| List of all articles for this month |

From: "Russ Cox" <rsc@swtch.com>
Newsgroups: comp.compilers
Date: 31 Jan 2006 21:19:27 -0500
Organization: Compilers Central
Keywords: lex, question
Posted-Date: 31 Jan 2006 21:19:27 EST

I am trying to find a reference for the technique of converting an NFA
to a DFA as needed during NFA execution and caching the result to
avoid repeating the conversion at each step.


Thompson's 7094 implementation represents each DFA state as a list of
NFA states (themselves function pointers) and computes the next state
afresh on each transition.


As a simple example, suppose you have an NFA for matching "a..". The
NFA has four states [] [a] [a.] and [a..] (I am using [] to mean the
state corresponding to reading the given string). The DFA has 15 or
so states: [] [a] [!a] [aa] [a!a] [!aa] [!a!a] [aaa] [aa!a] ...
(here, !a means a character that is not a). Each DFA state is a list
of NFA states: [aa!a] = ([], [a.], [a..]), [a!aa] = ([], [a], [a..]),
and so on.


If Thompson's algorithm is in the [aa!a] state and reads an a, it
executes the first list to produce the second list. The next time it
is in the [aa!a] state and reads an a, it will do this again. It does
not learn from past work. This is entirely reasonable given the
memory constraints at the time and the potentially large size.


It is a simple optimization to remember each state in some kind of
hash table and re-use previously-computed transitions. Assuming the
input only exercises a small part of the DFA, after you've primed the
DFA, each input character's processing is a single memory access.


In fact, this is the approach taken by Thompson's current grep, which
I posted a link to last month, but I expect that the fact that
NFA->DFA can be done incrementally has been well known for quite some
time.


With apologies for the long-winded explanation (I wanted it to be
clear why I wasn't asking for Thompson's 1968 paper), does anyone know
of any references to the original papers that pointed this out?


Thanks.
Russ


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.