Re: Regular express to NFA using Thompson construction, questions

Eric Jacobs <>
24 Sep 1998 00:18:50 -0400

          From comp.compilers

Related articles
Regular express to NFA using Thompson construction, questions, HELP ME heng@Ag.Arizona.Edu (1998-09-22)
Re: Regular express to NFA using Thompson construction, questions (Eric Jacobs) (1998-09-24)
| List of all articles for this month |

From: Eric Jacobs <>
Newsgroups: comp.compilers
Date: 24 Sep 1998 00:18:50 -0400
Organization: Compilers Central
References: 98-09-102
Keywords: lex

Heng Yuan wrote:
> I have some questions regarding regular express to NFA using
> Thompson construction.
> The Normal Thompson Construction for a((ab*)*)* :
> (a) node with an out edge labeled with a.
> (e) epsilon node
> Here is what I thought how the Thompson construction is made
> Is this how Thompson construction made? I found it quite difficult to
> code in C :(
> Here comes the second question:
> Can I do the following instead? with pseudo codes following charts.
> The method only lists once for the same situation, only results list in
> the repeated ones.

You can do whatever works for you and your application, but I think
you may be a bit confused about how to represent an automaton with
a graph.

The NODES in the graph represents states of the automaton; they are
drawn using a circle. An EDGE represents a one-way transition from
one state to another; edges are repsented by arrows. Each edge may
be associated with either an input symbol or the epsilon (representing
no input).

One state is designated the start state; typically it is labeled "0".
One or more states may be designated final states (represented by
a double circle), meaning that the automaton simulation may (but
does not have to) stop there.

An automaton is called deterministic if there are no epsilon-
transitions, and also there is only one edge per input symbol on
each state. In other words, given a state and input symbol there is
only one possible move the automaton could make.

Thompson's construction builds a non-deterministic finite automaton
from a regular expression by combining basic forms into more complex
ones using simple transformations. It is not very efficient; it
oftens leaves a lot of epsilon transitions that can be eliminated
using an NFA-optimizing construction or subset construction.


                /---\ a /---\ b /===\
        --> | | -----> | | -----> || ||
                \---/ \---/ \===/

                                  /---\ a /---\
                    +----> | | -----> | | -----+
                    | \---/ \---/ v
                /---\ /---\ /===\
        --> | | | | -----> || ||
                \---/ \---/ \===/
                    | /---\ b /---\ ^
                    +----> | | -----> | | -----+
                                  \---/ \---/

In the diagram, all unlabeled edges are epsilon transitions.
Thompson's construction essentially provides a systematic way
to build complex expressions from simpler ones. For example,
in the last example, "a" or "b" could be regular expressions
themselves. In that case, Thompson's would be substitute the
whole sub-expression into that branch of the |-operator.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.