24 Sep 1998 00:18:50 -0400

Related articles |
---|

Regular express to NFA using Thompson construction, questions, HELP ME heng@Ag.Arizona.Edu (1998-09-22) |

Re: Regular express to NFA using Thompson construction, questions mkjacobs@erols.com (Eric Jacobs) (1998-09-24) |

From: | Eric Jacobs <mkjacobs@erols.com> |

Newsgroups: | comp.compilers |

Date: | 24 Sep 1998 00:18:50 -0400 |

Organization: | Compilers Central |

References: | 98-09-102 |

Keywords: | lex |

Heng Yuan wrote:

*> I have some questions regarding regular express to NFA using*

*> Thompson construction.*

*>*

*> The Normal Thompson Construction for a((ab*)*)* :*

*> (a) node with an out edge labeled with a.*

*> (e) epsilon node*

*>*

*> Here is what I thought how the Thompson construction is made*

.

.

*> Is this how Thompson construction made? I found it quite difficult to*

*> code in C :(*

*>*

*> Here comes the second question:*

*> Can I do the following instead? with pseudo codes following charts.*

*> The method only lists once for the same situation, only results list in*

*> the repeated ones.*

.

.

You can do whatever works for you and your application, but I think

you may be a bit confused about how to represent an automaton with

a graph.

The NODES in the graph represents states of the automaton; they are

drawn using a circle. An EDGE represents a one-way transition from

one state to another; edges are repsented by arrows. Each edge may

be associated with either an input symbol or the epsilon (representing

no input).

One state is designated the start state; typically it is labeled "0".

One or more states may be designated final states (represented by

a double circle), meaning that the automaton simulation may (but

does not have to) stop there.

An automaton is called deterministic if there are no epsilon-

transitions, and also there is only one edge per input symbol on

each state. In other words, given a state and input symbol there is

only one possible move the automaton could make.

Thompson's construction builds a non-deterministic finite automaton

from a regular expression by combining basic forms into more complex

ones using simple transformations. It is not very efficient; it

oftens leaves a lot of epsilon transitions that can be eliminated

using an NFA-optimizing construction or subset construction.

ab

/---\ a /---\ b /===\

--> | | -----> | | -----> || ||

\---/ \---/ \===/

a|b

/---\ a /---\

+----> | | -----> | | -----+

| \---/ \---/ v

/---\ /---\ /===\

--> | | | | -----> || ||

\---/ \---/ \===/

| /---\ b /---\ ^

+----> | | -----> | | -----+

\---/ \---/

In the diagram, all unlabeled edges are epsilon transitions.

Thompson's construction essentially provides a systematic way

to build complex expressions from simpler ones. For example,

in the last example, "a" or "b" could be regular expressions

themselves. In that case, Thompson's would be substitute the

whole sub-expression into that branch of the |-operator.

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.