26 Apr 2001 21:15:17 -0400

Related articles |
---|

working with very large Finite State Machines sjbradtke@home.com (Steve Bradtke) (2001-04-22) |

Re: working with very large Finite State Machines vannoord@let.rug.nl (2001-04-26) |

Re: working with very large Finite State Machines chase@world.std.com (David Chase) (2001-04-26) |

Re: working with very large Finite State Machines rboland@unb.ca (Ralph Boland) (2001-04-26) |

Re: working with very large Finite State Machines rbates@southwind.net (Rodney M. Bates) (2001-04-29) |

From: | Ralph Boland <rboland@unb.ca> |

Newsgroups: | comp.ai.nat-lang,comp.compilers |

Date: | 26 Apr 2001 21:15:17 -0400 |

Organization: | University of New Brunswick |

References: | 01-04-117 |

Keywords: | lex |

Posted-Date: | 26 Apr 2001 21:15:17 EDT |

Steve Bradtke wrote:

*> I am working on a project that involves the construction of*

*> very large Finite State Machines (up to approximately 10^7 states).*

Do you have a corresponding regular expression (RE) for these

machines? If so you could then convert your regular expression into a

grammar and run a parser on it. (I strongly recommend a parser that

properly supports regular right part grammars since this can reduce

many problems with ambiguities). Now you are using a stack to

recognize your FSM/RE/Grammar and as a result the parse table can be

much smaller than the FSM.

This idea may get you to a much smaller machine but it has problems.

1) parsing is slower.

2) Converting your RE into a grammar introduces problems with among

other things ambiguities. Note that if the grammar is based upon a RE

then whether or not the grammar is ambiguous can be determined in

polynomial time. However even if the grammar is not ambigouus the

parser tool may not be able to disambiguate (?) the grammar because

of inadequate lookahead. (for some unambigouos REs arbitrarily large

lookahead is required)

In part because your FSM is so large the possibility of failure here

is large. But so are the potential savings in table size.

It would seem like an interesting project to build a tool that does

all of this automatically. Could be Masters or Ph.D. scale. Note

that the algorithm for testing if a RE is ambiguous (some accepted

strings can be accepted through more than one path through the RE) is

n*n*n*n i.e. n^4 where n is the size of the RE.

Hope this helps. Hope everything I said is correct.

One last note. I would think the best way to do this would be to

construct a minimized version of your finite state machine first as

this should help reduce problems with ambiguities (I think). Of

course this is exactly what you were trying to do in the first place.

Ralph Boland

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.