Sat, 15 May 2010 16:25:49 +0100

Related articles |
---|

NFA and negation/conjunction monnier@iro.umontreal.ca (Stefan Monnier) (2010-05-13) |

Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-14) |

Re: NFA and negation/conjunction sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-05-15) |

Re: NFA and negation/conjunction monnier@iro.umontreal.ca (Stefan Monnier) (2010-05-15) |

Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-16) |

Re: NFA and negation/conjunction cfc@shell01.TheWorld.com (Chris F Clark) (2010-05-16) |

Re: NFA and negation/conjunction torbenm@diku.dk (2010-05-17) |

Re: NFA and negation/conjunction klyjikoo@gmail.com (klyjikoo) (2010-05-19) |

Re: NFA and negation/conjunction rsc@swtch.com (Russ Cox) (2010-05-19) |

[1 later articles] |

From: | Stephen Horne <sh006d3592@blueyonder.co.uk> |

Newsgroups: | comp.compilers |

Date: | Sat, 15 May 2010 16:25:49 +0100 |

Organization: | virginmedia.com |

References: | 10-05-078 |

Keywords: | lex, DFA |

Posted-Date: | 16 May 2010 01:49:43 EDT |

On Thu, 13 May 2010 23:42:22 -0400, Stefan Monnier

<monnier@iro.umontreal.ca> wrote:

*>I'm looking for any work that tries to provide negation and/or*

*>conjunction operations in regexps without using a DFA (or at least*

*>while avoiding the DFA state explosion problem).*

I've done this, but my method *does* cause a combinatorial explosion

because, basically, one of the steps is to eliminate nondeterminism.

The next step is to minimise, though - but of course DFA minimisation

only achieves so much.

I've never checked how big the intermediate results get because, in

practice, my inputs just aren't that complex anyway.

I based what I do on what Adrian Thurstons Ragel does, at least as

described in the manual.

I'm assuming that conjunction means set intersection, and negation is

set negation. Rather the implement negation, I think the set

difference is more practical - you can always do "any* - input" for a

negation. A true negation is a slippery thing anyway - it all seems

sensible while you have one fixed set of input tokens, then someone

says "now lets update that regex to support unicode". A "true"

negation would need to support an infinite set of input tokens.

Anyway, the basic principle is simple enough. Merge the two input

NFAs. No states or transitions are added - you just renumber the

states in one then combine the state and transition tables.

Then, do a closure that removes all the nondeterminism - each state in

the result mapping to a set of states in the merged input.

Then, check each state. If a state has end states from both the first

input and the second input, for a set intersection, it is itself an

end state - otherwise not. For set difference, you look for states

that have input 1 end states but no input 2 end states.

Then you do dead state removal and minimisation.

Thats the simplified explanation of course - in practice, a lot of the

work can be built into the one modified convert-to-DFA algorithm, so

that e.g. a lot of dead states are never even created.

I'm sure this isn't new to anyone here.

What I'd like to know is - is there any such thing as an efficient

algorithm to minimise an NFA?

Obviously DFA minimisation algorithms can be applied to NFAs, but in

general they don't give a minimal result - only as minimal as can be

achieved by merging equivalent states.

Consider, for example, a six state DFA loop where states are end

states if their distance from the start is divisible by two or three.

A minimal NFA would have two separate loops - a two-state loop for the

divisible-by-two case and a three-state loop for the

divisible-by-three case. Automatically spotting things like this is

interesting because it involves splitting states apart (e.g. to make a

separate six-state loop for the divisible-by-two case) as well as

merging states. And the trouble with splitting states is that there's

not necessarily any clear indicator of when to stop.

I'm guessing that this is an NP-hard or NP-complete problem - am I

right?

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.