9 Jan 2000 22:51:33 -0500

Related articles |
---|

Need regexp source balachandr@yahoo.com (2000-01-02) |

Re: Need regexp source arnold@mathcs.emory.edu (2000-01-06) |

Re: Need regexp source jos@and.nl (Jos A. Horsmeier) (2000-01-06) |

Re: Need regexp source george@castro.dbnet.ece.ntua.gr (2000-01-09) |

Re: Need regexp source roumazeilles.NO.SPAM@NO.SPAM.magic.fr (Yves Roumazeilles) (2000-01-09) |

Re: Need regexp source mottl@miss.wu-wien.ac.at (Markus Mottl) (2000-01-09) |

Re: Need regexp source jenglish@flightlab.com (Joe English) (2000-01-09) |

Re: Need regexp source ralph@inputplus.demon.co.uk (2000-01-15) |

Re: Need regexp source thp@roam-thp2.cs.ucr.edu (Tom Payne) (2000-01-15) |

Re: Need regexp source world!cfc@uunet.uu.net (Chris F Clark) (2000-01-19) |

From: | Joe English <jenglish@flightlab.com> |

Newsgroups: | comp.compilers |

Date: | 9 Jan 2000 22:51:33 -0500 |

Organization: | Society for the Abolition of Assignment Statements |

References: | 00-01-006 |

Keywords: | DFA, lex |

<balachandr@yahoo.com> wrote:

*>I am looking for a C source code for regexp which uses a iterative*

*>routine for pattern matching instead of a recursive routine. [...]*

and the moderator added:

*>[There's always lex, I suppose. Is there a reasonable way to do interative*

*>regex matching without all of the work of building a DFA first? -John]*

Why yes! As a matter of fact, a truly nifty regular expression

matching algorithm can be found on the comp.compilers file archive:

<URL: ftp://iecc.com/pub/file/regex.tar.gz >

See in particular Section 3 ("The Algorithm") of the file dfa.doc

("Converting Regular Expressions to Finite Automata"). Although the

article is -- as the title implies -- principally concerned with

constructing finite automata, the algorithm can easily be adapted --

simplified in fact -- to work as a regex matcher.

The basic idea is to think of a regular expression as the label of a

state in an (infinite) DFA. It's easy to determine if the RE

represents an accepting state based on its syntactic structure, and

the transition function 'delta :: (RE,Symbol) -> RE' is likewise easy

to compute. Clever use of memoization leads to a very efficient

matching algorithm (O(N) amortized time, where 'N' is the length of

the input). No backtracking or preprocessing required, other than the

time it takes to parse the RE.

--Joe English

jenglish@flightlab.com

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.