Henry Spencer's "Tcl" Regex Library

Chris F Clark <cfc@shell01.TheWorld.com>
Mon, 01 Oct 2007 15:15:42 -0400

          From comp.compilers

Related articles
Henry Spencer's "Tcl" Regex Library cfc@shell01.TheWorld.com (Chris F Clark) (2007-10-01)
Re: Henry Spencer's "Tcl" Regex Library rsc@swtch.com (Russ Cox) (2007-10-02)
Re: Henry Spencer's "Tcl" Regex Library cfc@TheWorld.com (Chris F Clark) (2007-10-02)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Mon, 01 Oct 2007 15:15:42 -0400
Organization: The World Public Access UNIX, Brookline, MA
Keywords: lex, question
Posted-Date: 02 Oct 2007 00:43:27 EDT

Michela Becchi, a friend and PhD student I am working with on regex
related matters and who is publishing a paper on Hybrid NFA-DFAs, got
an interesting review comment about the above mentioned library--that
it in fact implements hybrid NFA-DFAs. (I have copied Henry Spencer
via email, so that he can respond if not too busy. However, I'd also
like other insights if they are available.)


In any case, Michela's work has been in devising ways of splitting a
regex problem into sub-problems where some of the problem is solved by
a DFA and some of the problem is solved by an NFA. Her work results
in a very clever partitioning of the problem such that the resulting
machine runs at roughly DFA speeds while taking roughly NFA space. (I
believe the paper on that aspect has already been published, this is
follow-on work solving an interesting related problem.)


The question I'm curious to is whether the above library does the same
thing, i.e. partition a regex problem into a DFA portion and an NFA
portion and whether it does this statically or dynamically. Her work
does it statically, based upon the characteristics of the problem.


However, I believe there are regex libraries that incrementally
generate a DFA from an NFA dynamically as the program is running. (I
believe this is the essence of Thompson's NFA algorithm, which is
distinct from his subset construction algorithm.) That method would
have similar effects in general, but the approach and rationale is
different. For example, the dynamic version only generated the
portion of the DFA it uses. However, if it doesn't take advantage of
the static insights that Michela's approach does, it could potentially
generate an exponentially large machine. Moreover, the two approaches
are complimentary, one could generate a Hybrid machine with the
interesting characteristics that she outlined on-the-fly.


In any case, if there are hybrid NFA-DFA solutions out there (or in
progress), we would like to know what they are and how they work (and
what problems they solve).


Thanks for any input,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.