Re: Fast NFA engine anyone?

Chris F Clark <cfc@shell01.TheWorld.com>
23 Apr 2006 09:59:44 -0400

          From comp.compilers

Related articles
Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-04-21)
Re: Fast NFA engine anyone? rsc@swtch.com (Russ Cox) (2006-04-22)
Re: Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-04-22)
Re: Fast NFA engine anyone? cfc@shell01.TheWorld.com (Chris F Clark) (2006-04-23)
Re: Fast NFA engine anyone? reeuwijk@few.vu.nl (2006-04-23)
Re: Fast NFA engine anyone? Danny.Dube@ift.ulaval.ca (2006-05-11)
Re: Fast NFA engine anyone? rd3ntat0@hotmail.com (Remo D.) (2006-05-12)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 23 Apr 2006 09:59:44 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 06-04-121 06-04-129
Keywords: lex
Posted-Date: 23 Apr 2006 09:59:44 EDT

Remo D. wrote:
> I have the impression that going with a DFA representation is not a
> good idea. I feel that the time I will have to spend extracting the
> substring from the matching text will wipe out any benefit I will get (I
> might be wrong, of course).
...
> Well, consider that I'm interested in subpatterns and back references as
> well (sorry, I didn't mentioned that in my post).


Part of your problem, is that subpatterns and back references are
technically (i.e. at the theoretical level) outside of regular
expressions. Regular expressions are exactly what one can match with a
finite state machine and with no additional processing or state. You
need additional processing (or state) to track the starts and ends of
sub-expressions and back-references. There is no well-established
theory for handling "regexps" (aka PCREs--Perl Compatible Regular
Expressions), that is regular expressions extended with
sub-expressions, back-refs, (and non-greedy operators).


However, that being said, there is a start of such a theory and Lauri
Kari has at least one paper on it. Moreover, it explains the basic
ideas of how to leave markers in a DFA implementation of regexps.
It's roughly the same places you add markers to mix regular
expressions and LR parsing. Thus, you can do what you want, you just
won't find anyone who has done it before you. I'm in the process of
working out the rest of the details myself, so that I can have PCREs
in Yacc++'s lexer, but I'm doing that only when I have spare time,
which isn't as often as I like.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.