Re: How to rewrite a regexp without word boundaries?

hu47121@usenet.kitty.sub.org (Hannah Schroeter)
Sun, 16 Aug 2009 00:37:24 +0000 (UTC)

          From comp.compilers

Related articles
How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-05)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-05)
Re: How to rewrite a regexp without word boundaries? dave_140390@hotmail.com (2009-07-06)
Re: How to rewrite a regexp without word boundaries? haberg_20080406@math.su.se (Hans Aberg) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? h.b.furuseth@usit.uio.no (Hallvard B Furuseth) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? andrew@tomazos.com (Andrew Tomazos) (2009-07-07)
Re: How to rewrite a regexp without word boundaries? cfc@shell01.TheWorld.com (Chris F Clark) (2009-07-13)
Re: How to rewrite a regexp without word boundaries? hu47121@usenet.kitty.sub.org (2009-08-16)
Re: How to rewrite a regexp without word boundaries? dot@dotat.at (Tony Finch) (2009-08-16)
| List of all articles for this month |

From: hu47121@usenet.kitty.sub.org (Hannah Schroeter)
Newsgroups: comp.compilers,comp.theory
Date: Sun, 16 Aug 2009 00:37:24 +0000 (UTC)
Organization: The Pond
References: 09-07-003 09-07-004 09-07-008 09-07-011
Keywords: lex
Posted-Date: 15 Aug 2009 22:48:47 EDT

Hello!


Andrew Tomazos <andrew@tomazos.com> wrote:
>[...]


>Also one thing to note is that in formal language theory "regular
>expression" has a well-defined meaning. See Chomsky. Perl's regular
>expressions do *not* classify as regular expressions under the formal
>definition.
> -Andrew.
>[Perl's regex engine is swell, but if performance is an issue it's nowhere
>near as fast as a DFA generated by flex or re2c. -John]


IIRC perl's regex implementation (as well as glibc, btw) even lose
compared to *NFA emulation*, for example with a?^na^n, matched against
a^n (where "^n" is manually expanded to mean n times, i.e. with n=3
it'd be the regular expression a?a?a?aaa against string aaa). Java
loses, too (been there, tested it). OpenBSD's libc doesn't (seems to
be plain vanilla NFA emulation when it sees the expression is in fact
regular; yields the expected, roughly quadratic runtime behaviour).


See http://swtch.com/~rsc/regexp/regexp1.html for information about that
problem.


I also tested the same thing with flex, using a small script that
generates flex input and a test program for a given n. Horrendous
turnaround times (flex generation mostly) for increasing n, blazingly
fast match times, both as expected.


Kind regards,


Hannah.
[Spamassassin has an option to compile its patterns with re2c and link that into
perl. It takes the better part of an hour to compile, but it runs really fast. -John]



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.