Regular Expression "Terms"

Chris F Clark <cfc@shell01.TheWorld.com>
Thu, 06 May 2010 17:14:10 -0400

          From comp.compilers

Related articles
Regular Expression "Terms" cfc@shell01.TheWorld.com (Chris F Clark) (2010-05-06)
Re: Regular Expression "Terms" lee.benfield@gmail.com (lab27) (2010-05-09)
Re: Regular Expression "Terms" rpboland@gmail.com (Ralph Boland) (2010-05-10)
Re: Regular Expression "Terms" gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-05-11)
Re: Regular Expression "Terms" cfc@shell01.TheWorld.com (Chris F Clark) (2010-05-11)
| List of all articles for this month |

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Thu, 06 May 2010 17:14:10 -0400
Organization: The World Public Access UNIX, Brookline, MA
Keywords: lex, question
Posted-Date: 09 May 2010 12:18:26 EDT

I've been investigating high-performance regular expression matching
and have come to need some words (terms) to describe particular types
of simple regular expressions. Obviously, if there are words already
in use, I would like to use the same ones. Here are the words we have
been tenatively been using. Please let me know if there are alternate
words you have seen to describe the same concepts.


A regular expression that consists only of "characters" (symbols from
the alphabet) and "concatenation" only describe "fixed strings" or
"literals".


If we extend that to include "or" operators, but no recursion or
closures ("plus" or "star" operators). You can describe bounded
length expressions which we call "terms". Note one can include
"question mark" operators and "character sets" in this class, without
increasing its expressive power. It's this concept, the "term"
concept I am most interested in feedback on. I believe we got this
word from looking at a regular expression grammar.


Finally, we use the term "separator" to describe regular expressions
that consist of unbounded regular expressions (or expressions with
"large" finite bounds). Typically, there are of the form ".*" aka
dot-star expressions. It is generally the presence of one-or-more of
these in a regular expression that causes the NFA to DFA conversion to
experience exponential state explosion. That's not the sole criteria
for explosion, but it is a simple one to describe.


Thanks in advance for any help,
-Chris


******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.