# Re: How to compute a regex which is the difference between two regexes?

## Chris F Clark <cfc@shell01.TheWorld.com>Mon, 11 Jan 2010 20:10:25 -0500

From comp.compilers

Related articles
[2 earlier articles]
Re: How to compute a regex which is the difference between two regexes egagnon@j-meg.com (Etienne M. Gagnon) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes gene.ressler@gmail.com (Gene) (2010-01-03)
Re: How to compute a regex which is the difference between two regexes mciura@gmail.com (Marcin Ciura) (2010-01-04)
Re: How to compute a regex which is the difference between two regexes max@gustavus.edu (Max Hailperin) (2010-01-04)
Re: How to compute a regex which is the difference between two regexes kkylheku@gmail.com (Kaz Kylheku) (2010-01-05)
Re: How to compute a regex which is the difference between two regexes kkylheku@gmail.com (Kaz Kylheku) (2010-01-06)
Re: How to compute a regex which is the difference between two regexes cfc@shell01.TheWorld.com (Chris F Clark) (2010-01-11)
| List of all articles for this month |

 From: Chris F Clark Newsgroups: comp.compilers Date: Mon, 11 Jan 2010 20:10:25 -0500 Organization: The World Public Access UNIX, Brookline, MA References: 10-01-013 10-01-033 Keywords: DFA, lex Posted-Date: 13 Jan 2010 22:39:48 EST

Kaz Kylheku <kkylheku@gmail.com> writes:

....
> Using a four letter {a, b, c, d} alphabet, I computed the following
> example complement:
>
> complement(a(abc)*d) -> (|[bcd]|a(abc)*[bc]|a(abc)*d[abcd]+|
> aab[abd][abcd]*|aa[acd][abcd]*)
>
> On informal inspection this seems right (do you see any mistakes).
> I.e. an empty string mismatches, since the regex derives only nonempty
> strings, so we include it. The the one-letter strings b, c and d do
> not match, because the regex has a null derivative with respect to
> these. An a followed by by zero or more repetitions of "abc",
> followed by something other than an a or d (i.e. [bc]) does not match
> and so is a branch of of the complement. And so forth.

Your algorithm which works directly on "regular expressions" is
actually minimicing the state traversal that one does in the finite
automaton versions of the algorithm. Not, that that is a bad thing.
In fact, it is one way of establishing the correctness of the
algorithm, by showing how it does exactly the same thing just using a
different notation.

In fact, it is possible to represent the states of a finite automaton,
as a series of sets of "dotted items" (each state being one set of
dotted items). Each dotted item, having a dot (cursor) in the regular
expression showing where in the regular expression that state
represents.

I believe you will find a correspondence between these sets of dotted
items and derivatives.

So, in that sense, there is an affirmative answer to the original
question. There is an algorithm that works directly with regular
expressions that generates the difference between two regular
expressions. However, the mechanics of the algorithm recapitulate
transforming the regular expressions into finite automatons and
working with the states to find taversals to all the states that are
final in the first automaton (subtrahend?) but not final in the second
(minuend?).

It is also possible to do the same thing with algebraic manipulations,
if you allow yourself to write an expression the with regular
expressions augmented by the set theory operators (union,
intersection, complement) and then algebraicly transform it into an
expression without those operators. My intuition tells me that the
algebra will effectively perform the same operations as the traversal
algorithm. However, I leave such as an exercise to the motivated

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

Post a followup to this message