
Related articles

  Non-deter. regular expr. -> deter. regular expr.  faase@cs.utwente.nl (1996-03-20)
  Re: Non-deter. regular expr. -> deter. regular expr.  leichter@smarts.com (Jerry Leichter) (1996-03-22)
  Re: Non-deter. regular expr. -> deter. regular exp  mark@omnifest.uwm.edu (1996-03-27)
  Re: Non-deter. regular expr. -> deter. regular exp  mark@omnifest.uwm.edu (1996-03-27)

From: mark@omnifest.uwm.edu (Mark Hopkins)
Newsgroups: comp.compilers
Date: 27 Mar 1996 00:07:21 -0500
Organization: Omnifest
References: 96-03-125 96-03-152
Keywords: DFA, theory

As far as I know, there is no such thing as a "non-deterministic"

regular expression, since every r.e. can be recognized by a

deterministic finite state automaton. So I have to interpret your

question to mean: Is there a purely algebraic way to extract a DFA

from a regular expression? There is.

A regular expression can be represented as a finite sum of terms,

each of which is a finite product of factors. A factor is either a

terminal or a starred regular expression. The type of algebra that

applies to regular expressions is called a semi-ring. Its axioms are:

   a(bc) = (ab)c            a+(b+c) = (a+b)+c
   a1 = a = 1a              a+0 = a = 0+a
   a0d = 0                  a+b = b+a
   a(b+c)d = abd+acd

In addition, you have the property: a + a = a, which makes the algebra

an idempotent semi-ring. An expression in an idempotent semi-ring can

be represented as a finite set of terms, with each term a finite

sequence of factors and "factor" defined as above. Here, I'm using

the notation:

a + b <--> a union b

1 <--> empty string

0 <--> empty set
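As a quick concrete check (mine, not in the post), finite languages form such a semi-ring under exactly this reading: + is union, juxtaposition is concatenation, 1 is the empty string, 0 is the empty set.

```python
# Languages as a semi-ring: + is union, concatenation is the product,
# 1 = {""} (empty string), 0 = set() (empty set).
def cat(A, B):
    """Concatenation of two finite languages."""
    return {u + v for u in A for v in B}

a, b, c, d = {'a'}, {'b'}, {'c'}, {'d'}
one, zero = {''}, set()

assert cat(a, cat(b, c)) == cat(cat(a, b), c)        # a(bc) = (ab)c
assert cat(a, one) == a == cat(one, a)               # a1 = a = 1a
assert cat(cat(a, zero), d) == zero                  # a0d = 0
assert a | (b | c) == (a | b) | c                    # a+(b+c) = (a+b)+c
assert a | b == b | a and (a | zero) == a            # a+b = b+a, a+0 = a
assert cat(a, cat(b | c, d)) == \
       cat(a, cat(b, d)) | cat(a, cat(c, d))         # a(b+c)d = abd+acd
assert a | a == a                                    # idempotence: a + a = a
```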

Therefore, by using the axioms in the following forms:

Set Manipulations:

a --> 0 + a

(a + b) + c --> (a + c) + b to sort

a + a --> a to remove duplicates.

a + 0 --> a to remove trailing zeros.

a + (b + c) --> (a + b) + c to associate to the left.

Sequence Manipulations:

a --> 1a

a1 --> a to remove trailing ones.

(ab)c --> a(bc) to associate to the right.

Distributivity:

a0 --> 0

a(b + c) --> ab + ac

you arrive at a normal form for expressions that can be mapped one-to-one to
the "finite set of finite sequences" form described above.

Example: a* ((b + c) d) + e (1 (f + g))

--> ((((0 + ((1 a*) b) d) + ((1 a*) c) d) + (1 e) f) + (1 e) g)

= a* b d + a* c d + e f + e g

<-->

{ (a*, b, d), (a*, c, d), (e, f), (e, g) }

by repeated applications of the steps above, where I'm assuming that the

four terms are already sorted.

Afterwards you can strip off any leading 0's and 1's with the reductions:

0 + a --> a

1a --> a
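The normal form maps naturally onto set-of-tuples data, where the idempotent axioms come for free. A minimal sketch (the encoding is mine; factors are just named by strings):

```python
# Hypothetical encoding of the normal form: a finite set of terms, each term
# a finite tuple of factors.  Set semantics make the sum unordered and give
# a + a = a automatically, as in the idempotent semi-ring.
norm = {('a*', 'b', 'd'), ('a*', 'c', 'd'), ('e', 'f'), ('e', 'g')}

assert norm | {('e', 'f')} == norm     # a + a = a: duplicate terms vanish
assert norm | set() == norm            # a + 0 = a: 0 is the empty set

# Stripping leading 1's amounts to dropping the factor '1' from each tuple:
assert tuple(f for f in ('1', 'e', 'f') if f != '1') == ('e', 'f')
```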

You can convert each expression into a system of equations to which it is
the minimal solution. Each equation in the system will have the form:

Taylor's Theorem:

E = E_0 + x dE/dx + y dE/dy + ... + z dE/dz

where x, y, ..., z are the terminals, and E_0 is either 0 or 1. This is

called Right-Linear form, since the unknowns (the derivative expressions)
appear only at the right end of each term. The right-hand side will be
"deterministic" in the

sense that no more than one term will contain a given terminal after it is

reduced to normal form.

E_0 is obtained by substituting 0 in for all the terminals in E.

d/dx is a "partial differential operator" and is defined by the following:

d1/dx = d0/dx = 0

dx/dx = 1, dy/dx = 0 if y is a terminal other than x.

d(A*)/dx = dA/dx A*

d(AB)/dx = dA/dx B + A_0 dB/dx

d(A+B)/dx = dA/dx + dB/dx

Note the variation in the product rule. You can easily extend this to the

other operations [A] = 1 + A and A+ = A A*, by the following:

d[A]/dx = d(1 + A)/dx = d1/dx + dA/dx = 0 + dA/dx = dA/dx

d(A+)/dx = dA/dx A* + A_0 dA/dx A*

= (1 + A_0) dA/dx A*

= dA/dx A*

since 1 + A_0 is either 1 + 0 = 1, or 1 + 1 = 1.

Define the total differential dE as the sum of all the terms x dE/dx. Then

you can write:

E = E_0 + dE

with

d0 = d1 = 0

dx = x, for all terminals x

d(A*) = d(A+) = dA A*

d([A]) = dA

d(AB) = dA B + A_0 dB

d(A+B) = dA + dB

Also, you can define E_0 by the following table:

0_0 = x_0 = 0

1_0 = [A]_0 = (A*)_0 = 1

(A+)_0 = A_0

(A+B)_0 = A_0 + B_0

(AB)_0 = A_0 B_0

which is equivalent to making the above-mentioned substitutions.

So apply Taylor's Theorem and reduce the result to normal form.

Example: a* (b a*)*

(a* (b a*)*)_0 = 0* (0 0*)* = 1 (0)* = 1

d(a* (b a*)*) = a (a* (b a*)*) + (b a*) (b a*)*

--> a (a* (b a*)*) + b (a* (b a*)*)

Which establishes a* (b a*)* as a solution to a system consisting of only

one equation, of the form:

x = 1 + a x + b x

In general, there will be more than one expression and more than one equation.

When arriving at the result:

E = E_0 + x dE/dx + y dE/dy + ... + z dE/dz

each of the derivatives dE/dx, dE/dy, ..., dE/dz will likewise have to be
reduced to normal form, if they haven't been already. This is where the
extra equations come from.

The automaton is read off this as follows:

(1) Each expression is a state

(2) State E is final if E_0 = 1.

(3) State E has a transition to state F on x if dE/dx = F.
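The whole pipeline — E_0, d/dx, and reading off the automaton — can be sketched in Python. The tuple encoding, function names, and smart constructors below are my own, not the post's; the constructors apply just enough of the 0/1 reductions that derivatives of the worked example reach normal form:

```python
# Regular expressions as nested tuples (hypothetical encoding):
#   '0' (empty set), '1' (empty string), ('sym', c),
#   ('alt', A, B) for A + B, ('cat', A, B) for AB, ('star', A) for A*.

def alt(A, B):                      # A + B, with a + 0 = a and a + a = a
    if A == '0': return B
    if B == '0' or A == B: return A
    return ('alt', A, B)

def cat(A, B):                      # AB, with a1 = a = 1a and a0 = 0 = 0a
    if A == '0' or B == '0': return '0'
    if A == '1': return B
    if B == '1': return A
    return ('cat', A, B)

def null(E):
    """E_0: substitute 0 for every terminal (1 iff E matches the empty string)."""
    if E == '0': return 0
    if E == '1': return 1
    op = E[0]
    if op == 'sym':  return 0
    if op == 'star': return 1                           # (A*)_0 = 1
    if op == 'alt':  return max(null(E[1]), null(E[2])) # A_0 + B_0, with 1+1 = 1
    if op == 'cat':  return null(E[1]) * null(E[2])     # A_0 B_0

def deriv(E, x):
    """dE/dx, following the rules in the post."""
    if E in ('0', '1'): return '0'
    op = E[0]
    if op == 'sym':  return '1' if E[1] == x else '0'
    if op == 'alt':  return alt(deriv(E[1], x), deriv(E[2], x))
    if op == 'star': return cat(deriv(E[1], x), E)      # d(A*)/dx = dA/dx A*
    # d(AB)/dx = dA/dx B + A_0 dB/dx
    left = cat(deriv(E[1], x), E[2])
    return alt(left, deriv(E[2], x)) if null(E[1]) else left

def dfa(E, alphabet):
    """(1) each expression is a state, (2) E is final iff E_0 = 1,
       (3) E has a transition to dE/dx on x."""
    states, trans, todo = {E}, {}, [E]
    while todo:
        S = todo.pop()
        for x in alphabet:
            T = deriv(S, x)
            trans[(S, x)] = T
            if T not in states:
                states.add(T)
                todo.append(T)
    return states, trans, {S for S in states if null(S)}

# The worked example a* (b a*)*: a single state, final, looping on a and b.
astar = ('star', ('sym', 'a'))
E = cat(astar, ('star', cat(('sym', 'b'), astar)))
assert null(E) == 1
assert deriv(E, 'a') == E and deriv(E, 'b') == E
states, trans, final = dfa(E, 'ab')
assert states == final == {E}
```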

---------------

To actually prove that each expression is a solution to its corresponding

system of equations (and that it's also the least solution), you need to

extend the algebra by adding in the following axioms:

1 + a a* = a*

if a + bx + xc <= x then b* a c* <= x

where (u <= v) is defined by (u + v = v).

From this you can ultimately show that

0* = 1* = 1

a* = 1 + a a* = 1 + a* a

and other similar properties, as well as showing that the following equations

have the following least solutions:

   x = a + bx               x = b* a
   x = a + xc               x = a c*
   x = a + bx + xc          x = b* a c*
   x = a + xdx              x = a (d a)* = (a d)* a
   x = a + bx + xc + xdx    x = b* a c* (d b* a c*)*
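The first of these least-solution claims can be sanity-checked numerically (my own check, not in the post): iterate the right-hand side of x = a + bx over languages, starting from the empty set, and compare against b* a, truncating all strings to a length bound N.

```python
# Check that the least solution of  x = a + bx  is  b* a, up to length N.
N = 6

def step(X):
    """One application of x |-> a + b x, keeping strings of length <= N."""
    return {'a'} | {'b' + w for w in X if len('b' + w) <= N}

X = set()                     # least fixed point: start from 0 (empty set)
for _ in range(N + 1):        # enough iterations to reach the fixed point
    X = step(X)

b_star_a = {'b' * k + 'a' for k in range(N)}   # b* a, up to length N
assert X == b_star_a
```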

For the example above, this implies that:

x = 1 + ax + bx = 1 + (a + b)x

has (a + b)* as a least solution, whose equality to a* (b a*)* is also
provable from the two axioms above.
