# Re: Regular Expressions

## Colm O'Dunlaing <odunlain@maths.tcd.ie>Tue, 31 Oct 1995 16:19:10 GMT

From comp.compilers

Related articles
[7 earlier articles]
Re: Regular Expressions Martin.Ward@durham.ac.uk (Martin Ward) (2004-10-17)
Re: Regular Expressions choksheak@yahoo.com (ChokSheak Lau) (2004-10-21)
Re: regular expressions wendt@CS.ColoState.EDU (1993-03-22)
Regular Expressions rafae1@hp.fciencias.unam.mx (trejo ortiz alejandro augusto) (1995-10-16)
Re: Regular Expressions mnp@compass-da.com (Mitchell Perilstein) (1995-10-23)
Re: Regular Expressions cgh@cs.rice.edu (1995-10-29)
Re: Regular Expressions odunlain@maths.tcd.ie (Colm O'Dunlaing) (1995-10-31)
Re: Regular Expressions natasha@softlab.ece.ntua.gr (1995-11-03)
Re: Regular Expressions sjmccaug@prairienet.org (1995-11-28)
Re: Regular Expressions jmccarty@spdmail.spd.dsccc.com (1995-11-29)
| List of all articles for this month |

 Newsgroups: comp.compilers From: Colm O'Dunlaing Keywords: DFA, theory Organization: Dept. of Maths, Trinity College, Dublin, Ireland. References: 95-10-087 Date: Tue, 31 Oct 1995 16:19:10 GMT

1) Is there a canonical form for regular expressions(over a finite alphabet)?
Yes, sort of. Convert a given regular expression R to a
nondeterministic finite automaton N, then convert N to a
determininistic finite automaton D, then apply minimisation to
D to construct a minimal d.f.a. M. The automaton M is unique
for the regular set it accepts, so M can be taken as a canonical
form for the regular expression (I wouldn't bother about converting
M back to a regular expression).

Conversion from N to D may increase the number of states
exponentially. So the canonical form could be very
large. It is almost certain that any other choice of
canonical form could also be very large, since the
problem: Are R and S equivalent? (where R and S are
regular expressions) is PSPACE complete.

See
'Introduction to automata theory, languages, and computation,'
by Hopcroft and Ullman.

3) An ambiguous grammar for regular expressions over the alphabet {a, b}is the
following:
R::=RR | R + R | R* | (R) |a|b
The question is: How can I state an unambiguous grammar for regular
expressions?

(You probably need two more symbols \emptyset and \lambda
for the empty set and the empty string. However...)

I think, but am not absolutely certain, that the following
grammar will work. Implicitly, it prioritises operators
in the order * > concatenation > +, and orders evaluation
from right to left (I think).

A::= (R)|a|b
B::=A*|B*
C::=AR|BR
D::=A+R|B+R|C+R
R::=A|B|C|D

Yours, Colm O Dunlaing (Trinity College, Dublin, Ireland)

--

Post a followup to this message