3 Nov 1997 21:02:11 -0500

Related articles |
---|

Some Finite State Machine and Parsing problems rboland@csi.uottawa.ca (Ralph Boland) (1997-11-02) |

Re: Some Finite State Machine and Parsing problems clark@quarry.zk3.dec.com (1997-11-03) |

From: | clark@quarry.zk3.dec.com (Chris Clark USG) |

Newsgroups: | comp.compilers,comp.theory |

Date: | 3 Nov 1997 21:02:11 -0500 |

Organization: | Digital Equipment Corporation - Marlboro, MA |

Distribution: | inet |

References: | 97-11-011 |

Keywords: | theory, DFA |

You asked some interesting questions in regard to finite state

transducers and regular expression transducers.

*> Q1.1 Are there useful applications where we would want to*

*> represent an FST by a RET?*

Probably. However, there is one important consideration. A RE

corresponds most closely to a NFA (non-deterministic finite state

automata). Similarly, a RET corresponds most closely to a

non-deterministic finite state transduced. Now, with finite state

automata there are simple transformations to convert a non-

deterministic automata into a deterministic one, and this is important

to your later questions. In particular, the non-determinism is what

gives rise to the lookahead needs.

That does not mean all is lost; an Earley parser can deal with the

non-determinism and handle your RET's.

*> Q1.4 Can we really do this [extend RRP(k) parsers to handle RETs and*

*> parse using them] and if so what are the applications?*

Yes and no. The extension won't work in general because of the

non-determinism. An RRP(k) parser depends on the fact that a NFA can

be converted to a DFA, because an RRP(k) parser can only advance to

one state at a time (and must do so deterministicly). Now, it is

possible to consider non-deterministic RRP(k) parsers, but that is not

what most people mean by RRP(k). [Note, here I am assuming you mean

ELR(k) or LL(k) for RRP(k) since those are the common RRP(k) variants

and your later description matches an early ELR(k) implementation

method.]

*> RRP(k) parsers work by processing input twice, once in each direction.*

Not necessarily. Yacc++ does only one pass over the input (one

forward pass, no backward passes). When we wrote it we didn't know

that you would "need" two passes for ELR(k). Of course, in the

interim we have discovered a paper by Nakata and Sassa which has

buried in it the proof that only one pass is necessary.

*> Q1.5 If we try the same trick with a RET can we build a machine*

*> with a finite number of states corresponding to the RET?*

*> Clearly we can process the input from left to right with*

*> a finite number of states. Can we then process the input*

*> a second time in the reverse direction so that we can*

*> output a character for each character of input read?*

Yes, I believe you are correct here. I believe you can process an RET

in two passes, provided that you have a way of resolving the

ambiguity. Although that may require ambiguity resolution by fiat

(i.e. if the RET is ambiguous, you guarantee that one of the legal

outputs will be selected, but not necessarily which one).

*> I call such expressions recursive regular expressions (RREs).*

This is exactly the same set as the ELR(k) [or RRP(k)] grammars.

Well, if you require non-determinism, the set would be E-Earley

(i.e. Early parsing of a regular right part grammars).

*> Q2.1 Can we translate such expressions into a grammar?*

Yes, there are translations which convert RE's into recursive grammar

form. Some early ELR(k) parsers used that technology. More recent

parsing methodologies allow the direct conversion of ELR(k) grammars

into parsers.

*> Q2.2 Are there parsing techknowlogies that can parse such*

*> grammars efficiently, preferably in linear time.*

Yes, the parsing complexity is exactly the same as LR(k).

*> Note that, though LR(k) parsing cannot handle a palidrome*

LR(k) parsing can handle palindromes (and your grammar is the LR(k)

grammar for the exact palindromes you describe), the stack exactly

assures that it can. There are languages that LR(k) cannot handle,

for example the copy language, but palindromes are fine.

*> I hope some of you find these problems interesting*

*> and can provide me with some answers or directions for*

*> future research.*

Your problems certainly show some interesting directions and are worth

your pursuing more. And don't let the fact that you have made some

errors stop you. Sometimes errors can lead to astounding discoveries.

For example, when we first wrote Yacc++, we hadn't read that direct

ELR parsing was considered a hard unsolved problem [at that time],

with complicated read-back mechanisms and other implementation

difficulties. We just thought that combining a couple of simple ideas

should work it out. Of course, some of those simple ideas were

actually errors in our understanding of standard LR parsing, although

since then some of the errors have become alternative LR parsing

methods (such as non-canonical LR parsing). Had we been frightened by

the difficulty of the task or not made the errors we made (which

turned out to simplify the problem conceptually), we would not have

found our solution to the problem. The same may hold for you.

For example, your Q1.5 is very similar to an idea that I have which

suggests that all non-ambiguous grammars (consisting of recursion and

regular expressions) can be parsed in linear time (not just the LR(k)

ones).

-Chris Clark

************************************************************************

Compiler Resources, Inc. email: compres@world.std.com

3 Proctor St. http://world.std.com/~compres

Hopkinton, MA 01748 phone: (508) 435-5016

USA 24hr fax: (508) 435-4847

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.