Mon, 24 May 1993 15:17:27 GMT

Related articles |
---|

The Tomita Parsing Algorithm (LR(k) with Dynamic Programming) markh@csd4.csd.uwm.edu (1993-05-23) |

Re: The Tomita Parsing Algorithm (LR(k) with Dynamic Programming) Bernard.Lang@inria.fr (1993-05-24) |

Newsgroups: | comp.theory,comp.compilers,comp.ai.nat-lang |

From: | Bernard.Lang@inria.fr |

Followup-To: | comp.theory |

Summary: | Dynamic programming parsing of CF languages and extensions |

Keywords: | parse, theory |

Organization: | INRIA - Rocquencourt |

References: | 93-05-108 |

Date: | Mon, 24 May 1993 15:17:27 GMT |

The fact that "Tomita's" algorithm is a dynamic programming algorithm has

been known for about 15 years (and probably before then, though less

formally). It is essentially a variant of Earley's parsing algorithm

(which had none of Tomita's limitation) and which has been formally shown

to be very similar to the CYK tabular algorithm (see standard textbooks

for all of these) by Graham-Harrison and Ruzzo around 1978, I believe in

ACM-TOPLAS. The latter has been know as a dynamic programming technique

probably since it was published in 1965.

Almost all these algorithms, and later ones (not including Tomita's),

run at worst in time n^3, where n is the length of the input string, and

run often linearly on a large domain (CKY is always n^3).

I published a general dynamic programming (though not called that way)

interpretation of non-deterministic pushdown automata in 1974

(Deterministic Techniques for efficient non-deterministic Parsers, Proc.

of 2nd Coll. on Automata, Languages and Programming, Springer LNCS 14, pp

255-269). This applies in particular to non-deterministic PDAs produced

by the LR(k) construction, and yields an n^3 algorithm, very similar in

essence to the algorithm of Tomita, but without its limitations (works on

all CF grammars, in time at worst n^3, and often faster: for example it

reduces to linear LR(k) parsing when the grammar is LR(k) and the PDA

connstruction technique is LR(k) - but now this seems obvious).

This algorithm was developed formally, but not implemented then: the

interest was limited by the power of the computers of the time. However a

direct construction based on precedence parsing was experimented in

1971-72 (SIGPLAN Notices 6-12, Dec. 1972).

The 1974 algorithm did not produce a shared forest or all possible

parse-trees, but a CF grammar of all leftmost parses (a leftmost parse

being a string of rules used for reduction). It turns out that this CF

grammar is isomophic to a parse forest, which leads to more recent results

indicating that parse forests are nothing but CF grammars (more precisely

a specialization to the parsed input of the original CF grammar --

specialization in the sense of partial evaluation).

Another dynamic programming construction was proposed (Information and

Control 13, pp 186-206) by Aho-Hopcroft and Ullman in 1968, but like the

CKY construction it run in n^3.

These techniques and results extend to a variety of non-CF languages,

and also to the evaluation of logic programs (Horn Clauses), as indicated

by the early results of Pereira and Warren on Earley deduction (ACL'83). A

special case of this are the so called "magic set" techniques used in

deductive databases.

Note that breadth-first search is not stricly necessary to handle

non-determinism (cf Sheil, ACL 1976) and all one really needs is a fair

computation strategy. This is particularly important for extensions to

logic programming.

Other constructions have been published which do not have Tomita's

limitations and do have adequate complexity. In the case of LR automata,

and more generally of bottom-up PDAs, the constructions can be simplified.

Several people have independently come up with variants based on the use

of memo functions, but that is essentially a form of dynamic programming.

The algorithm of Tomita, and other dynamic programming algorithms are in

particular discussed in the proceedings of the first International

Workshop on Parsing Technology (1989), 2 volumes with a different name,

both edited by Tomita, and published by Kluwer. They contain some papers

that discuss the limitations of the algorithm, and possible remedies.

There are many other publications on the topic.

The ability to deal with cyclic grammar is probably useless in most

applications. It can however be useful when dealing with some problems of

error recovery on ill-formed input.

Part of my own published contribution to the topic is available (in

compressed format) by ftp at

ftp ftp.inria.fr

in directory: INRIA/publication/ChLoE

read the file 00-Index to get detailed contents of the directory.

I do not have an up-to-date bibliography handy, but many references may

be found in these papers.

Bernard Lang

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.