Tue, 4 May 1993 06:35:11 GMT

Related articles |
---|

Parsing techniques lex@prl.philips.nl (1993-05-04) |

Re: Parsing techniques ipser@solomon.technet.sg (1993-05-07) |

Re: Parsing techniques simonh@swidev.demon.co.uk (1993-05-08) |

Parsing techniques kentr@rollinssoft.com (Kent Rollins) (1996-11-26) |

Re: Parsing techniques scotts@metaware.com (Scott Stanchfield) (1996-12-01) |

Re: Parsing techniques jon@mauney.com (1996-12-01) |

Re: Parsing techniques miano@worldnet.att.net (1996-12-01) |

[7 later articles] |

Newsgroups: | comp.compilers |

From: | lex@prl.philips.nl (Lex Augusteijn) |

Originator: | lex@prl.philips.nl |

Keywords: | parse, bibliography |

Organization: | Philips Research Laboratories, Eindhoven, The Netherlands |

Date: | Tue, 4 May 1993 06:35:11 GMT |

Recent discussions about error recovery in bottom-up and top-down parsers

and attribution of bottom-up parsers seem to indicate that the parsing

community is quite unaware of some recent developments in parsing theory

that bridge the cap between top-down and bottom-up parsers and between

deterministic parsers and non-deterministic tabulating ones. Therefore, I

will summarize some of the latest contributions.

Recursive Ascent Parsing

------------------------

Conventionally, top-down parsers, e.g. LL(1) parsers, are implemented by

means of a set of mutually recursive functions, i.e. by a recursive

descent parser. Bottom-up parsers, LR, LALR and the like, are

conventionally implemented by means of a stack automaton.

After more than twenty years in which this status quo has existed, another

implementation technique for bottom-up parsers was discovered

independently by Kruseman Aretz and Roberts, which use a set of mutually

recursive functions to implement LR parsers. Both coined these parsers

'recursive ascent parsers'. Although this technique was first regarded as

an alternative implementation for a stack automaton, it was later on

recognized that the stack automaton is merely a low level implementation

of recursion, where one makes the recursion stack explicit and numbers all

possible recursive calls to allow their encoding by means of a stack

automaton. Our paper in Theoretical Computer Science shows that LR

parsers can be very simply expressed by means of recursive functions,

without any use of automata theory. In retrospect, Penello had already

implemented a recursive ascent parser in assembly, without recognizing

this fact himself.

When people are interested, I can summarize the theory behind recursive

ascent parsers to the net.

A practical advantage of recursive ascent parsers is that the recursive

functions can be given additional parameters. E.g. follower sets can be

added to implement error recovery in quite the same way as conventionally

done for recursive descent parsers. This obsoletes low level techniques

that inspect and modify the parse stack in order to implement error

recovery. It also opens the door to add attributes as arguments to these

parse functions.

Our Elegant compiler generator generates recursive descent parsers for

LL(1) grammars and recursive ascent parsers for LALR(1) grammars, both

using essentially the same satisfying error recovery mechanism based on

adding follower set arguments to the parse functions.

Tabulating Parsers.

-------------------

Parsers for general context free languages, i.e. non-deterministic ones,

use tabulating techniques to keep track of already performed parses in

order to avoid duplicate work. Examples are the Earley and Tomita parsers.

Both are bottom-up parsers that either use a matrix (Earley) or a

complicated data structure called 'graph-structured stack' (Tomita) to

store the results of already performed parses.

When the often rediscovered concept of a 'memo-function'

[Michie,Hughes,Warren] is applied to different forms of recursive ascent

parsers, the Earley and Tomita parsers are obtained. A memo-function is a

function that stores the arguments-result values of previous invocations

in some kind of data structure, called a memo-table. When it is invoked

for the same set of arguments later on, it retrieves the corresponding

result from this memo-table instead of recomputing it. (NB: the whole

field of dynamic programming deals with the implementation of

memo tables, often without recognizing this fact.)

[Exercise: Try to describe the Warshall shorest path algorithm as a memo

function.

Hint: it only takes two lines :-).

In a language without side-effects, memoization is semantically

transparent. A result of this is that by expressing a non-deterministic

recursive ascent (or descent) parser in a functional language, one only

has to assume that the parse functions are memoized in order to obtain an

Earley, Tomita or other kind of tabulating parser. The actual shape of the

memo table can be worked out as an implementation detail but is no longer

an essential part of the parsing algorithm.

I hope that this information will contribute to the ever re-appearing

discussions about adding error recovery or attributes to bottom-up

parsers.

Lex

---------------------------

@article{Earley,

author = {J. Earley},

title = {An efficient context-free parsing algorithm},

journal = cacm,

year = {1970},

volume = {13},

number = {2},

pages = {94--102}}

@incollection{Elegant,

author = {Lex Augusteijn},

title = {{The Elegant compiler generator system}},

editor = {P. Deransart and M. Jourdan},

booktitle = {{Attribute Grammars and their Applications}},

series = {Lecture Notes in Computer Science 461},

publisher = {Springer Verlag},

address = {Berlin},

year = {1990},

pages = {238--254}}

@incollection{Hughes-memo,

author = {John Hughes},

title = {Lazy Memo-Functions},

editor = {J.-P. Jouannaud},

booktitle = {Functional Programming Languages and Computer

Architecture},

series = lncs,

number = {201},

publisher = {Springer Verlag},

address = {Berlin},

year = {1985},

pages = {129--146}}

@article{Kruseman-Ascent,

author = {F.E.J. {Kruseman Aretz}},

title = {{On a Recursive Ascent Parser}},

journal = ipl,

year = {1988},

volume = {29},

pages = {201--206}}

@article{Leermakers-Augusteijn-Kruseman,

author = {Ren\'e Leermakers and Lex Augusteijn and Frans E.J.

{Kruseman Aretz}},

title = {{A functional LR-parser}},

journal = tcs,

year = {1992},

volume = {104},

pages = {313--323}}

@article{Michie-memo,

author = {D. Michie},

title = {{"Memo" Functions and Machine Learning}},

journal = {Nature},

month = {April},

year = {1968},

volume = {218},

pages = {19--22}}

@article{Penello,

author = {Thomas J. Penello},

title = {{Very Fast LR Parsing}},

journal = sigplan,

year = {1986},

volume = {21},

number = {7},

pages = {145-151}}

@article{Roberts-RA,

author = {George H. Roberts},

title = {{Recursive Ascent: An LR Analog to Recursive Descent}},

journal = sigplan,

year = {1988},

volume = {23},

number = {3},

pages = {23--29}}

@book{Tomita,

author = {Masaru Tomita},

title = {Efficient Parsing for Natural Language},

publisher = {Kluwer Academic Publishers},

address = {Boston},

year = {1986}}}

@article{Warren-Memo,

author = {David S. Warren},

title = {{Memoing for Logic Programs}},

journal = cacm,

year = {1992},

volume = {35},

number = {1},

pages = {93--111}}

--

Lex Augusteijn

Philips Research Labs

Eindhoven

The Netherlands

lex@prl.philips.nl

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.