Parsing techniques

lex@prl.philips.nl (Lex Augusteijn)
Tue, 4 May 1993 06:35:11 GMT

          From comp.compilers

Related articles
Parsing techniques lex@prl.philips.nl (1993-05-04)
Re: Parsing techniques ipser@solomon.technet.sg (1993-05-07)
Re: Parsing techniques simonh@swidev.demon.co.uk (1993-05-08)
Parsing techniques kentr@rollinssoft.com (Kent Rollins) (1996-11-26)
Re: Parsing techniques scotts@metaware.com (Scott Stanchfield) (1996-12-01)
Re: Parsing techniques jon@mauney.com (1996-12-01)
Re: Parsing techniques miano@worldnet.att.net (1996-12-01)
[7 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: lex@prl.philips.nl (Lex Augusteijn)
Originator: lex@prl.philips.nl
Keywords: parse, bibliography
Organization: Philips Research Laboratories, Eindhoven, The Netherlands
Date: Tue, 4 May 1993 06:35:11 GMT

Recent discussions about error recovery in bottom-up and top-down parsers
and attribution of bottom-up parsers seem to indicate that the parsing
community is quite unaware of some recent developments in parsing theory
that bridge the cap between top-down and bottom-up parsers and between
deterministic parsers and non-deterministic tabulating ones. Therefore, I
will summarize some of the latest contributions.


Recursive Ascent Parsing
------------------------


Conventionally, top-down parsers, e.g. LL(1) parsers, are implemented by
means of a set of mutually recursive functions, i.e. by a recursive
descent parser. Bottom-up parsers, LR, LALR and the like, are
conventionally implemented by means of a stack automaton.


After more than twenty years in which this status quo has existed, another
implementation technique for bottom-up parsers was discovered
independently by Kruseman Aretz and Roberts, which use a set of mutually
recursive functions to implement LR parsers. Both coined these parsers
'recursive ascent parsers'. Although this technique was first regarded as
an alternative implementation for a stack automaton, it was later on
recognized that the stack automaton is merely a low level implementation
of recursion, where one makes the recursion stack explicit and numbers all
possible recursive calls to allow their encoding by means of a stack
automaton. Our paper in Theoretical Computer Science shows that LR
parsers can be very simply expressed by means of recursive functions,
without any use of automata theory. In retrospect, Penello had already
implemented a recursive ascent parser in assembly, without recognizing
this fact himself.


When people are interested, I can summarize the theory behind recursive
ascent parsers to the net.


A practical advantage of recursive ascent parsers is that the recursive
functions can be given additional parameters. E.g. follower sets can be
added to implement error recovery in quite the same way as conventionally
done for recursive descent parsers. This obsoletes low level techniques
that inspect and modify the parse stack in order to implement error
recovery. It also opens the door to add attributes as arguments to these
parse functions.


Our Elegant compiler generator generates recursive descent parsers for
LL(1) grammars and recursive ascent parsers for LALR(1) grammars, both
using essentially the same satisfying error recovery mechanism based on
adding follower set arguments to the parse functions.


Tabulating Parsers.
-------------------


Parsers for general context free languages, i.e. non-deterministic ones,
use tabulating techniques to keep track of already performed parses in
order to avoid duplicate work. Examples are the Earley and Tomita parsers.
Both are bottom-up parsers that either use a matrix (Earley) or a
complicated data structure called 'graph-structured stack' (Tomita) to
store the results of already performed parses.


When the often rediscovered concept of a 'memo-function'
[Michie,Hughes,Warren] is applied to different forms of recursive ascent
parsers, the Earley and Tomita parsers are obtained. A memo-function is a
function that stores the arguments-result values of previous invocations
in some kind of data structure, called a memo-table. When it is invoked
for the same set of arguments later on, it retrieves the corresponding
result from this memo-table instead of recomputing it. (NB: the whole
field of dynamic programming deals with the implementation of
          memo tables, often without recognizing this fact.)


[Exercise: Try to describe the Warshall shorest path algorithm as a memo
                      function.
                      Hint: it only takes two lines :-).


In a language without side-effects, memoization is semantically
transparent. A result of this is that by expressing a non-deterministic
recursive ascent (or descent) parser in a functional language, one only
has to assume that the parse functions are memoized in order to obtain an
Earley, Tomita or other kind of tabulating parser. The actual shape of the
memo table can be worked out as an implementation detail but is no longer
an essential part of the parsing algorithm.


I hope that this information will contribute to the ever re-appearing
discussions about adding error recovery or attributes to bottom-up
parsers.


Lex


---------------------------


@article{Earley,
author = {J. Earley},
title = {An efficient context-free parsing algorithm},
journal = cacm,
year = {1970},
volume = {13},
number = {2},
pages = {94--102}}


@incollection{Elegant,
author = {Lex Augusteijn},
title = {{The Elegant compiler generator system}},
editor = {P. Deransart and M. Jourdan},
booktitle = {{Attribute Grammars and their Applications}},
                series = {Lecture Notes in Computer Science 461},
publisher = {Springer Verlag},
address = {Berlin},
year = {1990},
pages = {238--254}}


@incollection{Hughes-memo,
author = {John Hughes},
title = {Lazy Memo-Functions},
editor = {J.-P. Jouannaud},
booktitle = {Functional Programming Languages and Computer
Architecture},
                series = lncs,
                number = {201},
publisher = {Springer Verlag},
address = {Berlin},
year = {1985},
pages = {129--146}}


@article{Kruseman-Ascent,
author = {F.E.J. {Kruseman Aretz}},
title = {{On a Recursive Ascent Parser}},
journal = ipl,
year = {1988},
volume = {29},
pages = {201--206}}


@article{Leermakers-Augusteijn-Kruseman,
author = {Ren\'e Leermakers and Lex Augusteijn and Frans E.J.
{Kruseman Aretz}},
title = {{A functional LR-parser}},
journal = tcs,
year = {1992},
volume = {104},
pages = {313--323}}


@article{Michie-memo,
author = {D. Michie},
title = {{"Memo" Functions and Machine Learning}},
journal = {Nature},
month = {April},
year = {1968},
volume = {218},
pages = {19--22}}


@article{Penello,
author = {Thomas J. Penello},
title = {{Very Fast LR Parsing}},
journal = sigplan,
year = {1986},
volume = {21},
number = {7},
pages = {145-151}}


@article{Roberts-RA,
author = {George H. Roberts},
title = {{Recursive Ascent: An LR Analog to Recursive Descent}},
journal = sigplan,
year = {1988},
volume = {23},
number = {3},
pages = {23--29}}


@book{Tomita,
    author = {Masaru Tomita},
    title = {Efficient Parsing for Natural Language},
    publisher = {Kluwer Academic Publishers},
    address = {Boston},
    year = {1986}}}


@article{Warren-Memo,
author = {David S. Warren},
title = {{Memoing for Logic Programs}},
journal = cacm,
year = {1992},
volume = {35},
number = {1},
pages = {93--111}}


--
Lex Augusteijn
Philips Research Labs
Eindhoven
The Netherlands
lex@prl.philips.nl
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.