Recursive descent parsing is better than LL(1)

jan@klikspaan.si.hhs.nl
Fri, 22 Jan 1993 16:41:50 GMT

          From comp.compilers

Related articles
Top-Down Parser Construction Conjectures bart@cs.uoregon.edu (1993-01-18)
Re: Top-Down Parser Construction Conjectures W.Purvis@daresbury.ac.uk (1993-01-18)
Recursive descent parsing is better than LL(1) jan@klikspaan.si.hhs.nl (1993-01-22)
Re: Recursive descent parsing is better than LL(1) tchannon@black.demon.co.uk (1993-01-23)
Re: Recursive descent parsing is better than LL(1) pjj@cs.man.ac.uk (Pete Jinks) (1993-01-27)
| List of all articles for this month |

Newsgroups: comp.compilers
From: jan@klikspaan.si.hhs.nl
Organization: Compilers Central
Date: Fri, 22 Jan 1993 16:41:50 GMT
Keywords: parse, LL(1), theory
References: 93-01-122 93-01-123

Barton Christopher Massey < bart@cs.uoregon.edu > writes:


> I will personally choose to use YACC or something similar for parsing until
> I can implement an LL(1) parser generator which will take a reasonably
> ad-hoc grammar and either transform it to an LL(1) grammar or report
> failure to do so, "most of the time".


> In order to construct such a tool, I would like to prove or disprove the
> following conjectures:


> LL(1) Transformability: There exists an algorithm
> which, given any grammar G which is unambiguous and
> describes an LL(1) language L(G), produces an LL(1)
> grammar G' for the language L(G).


> LL(1) Decidability: There exists an algorithm which,
> given any grammar G which is unambiguous and describes
> a language L(G), decides whether L(G) is an LL(1)
> language.


> After some thought, it still seems most likely to me that LL1T is true,
> but I'm skeptical about LL1D . I posted these conjectures to
> comp.compilers previously, but didn't get too far with them then. In a
> way, I'm surprised if such apparently important questions haven't been
> carefully studied, but I haven't found any very useful references yet.


Bill Purvis <W.Purvis@daresbury.ac.uk> writes:


> In Computer Journal, Vol 11, number 1, May 1968, J. Foster describes SID,
> a syntax improving program which accepts a fairly arbitrary BNF
> description and applies the neccesary transformations to produce an LL(1)
> grammar (assuming that the language is LL(1). I think that that fully
> covers the first conjecture.


> The program is pretty comprehensive and while it might not cover all
> cases, I suspect that in practical terms it probably covers the second
> conjecture.


This discussion is a bit blurred by the use of the clauses `given any
grammar G' and `reasonably ad-hoc grammar'. I agree with Bill Purvis that
SID does a good job on some fuzzy set of `reasonably good ad-hoc
grammars'.


The general case `given any unambigous grammar G' cannot be solved by SID
if it only uses the transformations substitution, elimination of left
recursion and factorization.


I present a simple contra-example:


Consider the language defined by the next CFG:


S -> A
S -> B
A -> `x' A `y'
A -> `a'
B -> `x' B `z'
B -> `b'


SID uses elimination of left recursion, factorisation and substitution to
search for a better grammar.


The problem here is that A and B have `x' in common in their FIRST-sets.
So let's substitute A and B in the S-rules giving:


S -> `x' A `y'
S -> `a'
S -> `x' B `z'
S -> `b'
A -> `x' A `y'
A -> `a'
B -> `x' B `z'
B -> `b'


Two S-rules now start both with `x' so we have to factorize them giving:


S -> `x' S1
S -> `a'
S -> `b'
A -> `x' A `y'
A -> `a'
B -> `x' B `z'
B -> `b'
S1 -> A `y'
S1 -> B `z'


Now after solving the problem with the S-rules, we get the same problem
with the S1-rules.


We can repeat both substitution and factiorisation on S1 giving:


S -> `x' S1
S -> `a'
S -> `b'
A -> `x' A `y'
A -> `a'
B -> `x' B `z'
B -> `b'
S1 -> `x' S2
S1 -> `a'
S1 -> `b'
S2 -> A `y' `y'
S2 -> B `z' `z'


Giving rise to the same kind of conflict, but now on S2-rules. Obviously
the proces of substitution and factorization can be repeated
indefinitedly.


The loop suggests the conjecture of decidability doesn't hold.


Foster expresses this problem with `SID may also report failure because
its attempt to transform the grammar causes it to loop.'


But let's return to the initial discussion: recursive descent versus YACC.


Maybe we underestimate the power of recursive descent technique if only we
could get rid of the idea each nonterminal should be mapped to one
function recognizing the right sides of its productions.


Let's try a different approach: Each function recognizes right sides of
one or more nonterminals, returning as a result the kind(s) of
nonterminals recognized.


Let's return to the example grammar above:


S -> A
S -> B
A -> `x' A `y'
A -> `a'
B -> `x' B `z'
B -> `b'


As we cannot discern A from B let's integrate them into one function AorB.
This should answer the question `What did you see' with : neither (call
this an error), A, B, or Both.


AorB might be modelled with a statediagram like:


                                                                              +-----------+
                                                      +------A-->-+ `y' +---y---> result is A
                                                      | +-----------+
          +--------+ +---+--+ =-----------+
>----+ `x' +---x-->-+ AorB +---B----+ `z' +---z---> result is B
          +--------+ +-+-+--+ +-----------+
                                                  | | +-----------+---x---> result is A
                                                  V +----Both---+ `y' or z' +
                                            neither +-----------+---y---> result is B
                                                  |
      conflicting inputs ---+---------------------------------> result is neither


As AorB has no Both exit, this branch could be pruned from the diagram.


In C this would look like:


#define neither 0
#define Both 1
#define A 2
#define B 3


char lookahead;


int match(char c)
{
if c == lookahead)
{
lookahead = getchar();
return 1;
}
else
return 0;
}


int AorB()
{
int r;


if (match('x'))
{
switch (AorB())
{
case neither:
return neither;
case A:
if (match('y'))
return A;
else
return neither;
case B:
if (match('z'))
return B;
else
return neither;
case Both: /* omitted */
}
}
else
return neither;
}


Note: This approach is top-down and recursive descent, but not predictive
parsing! I suppose others have figured this out before. I have no label
to stick upon it. Maybe someone else on the net.


Regards,


Jan Schramp
--
Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,
2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.