Re: Why LL(1) Parsers do not support left recursion?

"Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>
25 Jul 2006 18:03:01 -0400

          From comp.compilers

Related articles
[13 earlier articles]
Re: Why LL(1) Parsers do not support left recursion? max@gustavus.edu (Max Hailperin) (2006-07-23)
Re: Why LL(1) Parsers do not support left recursion? cbarron413@adelphia.net (Carl Barron) (2006-07-24)
Re: Why LL(1) Parsers do not support left recursion? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-07-25)
Re: Why LL(1) Parsers do not support left recursion? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-07-25)
Re: Why LL(1) Parsers do not support left recursion? mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-07-25)
Re: Why LL(1) Parsers do not support left recursion? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-07-25)
Re: Why LL(1) Parsers do not support left recursion? ajonospam@andrew.cmu.edu (Arthur J. O'Dwyer) (2006-07-25)
Re: Why LL(1) Parsers do not support left recursion? wyrmwif@tsoft.org (SM Ryan) (2006-07-28)
Re: Why LL(1) Parsers do not support left recursion? cfc@shell01.TheWorld.com (Chris F Clark) (2006-07-28)
Re: Why LL(1) Parsers do not support left recursion? cfc@shell01.TheWorld.com (Chris F Clark) (2006-07-28)
Re: Why LL(1) Parsers do not support left recursion? cfc@shell01.TheWorld.com (Chris F Clark) (2006-07-28)
Re: Why LL(1) Parsers do not support left recursion? find@my.address.elsewhere (Matthias Blume) (2006-07-28)
Re: Why LL(1) Parsers do not support left recursion? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2006-07-28)
[13 later articles]
| List of all articles for this month |

From: "Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu>
Newsgroups: comp.compilers
Date: 25 Jul 2006 18:03:01 -0400
Organization: Carnegie Mellon, Pittsburgh, PA
References: 06-07-059 06-07-065 06-07-071
Keywords: parse
Posted-Date: 25 Jul 2006 18:03:00 EDT

On Tue, 25 Jul 2006, Hans-Peter Diettrich wrote:
> SM Ryan schrieb:
[Hans-Peter wrote:]
>> # 2. Most languages are ambiguous at the syntax level, so that implicit
>> # disambiguation (longest match...) or explicit semantical constraints
>> # must be introduced. (see: dangling else...).
>>
>> Only poorly designed programming languages are ambiguous. (Natural
>> languages are ambiguous and don't use deterministic language theory.)
>
> Counterexample:
> The argument list of a subroutine is nothing but a list, which can be
> expressed using either left or right recursion in a grammar.


      That's a true statement, but I don't see what it's a "counterexample"
to.


> Perhaps I misused the term "ambiguous" here, when I meant that different
> parse trees can be constructed for a sentence of a language?


      Well, that's obvious. I mean, look at the following sentence:
"The dog chases the cat." Now, we can parse that into the tree


          chases dog
          / \ or we can make the tree / \
      dog cat The the
      / / / \
    The the chases cat


Obviously, one of those parse trees is more "natural", and, viewed
through human eyes, expresses something useful about the structure
of the sentence (namely, that a sentence has a verb, and nouns are
more important than articles, and so on).
      However, we could probably write one grammar to make the parse tree
on the left, and another grammar to make the parse tree on the right.
Does this mean that something is "wrong" with English, or that the
sentence "The dog chases the cat" admits of "more than one possible
parse tree"? No, not in any useful sense.


      In the same way, it's possible to create many different "parse trees"
for a C-language "sentence", indicated here with indentation levels:


          if (x) if (x) if (x)
              if (y) if (y) if (y)
                  foo(); foo(); foo();
              else else else
                  bar(); bar(); bar();


Only the leftmost parse tree is useful in understanding the semantics
of the C program; therefore, we call it "correct", and design our
grammars to create this parse tree in preference to the other two.
      The middle parse tree demonstrates the "dangling else" non-problem.
It is a /wrong/ parse tree. No correct grammar for C will generate it.
      The rightmost parse tree demonstrates the "can't distinguish between
'if' and 'foo'" non-problem. It is also a /wrong/ parse tree; wrong
enough that it is instantly recognizable as nonsensical to human eyes.
Again, no correct grammar for C will generate it.


      There is absolutely no qualitative difference between the wrongness
of the middle parse tree and the wrongness of the rightmost parse tree.




      In the same way, C has this kind of false "ambiguity" in its lexer.
The string "+++" parses as ++, then +, even though a human might think
it could parse as +, then ++. But any grammar that would let "+++" parse
that way would be a /wrong/ grammar, and wouldn't qualify as a grammar
for C.




>> Many programming language grammars are ambiguous because the people
>> writing the grammars are lazy and/or uneducated in these matters. The
>> dangling else nonproblem was solved some 40 years ago. Anyone who
>> thinks this is still a problem is just too lazy to write a well known
>> solution.
>
> The dangling else problem can be solved by adding implicit general rules
> to the interpretation of a language (or grammar?). Of course there exist
> ways to prevent the occurence of such problems, just in the language
> specification. But AFAIR it's impossible to prove, in the general case,
> that a language is inambigous.


      The sentence above is ambiguous. :) If you mean "there's no general
algorithm that will take a grammar G and decide whether it is ambiguous",
I think you're right, just because that seems like it ought to be an
NP-complete problem.
      However, if you mean "nobody has ever proven that any grammar is
unambiguous", you're wrong. It's easy to prove that /some/ specific
grammars are unambiguous. I'm sure all the major programming languages'
reference grammars have been vetted for correctness by now, for example.




      Those are my main points. Following are nitpicks.


> My point is that table driven parsers are unreadable, due to the lost
> relationship between the parser code and the implemented language or
> grammar. Do there exist LR parsers or parser generators at all, which
> preserve the relationship between the parser code and the grammar?


      Again, you seem to confuse human-centered subjective terms like
"preserve the relationship" with measurable terms. Obviously there is
/some/ kind of special relationship between a yacc-generated parser
and its target language --- something that makes it parse C and not
Java or pig Latin. You're complaining that the relationship isn't
close enough to the surface for humans to see --- but why is that
important?
      After all, the point of table-driven parsers is that humans never
/need/ to see the generated code. It "just works."




[SM Ryan wrote:]
>> A language definition that does not depend on vague handwaving (one or
>> two such definitions actually exist) bases the semantics on the parse
>> tree. Since right and left recursion build different parse trees, this
>> issue is very important in definitions with riguous semantics.


      I see where you're coming from, but that's slightly too dogmatic IMHO.
What would you say to a language definition that defined a comma-separated
list as "a list of N expressions, separated by commas" (or some such)?
That's unambiguous, but doesn't imply left- or right-recursion, or in
fact any recursion at all.


[...]
> No doubt, but IMO you want to introduce more restrictions than required.
> A compiler is allowed to apply certain *valid* transformations on an
> parse tree, in so far I cannot see a reason why any grammar or parser
> for that language must enforce the construction of one-and-only-one
> valid parse tree.


      Certainly any language must ensure that all interpretations of a given
sentence /mean the same thing/. The easiest way to do that is to give
an unambiguous grammar, so that there's only one interpretation of each
sentence in the language. You could do weirder things, but AFAIK pretty
much nobody's ever tried, because the easy way is so much easier. :)


-Arthur


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.