Re: Grammar LR(1) but not LALR(1)

Chris F Clark <cfc@shell01.TheWorld.com>
2 Nov 2003 14:52:49 -0500

From comp.compilers

Related articles
Grammar LR(1) but not LALR(1) jm@bourguet.org (Jean-Marc Bourguet) (2003-10-31)
Re: Grammar LR(1) but not LALR(1) oliver@zeigermann.de (Oliver Zeigermann) (2003-11-02)
*Re: Grammar LR(1) but not LALR(1) cfc@shell01.TheWorld.com (Chris F Clark)* (2003-11-02)**
Re: Grammar LR(1) but not LALR(1) stefan@infoiasi.ro (ANDREI Stefan) (2003-11-08)
Re: Grammar LR(1) but not LALR(1) jm@bourguet.org (Jean-Marc Bourguet) (2003-11-08)
Re: Grammar LR(1) but not LALR(1) RLake@oxfam.org.uk (2003-11-11)

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	2 Nov 2003 14:52:49 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	03-10-139
Keywords:	LALR, parse, LR(1)
Posted-Date:	02 Nov 2003 14:52:49 EST

Jean-Marc asked:
> I'm convinced that such a transformation (contructing an LALR(1)
> grammar by duplicating productions) is possible for every LR(1)
> grammars, such duplication preventing the merge of the states LR(1)
> automaton.
> . . .
> Could someone point me to a counter example for my claim or a proof of
> its correctness?

As far as I know, you are on the correct path. The difference between
the LR and the LALR algorithms is exactly in the merging of states
that eliminate the LALR version's ability to use left-context in
disambiguating contexts. Thus, by duplicating productions and giving
them distinct names which prevent that merging, you are by hand
reintroducing the left-context.

My intuition matches yours in that it should always be possible to do
so. However, there is a potential gotcha. Let me try to outline it.
It may help you form a counter example (or prove that one cannot
exist).

Note that each time one makes this disambiguation, one introduces new
rule names into the referencing rules. If those rules themselves are
recursive, that may introduce new conflicts, as the recursive
fomulation may not have enough context to resolve the conflict without
the trailing (right) context. Let's look at a variation on your
example:

        S -> aEa | bEb | aFb | bFa
        E -> e | Ee
        F -> e | Ff

This grammar should be LR(1), because after the "ae" or "be" prefix,
the next character determines whether we should reduce an E or F
non-terminal. If you make the split grammar, the property still holds
and you have proper LALR left-context disambiguation forced.

        S -> a E a | b E' a | a F b | b F' a
        E -> e | Ee
        E' -> e | E'e
        F -> e | Ff
        F' -> e | F'f

If you try to push the problem farther into the lookahead, you make
the grammar non-LR(1), as in:

        S -> aEa | bEb | aFb | bFa
        E -> e | Ee // not LR(1), because both E & F have "ae/e" &
        F -> e | Fe // "be/e" as prefix/lookahead for a reduction

I think from the above, you can argue that no left recursion can cause
a problem. If there is left recursion, it must have unique trailing
(right) context or the grammar is not LR(1).

I think you can make a similar argument for right recursion, since
right recursion never introduces trailing context. The question with
right recursion, is does a recursive rule somehow reuqire "recursive
splitting". I don't think it does, I think the grammar always
requires a fixed amout of splitting to fix the problem. For example,
this right recursive variant requires no more splitting than the
original grammar.

        S -> aEa | bEb | aFb | bFa
        E -> e | eE
        F -> e | eF

For the splitting to fail, one would need a grammar where it was
important that the rules being split must ultimately reduce to the
same symbol (and not to two unique symbols). Now, thinking that way,
it would seem then that the context problem would have to have a
recursion on the start symbol. How about something like:

        S -> aSa | bSb | aRb | bRa | e
        R -> e

I think we can split it into:

        S -> aSa | bS'b | aRb | bR'a | e
        S' -> aSa | bS'b | aRb | bR'a | e
        R -> e
        R' -> e

While this is far from a formal proof, it does seem convincing that
the splitting of rules is sound and that it should be possible to take
an LR generator (especially one that works by state splitting to
resolve conflicts) and use it to transform an LR grammar into an LALR
one. (And now I'm ready to stand with egg-on-my-face when someone
references a simple proof that such a transofrmation is not possible,
showing exactly what problem I did not cover.)

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
19 Bronte Way #33M voice : (508) 435-5016
Marlboro, MA 01752 USA fax : (508) 251-2347 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Grammar LR(1) but not LALR(1)

Chris F Clark <cfc@shell01.TheWorld.com>2 Nov 2003 14:52:49 -0500

Chris F Clark <cfc@shell01.TheWorld.com>
2 Nov 2003 14:52:49 -0500