Re: (long) conflict resolution in action table

Tim Newsham <newsham@lava.net>
9 Sep 2003 23:02:53 -0400

From comp.compilers

Related articles
(long) conflict resolution in action table newsham@lava.net (Tim Newsham) (2003-09-04)
Re: (long) conflict resolution in action table derkgwen@HotPOP.com (Derk Gwen) (2003-09-09)
Re: (long) conflict resolution in action table haberg@matematik.su.se (2003-09-09)
*Re: (long) conflict resolution in action table newsham@lava.net (Tim Newsham)* (2003-09-09)**
Re: (long) conflict resolution in action table newsham@lava.net (Tim Newsham) (2003-09-14)

| List of all articles for this month |

From:	Tim Newsham <newsham@lava.net>
Newsgroups:	comp.compilers
Date:	9 Sep 2003 23:02:53 -0400
Organization:	Compilers Central
References:	03-09-014
Keywords:	LALR, parse
Posted-Date:	09 Sep 2003 23:02:52 EDT

I put the same question to Dr. Visser (the author of the papers i
quoted) and he sent me the following reply. It turns out I had
omitted the proper computation of the GOTO() table which becomes a
function of state and production rather than state and symbol [when
performing a reduction, the next state is looked up based on which
production was reduced, rather than the symbol being reduced to].

Tim N.

----[ text from Dr. Visser follows ]----

Hi Tim,

[I quote your entire mail for cc'ed people]

> I recently read through your paper on scannerless parsing and (skimmed
> over) your previous paper on disambiguating earley and lr parsers.
> I have a few questions reguarding disambiguating lr parsers. The
> first question I have already posted to comp.compilers (awaiting moderation)
> but I thought I'd send mail directly in case you dont follow that
> forum. It's attached at the end.
>
> Since posting that, another question came to mind:
>
> - Is it possible to use the ">", "right", "left" and "nonassoc"
> properties to disambiguate the dangling else problem?
>
> S -> if E then S else S
> | if E then S
> | s
> ;
>
> I dont see an obvious way to do this, although I know how to
> manually disambiguate this by prefering a shift over a reduce.
> With the disambiguating relations defined in your paper, I
> can prevent certain parse trees from being used, but I cannot
> prefer some parse trees over others when the parse is ambiguous.

No, you cannot express this using direct priorities. Have a look at the
paper

M. van den Brand, J. Scheerder, J. Vinju, and E. Visser. Disambiguation
filters for scannerless generalized LR parsers. In N. Horspool, editor,
Compiler Construction (CC'02), volume 2304 of Lecture Notes in Computer
Science, pages 143--158, Grenoble, France, April 2002. Springer-Verlag.

which introduces another disambiguation mechanisms, i.e., 'prefer' and
'avoid', which does deal with this problem.

There is another shortcoming of the priority scheme as described in the
papers you refer to. In some situations it is desirable to apply a
priority to *some* of the arguments of a production. For instance,
with the ternary operator

      E -> E "<" E ">" E

it is not necessary that the priorities apply to the middle expression.
This is only a shortcoming in the language of priorities (SDF). The
general priority scheme can cope with these situations. That is, one
wants to exclude the patterns

        E -> [E -> E + E] < E > E
        E -> E < E > [E -> E + E]

but not

        E -> E < [E -> E + E] > E

> I recently implemented your disambiguating rules replacing a previous
> precedence system that just favored some actions over others when
> a conflict occurs and I'm trying to grasp the pros and cons of both
> approaches (obviously your approach is much more rigorous).

Indeed I'm not subscribed to comp.compilers. You can forward my answer
to the group if you care to.

> ----[ Comp.compilers question attached:
>
> Hi,
> semi-long question and comment...
>
> OVERVIEW:
>
> I have a comment about using precedence to disambiguate the
> actions in a shift-reduce parser. I've found two approaches so
> far:
>
> - Give priorities and associativities to terminals. When
> encountering a shift-reduce conflict, give the shift the
> precedence of the symbol to be shifted, and the reduce the
> precedence of the highest precedence symbol in the right
> hand side of the production to be reduced. Pick the action
> with the higher precedence.
>
> - Use a set of conflict constraints when generating the LR0
> goto table and the follow set of each production (note:
> each production has its own followset rather than computing
> followsets of symbols). This method is outlined by Visser
> in the following two papers:
>
> @misc{ visser-scannerless,
> author = "E. Visser",
> title = "Scannerless generalized-LR parsing",
> text = "Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report
> P9707, Programming Research Group, University of Amsterdam.",
> url = "citeseer.nj.nec.com/visser97scannerles.html" }
>
> @misc{ visser-case,
> author = "E. Visser",
> title = "A case study in optimizing parsing schemata by disambiguation filters",
> text = "E. Visser. A case study in optimizing parsing schemata by disambiguation
> filters.",
> url = "citeseer.nj.nec.com/164096.html" }
>
> I'd be interested in hearing of other approaches.
>
>
>
>
> BACKGROUND:
>
> Anyway, I started to use Visser's method of disambiguation. Consider
> the following example:
>
> E -> E + E {1}
> | E * E {2}
> | id {3}
> ;
>
> with relationships:
> {2} > {1}
> {1} left {1}
> {2} left {2}
>
> the disambiguation rules mean that the following subtrees contain
> conflicts and should not be generated:
>
> E -> [E -> E + E] * E because {2} > {1}
> E -> E * [E -> E + E] because {2} > {1}
> E -> E + [E -> E + E] because {1} left {1}
> E -> E * [E -> E * E] because {2} left {2}
>
> During the construction of the LR0 goto table, these restrictions
> prevent some predictions that might normally occur when computing
> the closure of the item sets, ie:
>
> E -> . E * E does not predict E -> . E + E
> E -> E * . E does not predict E -> . E + E
> E -> E + . E does not predict E -> . E + E
> E -> E * . E does not predict E -> . E * E
>
> which has the net effect of eliminating some of the shift operations
> that otherwise would have been possible at a later point. During
> computation of the follows sets of each production these restrictions
> prevent some symbols from being part of the follows set. For example
> the production:
>
> E -> E + E follows = { EOF, + }
>
> never receives a '*' in its follows set because the only time '*' could
> follow the production:
>
> E -> [E -> E + E] * E
>
> cannot occur.
>
> So with these disambiguation rules, there are no conflicts in the SLR
> action table for the grammar.
>
>
>
> THE PROBLEM
>
> That's straightforward enough. Now my comment/complaint/question.
> Consider when you add one more production to the grammar but do not
> specify its precedence or associativity:
>
> E -> E + E {1}
> | E * E {2}
> | id {3}
> | E ^ E {4}
> ;
>
> with relationships:
> {2} > {1}
> {1} left {1}
> {2} left {2}

My first reaction with such questions is to look at what the
implementation does. A parser and parser generator for SDF
are available at

      http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/SGLR
      http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ParsetableGenerator

I translated your example into SDF

--------------------------------------------
definition

module Main
exports
    sorts E
    lexical syntax
        [a-z]+ -> Id
        [\ \t\n] -> LAYOUT
    context-free syntax
        Id -> E
        E "+" E -> E {left,cons("Add")}
        E "*" E -> E {left,cons("Mul")}
        E "^" E -> E {cons("Exp")}
    context-free priorities
        E "*" E -> E {left,cons("Mul")}
    > E "+" E -> E {left,cons("Add")}
--------------------------------------------

and ran a few experiments. Sure enough if you write expressions using ^
trees with ambiguities result. However, parsing expressions with only
* and + are parsed unambiguously.

> Now when we compute follow sets we get:
>
> E -> E ^ E follows = { EOF, +, *, ^ }
>
> because there are no restrictions on this production. This leads to
>
> E -> E + E follows = { EOF, +, *, ^ }
>
> because
>
> E -> E + [E -> E ^ E]
>
> is allowed, and any values that follow [E -> E ^ E] may also follow
> [E -> E + E].

This analysis is indeed correct.

> The end effect is that the resulting action tables will have many
> shift/reduce conflicts that the precedence rules imply should have
> been elimianted. As an example consider the following itemset and
> actions for one of the states in the SLR parser:
>
> State 7:
> E -> E + E . follows: ['#EOF#', '*', '+', '^']
> E -> E . + E
> E -> E . * E
> E -> E . ^ E
> on #EOF#: reduce E -> E + E
> *** on *: shift 3
> *** on *: reduce E -> E + E
> *** on +: shift 4
> *** on +: reduce E -> E + E
> *** on ^: shift 5
> *** on ^: reduce E -> E + E

Indeed, but there is more to the algorithm than filtering with
priorities on followsets. Priorities are also applied during the
computation of the goto table. The second paper that you quote explains
this more graphically than the first using pictures of the goto graph.
This boils down to the following:

Even if you can do the reduction with E -> E + E because of the more
liberal follow set, you will end up in a state in which you cannot
shift with * because that would lead to conflict.

So what you get is an LR table which indeed contains some conflicts,
but these are resolved after the reduction.

The approach does require a parsing algorithm which can deal with
a bit of non-determinism. The SGLR algorithm we use for SDF evaluates
both possibilities in parallel, discarding branches which fail.

It might be possible to get a better result for the follow sets.
However, in the case of SDF/SGLR this is not possible because of
scannerless parsing; an arbitrary amount of whitespace may be
between the E . and the operator. That is productions usually have
the form

      E -> E L + L E

where L stands for Layout, i.e., whitespace and comments.

> If more traditional conflict resolution were used, we should be
> able to resolve the action when the "*" lookahead is seen in
> favor of the shift since a shift of a "*" would have a higher
> precedence than a reduce of a production with a "+".
>
> My comment: Visser's method seems very elegant for completely
> disambiguating conflicts in a set of productions, but if you wish
> to partially disambiguate the productions, it may not do as much
> as you would hope.
>
> My question: Is there anyway to augment Visser's conflict resolution
> so that it can be more effective when resolving a subset of the
> conflicts that occur? For example, is there a method that could
> produce a parser for my simple expression grammar that only encountered
> a ambiguity when the "^" operator was present in an input sentance?

Yes, I suspect there is. Rather then computing the follow sets as
specified, it may actually suffice to look at the goto relation:
For the follow set of the reduction of [E -> E + E .] above you need
to look at the state where the production is predicted, i.e.,
containing [E -> . E + E]. Now look at the item set following from
doing a reduction in that set; all tokens that are directly predicted
in that set can be in the follow set of the reduction. Then you will
see that the set contains [E -> E . + E] and [E -> E . ^ E], but not
[E -> E . * E] since [E -> [E -> E + E] * E] has a conflict. Thus,
only "+" and "*" are in the follow set of [E -> E + E].

This may actually simplify the parser generator since no separate
computation of first and follow set is necessary. But I may be
overlooking something.

cheers,

-- Eelco

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: (long) conflict resolution in action table

Tim Newsham <newsham@lava.net>9 Sep 2003 23:02:53 -0400

Tim Newsham <newsham@lava.net>
9 Sep 2003 23:02:53 -0400