9 Sep 2003 23:02:53 -0400

Related articles |
---|

(long) conflict resolution in action table newsham@lava.net (Tim Newsham) (2003-09-04) |

Re: (long) conflict resolution in action table derkgwen@HotPOP.com (Derk Gwen) (2003-09-09) |

Re: (long) conflict resolution in action table haberg@matematik.su.se (2003-09-09) |

Re: (long) conflict resolution in action table newsham@lava.net (Tim Newsham) (2003-09-09) |

Re: (long) conflict resolution in action table newsham@lava.net (Tim Newsham) (2003-09-14) |

From: | Tim Newsham <newsham@lava.net> |

Newsgroups: | comp.compilers |

Date: | 9 Sep 2003 23:02:53 -0400 |

Organization: | Compilers Central |

References: | 03-09-014 |

Keywords: | LALR, parse |

Posted-Date: | 09 Sep 2003 23:02:52 EDT |

I put the same question to Dr. Visser (the author of the papers i

quoted) and he sent me the following reply. It turns out I had

omitted the proper computation of the GOTO() table which becomes a

function of state and production rather than state and symbol [when

performing a reduction, the next state is looked up based on which

production was reduced, rather than the symbol being reduced to].

Tim N.

----[ text from Dr. Visser follows ]----

Hi Tim,

[I quote your entire mail for cc'ed people]

*> I recently read through your paper on scannerless parsing and (skimmed*

*> over) your previous paper on disambiguating earley and lr parsers.*

*> I have a few questions reguarding disambiguating lr parsers. The*

*> first question I have already posted to comp.compilers (awaiting moderation)*

*> but I thought I'd send mail directly in case you dont follow that*

*> forum. It's attached at the end.*

*>*

*> Since posting that, another question came to mind:*

*>*

*> - Is it possible to use the ">", "right", "left" and "nonassoc"*

*> properties to disambiguate the dangling else problem?*

*>*

*> S -> if E then S else S*

*> | if E then S*

*> | s*

*> ;*

*>*

*> I dont see an obvious way to do this, although I know how to*

*> manually disambiguate this by prefering a shift over a reduce.*

*> With the disambiguating relations defined in your paper, I*

*> can prevent certain parse trees from being used, but I cannot*

*> prefer some parse trees over others when the parse is ambiguous.*

No, you cannot express this using direct priorities. Have a look at the

paper

M. van den Brand, J. Scheerder, J. Vinju, and E. Visser. Disambiguation

filters for scannerless generalized LR parsers. In N. Horspool, editor,

Compiler Construction (CC'02), volume 2304 of Lecture Notes in Computer

Science, pages 143--158, Grenoble, France, April 2002. Springer-Verlag.

which introduces another disambiguation mechanisms, i.e., 'prefer' and

'avoid', which does deal with this problem.

There is another shortcoming of the priority scheme as described in the

papers you refer to. In some situations it is desirable to apply a

priority to *some* of the arguments of a production. For instance,

with the ternary operator

E -> E "<" E ">" E

it is not necessary that the priorities apply to the middle expression.

This is only a shortcoming in the language of priorities (SDF). The

general priority scheme can cope with these situations. That is, one

wants to exclude the patterns

E -> [E -> E + E] < E > E

E -> E < E > [E -> E + E]

but not

E -> E < [E -> E + E] > E

*> I recently implemented your disambiguating rules replacing a previous*

*> precedence system that just favored some actions over others when*

*> a conflict occurs and I'm trying to grasp the pros and cons of both*

*> approaches (obviously your approach is much more rigorous).*

Indeed I'm not subscribed to comp.compilers. You can forward my answer

to the group if you care to.

*> ----[ Comp.compilers question attached:*

*>*

*> Hi,*

*> semi-long question and comment...*

*>*

*> OVERVIEW:*

*>*

*> I have a comment about using precedence to disambiguate the*

*> actions in a shift-reduce parser. I've found two approaches so*

*> far:*

*>*

*> - Give priorities and associativities to terminals. When*

*> encountering a shift-reduce conflict, give the shift the*

*> precedence of the symbol to be shifted, and the reduce the*

*> precedence of the highest precedence symbol in the right*

*> hand side of the production to be reduced. Pick the action*

*> with the higher precedence.*

*>*

*> - Use a set of conflict constraints when generating the LR0*

*> goto table and the follow set of each production (note:*

*> each production has its own followset rather than computing*

*> followsets of symbols). This method is outlined by Visser*

*> in the following two papers:*

*>*

*> @misc{ visser-scannerless,*

*> author = "E. Visser",*

*> title = "Scannerless generalized-LR parsing",*

*> text = "Visser, E. (1997b). Scannerless generalized-LR parsing. Technical Report*

*> P9707, Programming Research Group, University of Amsterdam.",*

*> url = "citeseer.nj.nec.com/visser97scannerles.html" }*

*>*

*> @misc{ visser-case,*

*> author = "E. Visser",*

*> title = "A case study in optimizing parsing schemata by disambiguation filters",*

*> text = "E. Visser. A case study in optimizing parsing schemata by disambiguation*

*> filters.",*

*> url = "citeseer.nj.nec.com/164096.html" }*

*>*

*> I'd be interested in hearing of other approaches.*

*>*

*>*

*>*

*>*

*> BACKGROUND:*

*>*

*> Anyway, I started to use Visser's method of disambiguation. Consider*

*> the following example:*

*>*

*> E -> E + E {1}*

*> | E * E {2}*

*> | id {3}*

*> ;*

*>*

*> with relationships:*

*> {2} > {1}*

*> {1} left {1}*

*> {2} left {2}*

*>*

*> the disambiguation rules mean that the following subtrees contain*

*> conflicts and should not be generated:*

*>*

*> E -> [E -> E + E] * E because {2} > {1}*

*> E -> E * [E -> E + E] because {2} > {1}*

*> E -> E + [E -> E + E] because {1} left {1}*

*> E -> E * [E -> E * E] because {2} left {2}*

*>*

*> During the construction of the LR0 goto table, these restrictions*

*> prevent some predictions that might normally occur when computing*

*> the closure of the item sets, ie:*

*>*

*> E -> . E * E does not predict E -> . E + E*

*> E -> E * . E does not predict E -> . E + E*

*> E -> E + . E does not predict E -> . E + E*

*> E -> E * . E does not predict E -> . E * E*

*>*

*> which has the net effect of eliminating some of the shift operations*

*> that otherwise would have been possible at a later point. During*

*> computation of the follows sets of each production these restrictions*

*> prevent some symbols from being part of the follows set. For example*

*> the production:*

*>*

*> E -> E + E follows = { EOF, + }*

*>*

*> never receives a '*' in its follows set because the only time '*' could*

*> follow the production:*

*>*

*> E -> [E -> E + E] * E*

*>*

*> cannot occur.*

*>*

*> So with these disambiguation rules, there are no conflicts in the SLR*

*> action table for the grammar.*

*>*

*>*

*>*

*> THE PROBLEM*

*>*

*> That's straightforward enough. Now my comment/complaint/question.*

*> Consider when you add one more production to the grammar but do not*

*> specify its precedence or associativity:*

*>*

*> E -> E + E {1}*

*> | E * E {2}*

*> | id {3}*

*> | E ^ E {4}*

*> ;*

*>*

*> with relationships:*

*> {2} > {1}*

*> {1} left {1}*

*> {2} left {2}*

My first reaction with such questions is to look at what the

implementation does. A parser and parser generator for SDF

are available at

http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/SGLR

http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ParsetableGenerator

I translated your example into SDF

--------------------------------------------

definition

module Main

exports

sorts E

lexical syntax

[a-z]+ -> Id

[\ \t\n] -> LAYOUT

context-free syntax

Id -> E

E "+" E -> E {left,cons("Add")}

E "*" E -> E {left,cons("Mul")}

E "^" E -> E {cons("Exp")}

context-free priorities

E "*" E -> E {left,cons("Mul")}

> E "+" E -> E {left,cons("Add")}

--------------------------------------------

and ran a few experiments. Sure enough if you write expressions using ^

trees with ambiguities result. However, parsing expressions with only

* and + are parsed unambiguously.

*> Now when we compute follow sets we get:*

*>*

*> E -> E ^ E follows = { EOF, +, *, ^ }*

*>*

*> because there are no restrictions on this production. This leads to*

*>*

*> E -> E + E follows = { EOF, +, *, ^ }*

*>*

*> because*

*>*

*> E -> E + [E -> E ^ E]*

*>*

*> is allowed, and any values that follow [E -> E ^ E] may also follow*

*> [E -> E + E].*

This analysis is indeed correct.

*> The end effect is that the resulting action tables will have many*

*> shift/reduce conflicts that the precedence rules imply should have*

*> been elimianted. As an example consider the following itemset and*

*> actions for one of the states in the SLR parser:*

*>*

*> State 7:*

*> E -> E + E . follows: ['#EOF#', '*', '+', '^']*

*> E -> E . + E*

*> E -> E . * E*

*> E -> E . ^ E*

*> on #EOF#: reduce E -> E + E*

*> *** on *: shift 3*

*> *** on *: reduce E -> E + E*

*> *** on +: shift 4*

*> *** on +: reduce E -> E + E*

*> *** on ^: shift 5*

*> *** on ^: reduce E -> E + E*

Indeed, but there is more to the algorithm than filtering with

priorities on followsets. Priorities are also applied during the

computation of the goto table. The second paper that you quote explains

this more graphically than the first using pictures of the goto graph.

This boils down to the following:

Even if you can do the reduction with E -> E + E because of the more

liberal follow set, you will end up in a state in which you cannot

shift with * because that would lead to conflict.

So what you get is an LR table which indeed contains some conflicts,

but these are resolved after the reduction.

The approach does require a parsing algorithm which can deal with

a bit of non-determinism. The SGLR algorithm we use for SDF evaluates

both possibilities in parallel, discarding branches which fail.

It might be possible to get a better result for the follow sets.

However, in the case of SDF/SGLR this is not possible because of

scannerless parsing; an arbitrary amount of whitespace may be

between the E . and the operator. That is productions usually have

the form

E -> E L + L E

where L stands for Layout, i.e., whitespace and comments.

*> If more traditional conflict resolution were used, we should be*

*> able to resolve the action when the "*" lookahead is seen in*

*> favor of the shift since a shift of a "*" would have a higher*

*> precedence than a reduce of a production with a "+".*

*>*

*> My comment: Visser's method seems very elegant for completely*

*> disambiguating conflicts in a set of productions, but if you wish*

*> to partially disambiguate the productions, it may not do as much*

*> as you would hope.*

*>*

*> My question: Is there anyway to augment Visser's conflict resolution*

*> so that it can be more effective when resolving a subset of the*

*> conflicts that occur? For example, is there a method that could*

*> produce a parser for my simple expression grammar that only encountered*

*> a ambiguity when the "^" operator was present in an input sentance?*

Yes, I suspect there is. Rather then computing the follow sets as

specified, it may actually suffice to look at the goto relation:

For the follow set of the reduction of [E -> E + E .] above you need

to look at the state where the production is predicted, i.e.,

containing [E -> . E + E]. Now look at the item set following from

doing a reduction in that set; all tokens that are directly predicted

in that set can be in the follow set of the reduction. Then you will

see that the set contains [E -> E . + E] and [E -> E . ^ E], but not

[E -> E . * E] since [E -> [E -> E + E] * E] has a conflict. Thus,

only "+" and "*" are in the follow set of [E -> E + E].

This may actually simplify the parser generator since no separate

computation of first and follow set is necessary. But I may be

overlooking something.

cheers,

-- Eelco

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.