Re: Error reporting, was Infinite look ahead required by C++?

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Fri, 19 Feb 2010 20:48:59 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 10-02-024 10-02-029 10-02-047 10-02-055 10-02-062 10-02-064 10-02-070 10-02-072
Keywords: parse, errors
Posted-Date: 21 Feb 2010 00:20:49 EST

Stephen Horne <sh006d3592@blueyonder.co.uk> writes:


> On Tue, 16 Feb 2010 18:37:39 +0100, Hans Aberg
> <haberg_20080406@math.su.se> wrote:
>
>>Stephen Horne wrote:
>>> In LR(1), it is *easy* to give a message of the form "expected one of
>>> <token list>, but <token> was found." - the set of possible next
>>> tokens is part of the description of each state, and this can easily
>>> be traced back to a set of productions (or else the parser could never
>>> know when/what to reduce).
>>>
>>> Yacc and Bison don't support reporting errors in this form AFAIK, but
>>> the tool isn't the same as the algorithm the tool uses.
>>
>>Those use LALR(1), though LR(1) support may be in the works for Bison,
>
> Just making sure I haven't misunderstood...
>
> LALR(1) is basically a minor variation of LR(1). The derivation of the
> state model is a little different, but each state still has a set of
> valid next tokens, used to choose between shifting and reducing.
>
>>which compacts the states in such a way that when an error token
>>appears, some extra reductions may take place before issuing the error.
>
> This sounds like table compression modified by some form of
> preprocessing of the state model, but it doesn't sound like an
> inherent part of LALR, but rather an optimisation specific to Bison?


There are two possible causes of the difference. One is due to
differences in LALR (or SLR) processing versus LR, and that one is
endemic to the grammar class. The second is due to a common
table-packing optimization and how "don't cares" are handled; that
one is more tool specific, but it is a general algorithm, so it isn't
limited to just Yacc or Bison--many tools use it, and it can in
theory be applied even to an LR tool.


Let's deal with the tool-specific one first, as that's most likely
what's being referenced.


Suppose you have a state where the actions for certain input tokens
are shifts, others are reduces, and still others are errors. The
parser will still work if you replace the error actions with a reduce
action. It will simply reduce and land in a new state whose action is
an error action (or another such reduce, i.e. you may follow a chain
of reduces, but you will eventually come to an error action). You
will never follow a chain of reduces, replacing an error, that leads
you to a shift or an accept action. This is fairly easy to prove once
you know that reduce actions in a state are added only because some
parent state has a shift (or accept) action on that token.
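
A minimal sketch of that chain property, in Python (the states,
rules, and goto targets here are invented for illustration, not taken
from any real generator):

    # Per-state default actions after packing: a default reduce stands
    # in for the removed error entries; state 8 still defaults to error.
    ACTIONS = {
        2: ('reduce', 'expr -> expr + term'),
        5: ('reduce', 'stmt -> expr'),
        8: ('error',),
    }
    # State reached after each reduce/goto (also invented).
    GOTO_AFTER_REDUCE = {2: 5, 5: 8}

    def feed_bad_token(state):
        # Follow the default-action chain for a token that is really an
        # error: only reduces, then an error; never a shift or accept.
        while ACTIONS[state][0] == 'reduce':
            print(state, ACTIONS[state])
            state = GOTO_AFTER_REDUCE[state]
        print(state, ACTIONS[state])   # the error, two contexts late

    feed_bad_token(2)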


The only disadvantage is that you've lost the innermost context when
reporting the error. Instead of reporting an error in a
"then-clause", the chain of reduces may lead you to claim the error
was in an "if-statement", or even a whole "block".


Note that all LR grammars have this chain property, so this
optimization can be applied to any LR grammar. It is just most
commonly applied in LALR parsers, because it has some nice side
effects when used with the Yacc comb-vector packing algorithm.
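
To see how the default reduce meshes with comb-vector packing, here
is a toy packing in Python; the states, tokens, and first-fit offset
search are invented, and real generators are considerably more
elaborate:

    TOKENS = ['ID', '+', ')', ';']
    tok_index = {t: i for i, t in enumerate(TOKENS)}

    # Sparse rows: only non-default entries survive; the most common
    # reduce (or an error) becomes each row's default action.
    rows = {0: {'ID': ('s', 3)},
            1: {'ID': ('s', 3), '+': ('s', 4)}}
    defaults = {0: ('r', 2), 1: ('error',)}

    table, check, base = {}, {}, {}
    for state, row in rows.items():
        offset = 0                     # slide the row until it fits
        while any(offset + tok_index[t] in table for t in row):
            offset += 1
        base[state] = offset
        for t, act in row.items():
            table[offset + tok_index[t]] = act
            check[offset + tok_index[t]] = state

    def action(state, tok):
        slot = base[state] + tok_index[tok]
        if check.get(slot) == state:   # entry really belongs to us
            return table[slot]
        return defaults[state]         # otherwise the default fires

    print(action(0, ';'))  # ('r', 2): a reduce where an unpacked table
                           # would have said "error"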


A related optimization issue occurs in parser generators that do
shift-reduce optimization, which has the same effect. I think unit
rule elimination can have the same effect, but I'm not certain.


The issue is most serious if the parser is trying to do error repair.
In that case, you have to undo the spurious reduces before doing the
repair, or you can still misparse the text that has been repaired.


Now, having understood this issue, it is easy to understand the
grammar-class-specific one. The key difference among LR, LALR, and
SLR parsing tables is which reduce actions are applied in which
states.


In the SLR case, one set of reduce lookaheads (the FOLLOW set) is
associated with each non-terminal and applied in all states. Thus, in
some states, we may be reducing when in this context we should have
reported an error--the same issue as above. Note that you will still
follow a chain of reduces and come to an error, but by then you are
in a parent non-terminal.
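
As a concrete (invented) illustration, consider the grammar
S -> a E b | c E d, E -> x. FOLLOW(E) = {b, d}, so in the state
reached after "a x" an SLR parser reduces E -> x on both b and d,
even though only b can legally follow in that left context:

    # SLR reduces on all of FOLLOW(E); canonical LR(1) keeps a
    # per-state lookahead set instead.
    FOLLOW_E = {'b', 'd'}              # context-blind
    LR1_LOOKAHEAD_AFTER_a = {'b'}      # context-aware

    def slr_action(tok):
        return 'reduce E -> x' if tok in FOLLOW_E else 'error'

    def lr1_action(tok):
        return ('reduce E -> x' if tok in LR1_LOOKAHEAD_AFTER_a
                else 'error')

    # Input "a x d": the d is an error in this left context.
    print(slr_action('d'))  # reduce E -> x (spurious; error comes later)
    print(lr1_action('d'))  # error         (reported immediately)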


In LALR, we look at a state, work out in which contexts the
non-terminal being reduced appears, and take the reduce actions from
those contexts. However, the LALR algorithm merges states without
regard to their parent left-contexts, which can again cause the same
overly aggressive reductions.


A true LR algorithm does not merge left-contexts that affect
reductions--the canonical LR algorithm never merges at all, and the
minimal-state variants either merge and then re-split (a la Spector)
or use lane tracing (a la Pager) before merging to avoid the
problematic merges--and thus each state carries only the lookaheads
that are appropriate to it. An LR algorithm therefore doesn't have
this issue, unless one "optimizes" the states as described above.
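
The same invented grammar shows the LALR merge concretely: canonical
LR(1) builds two states for the item E -> x. (one per left context),
while LALR merges them, since they share a core, and unions their
lookaheads:

    # Canonical LR(1): two states for E -> x., distinct lookaheads.
    LR1_STATES = {'E -> x. via a': {'b'},
                  'E -> x. via c': {'d'}}

    # LALR merges same-core states and unions their lookahead sets,
    # recreating the spurious reduce for this grammar.
    LALR_LOOKAHEADS = set().union(*LR1_STATES.values())
    print(LALR_LOOKAHEADS)  # {'b', 'd'}: on "a x d" LALR reduces first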


Again, the issue matters most when one is trying to repair, and not
simply report, the error. If you don't repair the error in the
innermost context, the repair can trigger spurious cascading errors.
For example, deleting a spurious semicolon may make the following
tokens legitimately part of the expression that was erroneously
terminated. If you don't un-reduce that reduction of the expression,
those tokens can now be incorrectly parsed as something else
(including cascading errors).
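
One way a repairing parser might cope, sketched here on the
assumption that the driver logs each default reduce since the last
shift so it can roll them back; the names and structure are
hypothetical:

    # Log a stack snapshot before each default reduce so the repair
    # routine can restore the innermost context before repairing.
    reduce_trace = []                  # snapshots, oldest first

    def take_default_reduce(stack, rule):
        reduce_trace.append(list(stack))
        # ... perform the reduce and goto as usual ...

    def repair_error(stack):
        if reduce_trace:               # un-reduce: restore the stack
            stack[:] = reduce_trace[0] # as it was before the chain
        reduce_trace.clear()
        # ... now try insert/delete/replace repairs from here ...

    stack = [0, 2, 5]
    take_default_reduce(stack, 'expr -> expr + term')
    stack[:] = [0, 8]                  # pretend the reduce/goto happened
    repair_error(stack)
    print(stack)                       # [0, 2, 5]: context restored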


I don't know whether LL grammars have a corresponding problem. The
issue would be that some illegal token can cause one to pop the stack
when one really shouldn't, i.e. the current rule could (but
shouldn't) end before the invalid token, and by the time you report
the error the innermost context is lost.


Hope this helps,
-Chris


******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------


