Re: Error reporting, was Infinite look ahead required by C++?

Chris F Clark <>
Sun, 28 Mar 2010 20:05:04 -0400

          From comp.compilers

Related articles
[11 earlier articles]
Re: Error reporting, was Infinite look ahead required by C++? (Joel E. Denny) (2010-02-19)
Re: Error reporting, was Infinite look ahead required by C++? (Chris F Clark) (2010-02-19)
Re: Error reporting, was Infinite look ahead required by C++? (Chris F Clark) (2010-02-19)
Re: Error reporting, was Infinite look ahead required by C++? (Joel E. Denny) (2010-02-21)
Re: Error reporting, was Infinite look ahead required by C++? (Chris F Clark) (2010-02-28)
Re: Error reporting, was Infinite look ahead required by C++? (Joel E. Denny) (2010-03-21)
Re: Error reporting, was Infinite look ahead required by C++? (Chris F Clark) (2010-03-28)
PL/I, was Re: Error reporting, was Infinite look ahead required by C++ (Robert A Duff) (2010-04-01)
| List of all articles for this month |

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: Sun, 28 Mar 2010 20:05:04 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 10-02-024 10-02-029 10-02-047 10-02-055 10-02-062 10-02-064 10-02-070 10-02-072 10-02-078 10-02-080 10-03-002 10-03-069
Keywords: errors, bison
Posted-Date: 01 Apr 2010 12:30:57 EDT

"Joel E. Denny" <> writes:

>> In a typical grammar, you will find states where there is only one
>> possible non-terminal to reduce to. In those states, splitting will
>> not reduce the number of conflicts and is thus pointless.
> You say "only one possible non-terminal", but there could be a conflict
> between two reductions that have the same nonterminal on the LHS but that
> have a different RHS.

Valid point. You should correct my descirption to be one non-terminal
with one acceptable RHS. I suspect two RHS's with the same length are
probably ok also.

>> Thus, for a minimal state algorithm there may still be some spurious
>> reduces when an error is detected. You could adjust the algorithm
>> to avoid those reduces also and I believe you would then get the
>> canonical LR(1) tables.
> Thanks for confirming.
>> Moveover, the work we did was not particularly, groundbreaking.
>> We implemented the essential "display_expected()" function that
>> returns for a given state the tokens (and if desired non-terminals)
>> that are not error actions in a given state.
> That kind of function seems to be the normal starting point whenever I see
> discussions of this issue, and I agree it's fairly obvious. However, how
> exactly do you use it? Do you simply provide it for the grammar author to
> invoke from semantic actions? Or do your generated parsers invoke it
> automatically to prevent extra reductions before a syntax error is
> detected?

We do the obvious simple things with it. We invoke it as part of the
error reporting. We let grammar writers call it. However, we don't
try to use it to prevent reductions.

In theory, a parser could override the reduction function--it is a
member function within the parser class and use it to perform checks
to see if the reduction was going to actually result in a useful shift
afterward. However, that is a relatively non-trivial operation, and
probably beyond the ability of most of our users.

The only good news in that regard is that the parser framework
(i.e. the code) is entirely in the library (i.e. no code is written by
the generator, aside from a derived class with initializers) such that
it could be done once, and used for many grammars.

>> Thus, it was natural and important that the "display_expected()"
>> function be able to skip reporting all the context dependent
>> keywords that were legal simply because an identifier was legal in
>> that place.
> I'm not quite following this point. Did you mean to say "context
> independent keywords"? That is, I can understand why your scanner would
> treat a reserved keyword (context independent) as being legal wherever
> only an identifier is actually legal, and then I assume your parser would
> report a syntax error because the keyword is not actually legal.
> However, I don't understand why you would treat a non-reserved keyword
> (context dependent) that way, and so I don't understand why
> display_expected would need to treat it differently than any other illegal
> token.

An example, might help here. The feature comes from PL/I. In PL/I,
the language has many keywords, but [almost] none of them are
reserved, but in their specific contexts they are meaningful. So, if
you've see the keyword "if" and an expression, the keyword "then" is
legal and interesting.

The way we recommend attacking that problem is to have both an
IDENTIFIER token, returned by the lexer whenever an identifier that
isn't a keyword is matched, and an identifer non-terminal used in the
parser when an identifier is expected. The lexer then returns a THEN
token for when a then keyword is matched. The identifier non-terminal
uses a rule that looks like this:

            identifier : IDENTIFIER | THEN | ELSE | IF | WHILE | DO | ... ;

In the simple case, you can be in a context where an identifier is not
legal, but one of the keywords is, you will get the correct display
expected. The default code looks something like this:

        display_expected() {
                for (int ix = first_token; ix <= last_token; ++ix ) {
                      if (valid_in_current_state(ix)) display_token(ix);
        } }

In the slightly more complicated case where an identifier is legal,
you don't generally want to list all the keywords, so if identifier is
legal, you suppress reporting of the keywords as legal
alternatives. That requires the user to customize the
display_expected() function.

        display_expected() {
                for (int ix = first_token; ix <= last_token; ++ix ) {
                      if (! valid_in_current_state(ix)) continue;
                      if (is_keyword_token(ix) &&
                              valid_in_current_state(IDENTIFIER)) continue;
        } }

However, if you want to print out keywords that are meaningful as
non-idnetifiers in the current state, you want an even more comple

        display_expected() {
                for (int ix = first_token; ix <= last_token; ++ix ) {
                      if (! valid_in_current_state(ix)) continue;
                      if (is_keyword_token(ix) &&
                              valid_in_current_state(IDENTIFIER) ==
                                      valid_in_current_state(ix)) continue;
        } }

This version compares what the parser will do upon seeing the keyword
to what an identifer does, if it does the same thing as an identifier
does, the keyword is just being treated as an identifier, and should
not be reported. However, if the keyword does something different, it
has a special meaning in this state, and should be reported as being
expected, so that the user has a clue to that special use here.

> Maybe industry will have some job for us then.

:-) Well, so far through-out my career, I've found being a compiler
person a good way of getting jobs. One has to be a little bit of a
renaissance person to be a compiler writer. At the same time, I've
found a lot of my jobs haven't been actual compiler writing. Still
one can find some "compiler writing" technique applicable to many
non-compiler writing jobs. Of course, the same could probably be said
for knowing how to write an OS, or build a database, etc.

Hope this helps,

Chris Clark email:
Compiler Resources, Inc. Web Site:
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.