Error reporting in LR parsers

worley@compass.com (Dale Worley)
Wed, 2 Aug 89 10:51:51 EDT

          From comp.compilers

Related articles
Error reporting in LR parsers worley@compass.com (1989-08-02)
Re: Error reporting in LR parsers djones@megatest.uucp (1989-08-04)
Re: Error reporting in LR parsers lai@mips.com (1989-08-07)
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers heirich@cs.ucsd.edu (1989-08-08)
Re: Error reporting in LR parsers rusty@garnet.Berkeley.EDU (1989-08-10)
Re: Error reporting in LR parsers djones@megatest.com (1989-08-10)
[4 later articles]
| List of all articles for this month |

Date: Wed, 2 Aug 89 10:51:51 EDT
From: worley@compass.com (Dale Worley)

      The idea of adding a 'yyexpected' function might sound good at
      first, but there are a number of problems with it. As was pointed out
      before, the number of expected terminals will probably be quite large,
      as this will include terminals on which a shift is possible and
      terminals on which a reduction is possble. For a grammar for Pascal,
      for example, this could be 30 or more terminals in some spots. Also,
      you would probably have to modify the Yacc source code to do this
      because Yacc inserts default reduction actions. These default
      reduction actions eliminate the lookahead information associated with
      reductions.


I've used similar techniques, and a few simple fixes can solve a lot
of these problems. The first is to define "token groups", which are
groups of tokens the user thinks of as being similar. For instance,
'+', '-', '*', etc. can be included in the group 'operator'; '(',
'IDENTIFIER', '-', etc., can be included in the group 'expression'.
Now, when a syntax error is found, the error-reporter assembles a list
of acceptable tokens. Then it checks each token group, and if the
group is contained in the list of acceptable tokens, all those tokens
are removed from the list, and the name of the token group is added.
(In practice, only about three members of the group need to be checked
for membership, and if one token group overlaps another, you have to
be careful about the order in which the groups are tested.) The
result is that you can usually reduce the list of allowed tokens to
about three items, and produce error messages like:


34 i = j k;
^
        Syntax error.
        Found: IDENTIFIER 'k'
        Expecting: OPERATOR ; END


The trick of only checking a few members of the group can be used to
defeat the problem of losing information because of default
reductions. The way to do this is to only test members whose validity
will not be eliminated by the default reductions. In the above
example, 'j' may be reduced to 'expression' before the error is found,
but if the 'operator' group is represented only by '+', then the fact
that '*' cannot follow 'expression' will not stop the 'operator' group
from showing in the list of expected tokens.


Dale
--
Dale Worley, Compass, Inc. worley@compass.com
No one hates so bitterly as the lazy and poor contemplating the
energetic and prosperous.





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.