[Followup] Re: "Near Miss" error handling?

gwyn@thislove.dyndns.org (Gwyn Judd)
10 Apr 2001 01:16:54 -0400

          From comp.compilers

Related articles
"Near Miss" error handling? gwyn@thislove.dyndns.org (2001-03-27)
Re: "Near Miss" error handling? eeide@cs.utah.edu (Eric Eide) (2001-03-31)
Re: "Near Miss" error handling? joachim_d@gmx.de (Joachim Durchholz) (2001-04-04)
[Followup] Re: "Near Miss" error handling? gwyn@thislove.dyndns.org (2001-04-10)
Re: [Followup] Re: "Near Miss" error handling? Scott.Daniels@Acm.Org (Scottie) (2001-04-12)
Re: [Followup] Re: "Near Miss" error handling? gwyn@thislove.dyndns.org (2001-04-14)
Re: [Followup] Re: "Near Miss" error handling? rog@vitanuova.com (2001-04-14)
| List of all articles for this month |

From: gwyn@thislove.dyndns.org (Gwyn Judd)
Newsgroups: comp.compilers
Date: 10 Apr 2001 01:16:54 -0400
Organization: World Center for Flux Capacitor Research
References: 01-03-135 01-03-165 01-04-011
Keywords: errors
Posted-Date: 10 Apr 2001 01:16:54 EDT

This is a followup to all the people who have responded to my query.
It's all been tremendously helpful. I came up with a slightly new use
for the whole idea today. I'm using a method of error recovery I
borrowed from the book "Recursive descent Compiling" by A.J.T Davie and
R. Morrison (the book was in Algol but my compiler is in Visual Basic.
I'm not sure if it is an improvement :) Actually I never wrote anything
in Algol so the first thing I had to do was to teach myself to read it).


They suggest a method whereby when you come across an error, you print
out an error and then set a global variable called "recovering". When
recovering is set, no error messages can be printed out. Additionally,
the next time you get to a position where the input symbol has to be a
certain string, you skip input until the input symbol is that string.
This has the effect of initially assuming the thing that caused the
error was simply misspelled and continuing with the parse as such. When
you get to the next "known token" you have either recovered from the
error, or you skip input until the symbols match at which point you
unset recovering. I find this method works decently in the case where
the token was simply misspelled ("clas" instead of "class"). This is by way
of introduction, not the new bit.


If you imagine that we are at a point in the grammar where there are
several alternatives:


instruction: "if" "(" boolean ")" block
                | "while" "(" boolean ")" block
                | "loop" "(" number ")" block


Suppose the input as this point is "i(some_boolean_function())". This is
probably a misspelled "if" statement. The near-miss routine can take the
next token, as well as a list of possibles, and tell us which one it is
likely to be. Additionally, if none of them are "close enough", it
returns none of them so we don't get too confused.


But we can do more. Imagine our grammar is augmented with some syntax
for calling functions:


instruction: "if" "(" boolean ")" block
                | "while" "(" boolean ")" block
                | "loop" "(" number ")" block
                | identifier "(" ")" ";"
                | identifier "." identifier "(" ")" ";"


Where an identifier is defined as being basically anything like a C/C++
identifier. Then, in the example above, the token "i" looks like a
function name. What I do here is look ahead a couple of symbols. If the
one following the "(" is not a ")", then I know it isn't a function call
(this obviously wouldn't work too well for a more C-like language,
fortunately in this language you can't provide arguments to functions).


So if we look at our string "i(some_boolean_function())", realising it
isn't a function call, we call the near-miss function with a list of
possibles (if, while, loop). It tells us that it is probably an "if".
Since the parser at this point in the grammar is a set of chained if
statements (probably should be a select case (switch)), we then jump
backwards from the error handler (using "goto" of course :)) into the
grammar, just after where the "if" token is, after consuming the token
from the input, of course.


That is, the instruction routine would be something like (psuedocode):


if have "if" then
do_the_if:
        do_if
elseif have "while" then
do_the_while:
        do_while
elseif have "loop" then
do_the_loop:
        do_loop
elseif have identifier then
        if token = "." then
                function_call
        elseif token = "(" then
                next = peek_ahead
                if next = ")" then
                        function_call
                else
                        ' puts us into recovery mode
                        syntax "expected instruction but got " token " instead


                        probable = near_miss(token, "if", "while", "loop")


                        ' consume
                        next_token


                        select case probable
                                case "if"
                                        goto do_the_if
                                case "while"
                                        goto do_the_while
                                case "loop"
                                        goto do_the_loop
                        end select
                end if
        end if
end if


If we find that the token matches none of the possibles, the token is
simply consumed, as though the grammar matched at that point, and the
function returns without taking any other action. This can help with
extraneous words in the input.


I find the compiler used to get into situations where a misspelled word
would throw the compiler into a state where recovery was completely
impossible. It might try to parse a class declaration, thinking it was
the definition of a function. The end result was that many, many
spurious errors were ejected as the compiler went in and out of
recovery. By using this method, I find that quite often we can pick with
better accuracy the correct spot to jump back into the parse tree.


This all lacks a decent amount of testing as I wrote it all just today.
I have no idea if any of this makes any sense as it is very late here.
I have a feeling that this method can be very helpful, if used sparingly
and cautiously. I'm not sure if it would be easy to use this sort of
method with a table driven parser.


--
Gwyn Judd
Let your conscience be your guide.
-Pope





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.