Re: Q: Error detection/recovery in LEX/YACC (Help)

neitzel@ips.cs.tu-bs.de (Martin Neitzel)
Thu, 23 Dec 1993 16:57:12 GMT

          From comp.compilers

Related articles
Q: Error detection/recovery in LEX/YACC (Help) friesend@herald.usask.ca (1993-12-17)
Re: Q: Error detection/recovery in LEX/YACC (Help) collison@osf.org (1993-12-20)
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1993-12-20)
re: Q: Error detection/recovery in LEX/YACC (Help) bdarr@atr-2s.hac.com (1993-12-21)
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1993-12-23)
Re: Q: Error detection/recovery in LEX/YACC (Help) hage@netcom.com (1994-01-18)
Re: Q: Error detection/recovery in LEX/YACC (Help) neitzel@ips.cs.tu-bs.de (1994-01-27)

Newsgroups: comp.compilers
From: neitzel@ips.cs.tu-bs.de (Martin Neitzel)
Keywords: lex, yacc, errors
Organization: Inst. f. Informatik, TU Braunschweig, FRG
References: 93-12-081 93-12-091
Date: Thu, 23 Dec 1993 16:57:12 GMT

While the "big picture" given by Byron Darrah is generally OK, I'd like to
correct one point and add further cents:


BD> Whenever yacc encounters a parsing error, it effectively backs
BD> up the stack until it can match the input to an error
BD> production. When it matches such input, a call to yyerror()
BD> is made.


The timing is incorrect. When yacc detects an error, it calls yyerror()
immediately (but only if it isn't already in recovery mode; that is the
three-token shift rule). *Then* it starts popping states, hoping to
uncover a state in which the "error" token can be shifted. Next, the
reduction of a production involving "error" _may_ require the shift of
further expected input as specified, which in turn means that the parser
reads and discards every token until that expected one shows up. Your
example, by the way, does exactly that:


BD> Statement : Stmt semicolon_tok
BD> | error semicolon_tok


After the error has been reported and the "Statement: error semicolon_tok"
production has been uncovered, the parser will stay stuck in this state and
ignore all input until you finally feed it a semicolon. Nothing else will
get you back on track. (Byron already pointed this out correctly.)


If you insist on doing it this way, you should clear the error state with
a yyerrok action after the semicolon:


Statement: ... | error ';' {yyerrok;}
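
To make the role of yyerrok concrete, here is a minimal C model of the
bookkeeping behind the three-token rule. It is a sketch under assumptions,
not the code yacc generates: the names errflag, reports, on_syntax_error(),
on_shift() and errok() are made up for illustration.

```c
#include <stdio.h>

/* Simplified model of yacc's error bookkeeping (hypothetical names). */
static int errflag = 0;   /* > 0 means "still in recovery mode" */
static int reports = 0;   /* how many messages the user actually saw */

/* Called when the parser has no legal action for the lookahead token. */
static void on_syntax_error(void)
{
    if (errflag == 0) {           /* report only outside recovery mode */
        reports++;
        fprintf(stderr, "syntax error\n");
    }
    errflag = 3;                  /* three clean shifts needed to leave it */
}

/* Called whenever a token has been shifted successfully. */
static void on_shift(void)
{
    if (errflag > 0)
        errflag--;
}

/* What a yyerrok action amounts to in this model. */
static void errok(void)
{
    errflag = 0;
}
```

In this model, a second error that arrives after only one shifted token
stays silent, whereas an error arriving right after errok() is reported
again. That is why the yyerrok belongs after the ';': from that point on
you trust the parser to be back in sync.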


More sync tokens than just ';' are presumably in order, say a '}' or
'endif'. Before computing lookahead sets yourself, it's often
sufficient to let yacc do it for you. That boils down to a simple


Statement: ... | error


That will rather aggressively try to make sense out of the tokens at the
error point. (Namely with respect to the context a Statement can appear
in.) So your attention shifts to preventing cascading errors. The
three-token tolerance rule may appear very dumb, but it does its job
remarkably well. If you're not satisfied with this approach, a
hand-written action that calls yylex() until it reaches a suitable
continuation token might help you.
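
Such a skipping routine could look roughly like the C sketch below. It is
only an illustration: next_token() stands in for yylex() and reads from a
fixed array so the example is self-contained, and the token codes, is_sync()
and skip_to_sync() are hypothetical names. A real action would presumably
also have to deal with yacc's own lookahead token (yyclearin) and with the
sync token that the routine has already consumed.

```c
#include <stddef.h>

/* Hypothetical token codes; a real grammar gets them from %token. */
enum { TOK_EOF = 0, TOK_ENDIF = 258, TOK_IDENT = 259 };

/* Stand-in for yylex(): delivers tokens from a TOK_EOF-terminated array. */
static const int *input;

static int next_token(void)
{
    int t = *input;
    if (t != TOK_EOF)
        input++;              /* never advance past the end marker */
    return t;
}

/* Tokens at which we are willing to resume parsing. */
static int is_sync(int tok)
{
    return tok == ';' || tok == '}' || tok == TOK_ENDIF || tok == TOK_EOF;
}

/* Discard tokens up to the next sync token and return that token.
   If skipped is non-NULL, it receives the number of discarded tokens. */
static int skip_to_sync(int *skipped)
{
    int n = 0;
    int tok;

    while (!is_sync(tok = next_token()))
        n++;
    if (skipped != NULL)
        *skipped = n;
    return tok;
}
```

Treating end of input as a sync token keeps the loop from running forever
when the closing ';' or '}' never arrives.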


No matter how you do it, the truly big problem is how not to run
semantically amok. For example, you might miss the end of a block and
continue with the wrong scope still active. Sometimes you can exploit
heuristic rules of thumb in the lexer to (in)validate an assumption: a
"#define" in a C source is likely to be found at the external level; a
brace at the left margin more likely belongs to a function body than to
an inner block; a nested comment start looks spurious.
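
As a rough illustration of such rules of thumb, the C helpers below test a
raw source line or comment body. Both function names are hypothetical, and
the checks are deliberately crude; they only hint at what a lexer could
report to the parser, they prove nothing.

```c
#include <string.h>

/* Hypothetical helper: does this source line suggest file scope?
   Only column 0 is inspected; indented text never matches. */
static int looks_like_external_level(const char *line)
{
    /* A "#define" at the left margin usually sits outside any function. */
    if (strncmp(line, "#define", 7) == 0)
        return 1;
    /* A brace at the left margin more likely delimits a function body. */
    if (line[0] == '{' || line[0] == '}')
        return 1;
    return 0;
}

/* Hypothetical helper: does the text inside a comment contain another
   comment opener? If so, the earlier comment was probably not closed. */
static int has_nested_comment_start(const char *comment_body)
{
    return strstr(comment_body, "/*") != NULL;
}
```

A parser that believes it is deep inside nested blocks while the lexer sees
a "}" in column 0 has a hint that it missed a closing brace somewhere and
might pop its scopes accordingly.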


Martin Neitzel
--

