Re: Writing a parser for C++ - Ada hybrid

Chris Dodd <cdodd@acm.org>
Sat, 05 Jul 2008 15:56:35 -0500

          From comp.compilers

Related articles
Writing a parser for C++ - Ada hybrid flagaye@little-worlds.com (Florent LAGAYE) (2008-07-03)
Re: Writing a parser for C++ - Ada hybrid cdodd@acm.org (Chris Dodd) (2008-07-05)
| List of all articles for this month |

From: Chris Dodd <cdodd@acm.org>
Newsgroups: comp.compilers
Date: Sat, 05 Jul 2008 15:56:35 -0500
Organization: Compilers Central
References: 08-07-008
Keywords: parse, debug
Posted-Date: 05 Jul 2008 19:10:50 EDT

Lets look closely at your conflict to see what is going on. The conflicts
are all in state 96:


> itat 96
:
> 62 | locator .
:
> 68 | locator BOUND_TOKEN locator .
:
> PLUS_TOKEN riduction par utilisation de la rhgle 62 (expression)
> PLUS_TOKEN [riduction par utilisation de la rhgle 68
(expression)]


I've edited this down severely so we're looking at just one
conflicting token, but all the rest are the same, just with different
following tokens.


What this means is that in a state where the parser has seen "locator
BOUND_TOKEN locator" and the following token needs to have an
expression before it, it can't decide whether to reduce the whole
thing to an expression or just the final locator (which would require
later reducing the expression back to a locator with rule 44 "locator:
expression DOT_TOKEN IDF_TOKEN".


So this is both an ambiguity and a lack of enough lookahead case -- if
there isn't a trailing "DOT_TOKEN IDF_TOKEN", reducing rule 62 will
lead to a syntax error, so it would have to be rule 68. However, if
there is such a trailing context, either might be valid.


The 'easy' solution is if reducing rule 68 is correct even in the
presence of such trailing context (which I think it is). In that
case, you can just put rule 68 before rule 62, and the default
conflict resolution will do the right thing.


If you want to actually get rid of the conflict, there are a number of
approaches. One is simplifying the grammar to get rid of grammar
distinctions and instead resolve them with later semantic checks.
This is what is usually done with stongly typed languages, as doing
typechecking in the grammar is very hard. In this case you could move
the distinction between a 'locator' and an 'expression' out of the
grammar; it could easily be handled with normal typechecking. To do
this, you just replace 'locator' with 'expression' everywhere appears
in the grammar, and get rid of the resulting redundant "expression :
expression" rule.


The second approach is unfactoring -- replacing nonterminals on the
rhs of rules with the rhs of the rules for those nonterminals (which
involves duplicating rules if there are multiple rules for the
nonterminal.) In this case, for example, you can replace the rule
'expression : locator' with 'expression : expression DOT_TOKEN
IDF_TOKEN | namespace IDF_TOKEN' (two rules). To get rid of the
conflicts, you'll need to keep doing this until you get rid of all
uses of 'locator' in the grammar. Since 'locator' is
non-self-recursive, you can actaullay do that, but it will lead to a
significantly more complex grammar. In cases with recursive rules and
conflicts that require unbounded lookahead to resolve, unfactoring
will never reach a solution. Unfactoring works best when there is no
ambiguity and you just need a little bit of extra lookahaead to
resolve the conflict.


Chris Dodd
cdodd@acm.org


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.