Re: grammar ambiguity

Chris F Clark <world!cfc@uunet.uu.net>
12 Feb 2000 21:30:59 -0500

          From comp.compilers

Related articles
Grammar ambiguity joachim.durchholz@halstenbach.com.or.de (Joachim Durchholz) (2000-02-05)
New tool to write JIT compilers bonzini@gnu.org (2000-02-05)
Re: Grammar ambiguity idbaxter@semdesigns.com (Ira D. Baxter) (2000-02-10)
Re: Grammar ambiguity rsherry@home.com (Robert Sherry) (2000-02-10)
Re: Grammar ambiguity j.coulmance@itecor.com (Jocelyn Coulmance) (2000-02-12)
Re: grammar ambiguity world!cfc@uunet.uu.net (Chris F Clark) (2000-02-12)
Re: Grammar ambiguity joachim.durchholz@halstenbach.com.or.de (Joachim Durchholz) (2000-02-12)
Re: grammar ambiguity compres@world.std.com (Chris F Clark) (2000-02-12)
Re: Grammar ambiguity cbrtjr@ix.netcom.com (Charles E. Bortle, Jr.) (2000-02-13)
Re: Grammar ambiguity j.coulmance@itecor.com (Jocelyn Coulmance) (2000-02-19)
Re: grammar ambiguity wclodius@aol.com (2000-02-21)
Re: grammar ambiguity world!cfc@uunet.uu.net (Chris F Clark) (2000-02-27)
[6 later articles]
| List of all articles for this month |

From: Chris F Clark <world!cfc@uunet.uu.net>
Newsgroups: comp.compilers
Date: 12 Feb 2000 21:30:59 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 00-02-024
Keywords: parse

    "Joachim Durchholz" <joachim.durchholz@halstenbach.com.or.de> wrote:
> I need a tool that checks context-free grammars for ambiguity. I
> already know that this is not decidable in general, so I'll have to be
> content with heuristics. For example, I could use an LALR parser
> generator; if it reports no conflicts, I know that the grammar is
> unambiguous (besides being LALR).


The are several classes of grammars more general than LALR that are
still unambiguous and some tools that implement their recognition. I
will only mention those I am familiar with.


First, there is LR(k). That resolves some of the artificial conflicts
generated by state merging in the LALR algorithm. Increasing the
value of k, causes the grammar to admit more and more non-ambiguous
grammars. (This does not allow more languages to be expressed, but it
does make expressing the languages simpler.) If I recall correctly,
MKS Yacc supports LALR(2) and the Cocktail toolkit implements LR(k)
(or perhaps LALR(k)) for arbitrary k.


A related class of grammars are the LRR (LR-regular) grammars. These
are languages where instead of one token of lookahead, the grammar
uses a "separating" regular expression to disambiguate the token
stream. This theoretically allows unlimited lookahead (e.g. a*b is
different than a*c), but still accepts only deterministic languages.
The "infi-look" feature of Visual Parse++ claims to support LR-regular
grammars.


Similar to the LR-regular grammars are the LL and LR predicated
grammars. In the predicated grammars, one applies a predicate
expression to disambiguate the conflicting rules. The predicate can
include recursive rules, which allows LL predicates to be predicated
LL grammars (and LR for LR). PCCTS and ANTLR are predicated LL parser
generators. Yacc++ (as of version 2.2 if I recall) supported
predicated LR. Unfortunately, one must specify the predicate by hand
to disambiguate the grammar. Moreover, predicates are generally
implemented by backtracking which can produce non-linear run-times if
they are not carefully written. The tools that I am aware of also do
not check to make certain that the predicates are not ambiguous.
However, the implementation mechanism guarantees that an unambiguous
parse will result (and allows the user to determine the parse in
advance)--the predicates strictly order the parse.


As you noted the Earley and Tomita/Lang (also know as GLR) techniques
parse all unambiguous grammars, but don't detect ambiguities at
generation time.


A different approach to extending the reach of parsers is the use of
non-terminal lookahead. I have 3 papers by someone who researched
that, and called the technique non-canonical LR if I recall correctly.
The k in this technique is k non-terminals and not just k tokens.
This technique can guarantee that the grammar is unambiguous and can
parse grammars that are not LR(k) for any k. However, I'm not aware
of the author ever releasing his software. We also used that
technique in the first version of Yacc++, which we never released.
Foolishly, we replaced the technique with the minimal state LR
algorithm in the versions that saw the light of day.


The final technique worth mentioning is to extend the non-terminal
lookahead by computing its closure (or limit). Essentially this takes
the entire right context into consideration. Moreover, I believe one
can prove that for any unambiguous grammar the process terminates (the
limit converges or more precisely the closure has finite number of
states). However, as far as I can tell that the only check for an
ambiguous grammar is that the process does not terminate. That is
also not quite true, there are some checks that can be applied that
prove that the process will not terminate. However, they are not
complete. The reason I know this is Yacc++ 2.4 implements this
closure operation (in the generator) in an undocumented option. On
some grammars the tool just loops producing gigabytes of states before
running out of memory and failing. In addition, the run-time support
for executing the resulting parsers is not implemented, so the feature
is essentially useless at the moment.




Hope this helps,
-Chris


****************************************************************************
*
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
----------------------------------------------------------------------------
--
Web Site in Progress: Web Site : http://world.std.com/~compres


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.