Re: Looking for a parser

"Ira D. Baxter" <idbaxter@semdesigns.com>
23 Sep 2000 14:50:04 -0400

          From comp.compilers

Related articles
Looking for a parser sergey.z@sapiens.com (Sergey Zamansky) (2000-09-08)
Re: Looking for a parser vadim-cbl@siber.com (Vadim Maslov) (2000-09-13)
Re: Looking for a parser adrian@sartre.cs.rhbnc.ac.uk (A Johnstone) (2000-09-15)
Re: Looking for a parser idbaxter@semdesigns.com (Ira D. Baxter) (2000-09-17)
Re: Looking for a parser cfc@world.std.com (Chris F Clark) (2000-09-21)
Re: Looking for a parser idbaxter@semdesigns.com (Ira D. Baxter) (2000-09-23)
| List of all articles for this month |

From: "Ira D. Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: 23 Sep 2000 14:50:04 -0400
Organization: Posted via Supernews, http://www.supernews.com
References: 00-09-026 00-09-105 00-09-128 00-09-151
Keywords: parse

"Chris F Clark" <cfc@world.std.com> wrote in message
> [snip] The reason for the [original] requestor wanting
> backtracking is that the grammar they have generates conflicts when
> run through Visual Parse++, so they would like a tool to eliminate
> those conflicts.
>
> Ira Baxter wrote:
> > The DMS Reengineering Toolkit doesn't do backtracking.
> > Rather, it forks multiple parsers at points of local ambiguities,
> > (GLR parsing), acheiving the same effect more efficiently.
> > See http://www.semdesigns.com/Products/DMS/DMSToolkit.html.
>
> GLR parsing is (at least theoretically) more efficient than
> backtracking and is essentially equivalent to Earley parsing, which
> means it can handle any context free grammar. Those are its strongest
> points.


And it is generally much more efficient to run than an Earley parser,
(GLR is linear time except in possibly-ambiguous regions of the source
text), except for certain anomolous grammars that would give any
parsing technology fits. Our experience on real languages is that
this technology works extremely well, providing a very good balance
between performance and capability.


> Its weakest point is that GLR parsing is based upon LR parsing tables.
> The conflicts in the LR tables are essentially the points where the
> multiple parsers are forked. The conflicts haven't been eliminated
> merely used as guide points for forking. [snip]
>
> In theory, a GLR parser would not need to report the conflicts to the
> user as the resulting GLR parser will resolve (if possible) the
> conflicts at run (parser execution) time by parsing the two distinct
> possibilities in parallel. [snip]


> One of the caveats is the "if possible". If the grammar is ambiguous,
> the GLR parser will create a parse forest (rather than parse tree) for
> ambiguous inputs. A related problem occurs in unconstrained
> backtracking parsers--although for certain ambiguous grammars an
> unconstrained backtracking parser may loop rather generating a forest,
> which is even worse behaviour.


Another problem with backtracking parsers is that they may not
actually pick up ambiguous parses unless they try all paths, which can
be directly exponential. So you have to choose between completeness
in parsing, slow parsing rates, or having to manually remove the
conflicts, which defeats the point of having the backtracking parser.


The fact that a GLR parser doesn't loop like this makes it far
friendlier to engineer with. The fact that it is very efficient in
handling multiple parsers means you don't have to choose.


An LR table generator easily detects conflicts. It cannot tell the
difference between a conflict which is merely a local conflict, or an
actually-ambiguous grammar. We do report the conflicts, and provide
some tools for diagnosing real ambiguities in the grammars. You do
get faster and more usable parsers if you work on removing the
conflicts. [For real ambiguities, if you don't remove the conflicts,
you still get a parser, but then you have to decide which parse tree
from the forest is right during some later semantic pass.]


> There is an alternative that solves the ambiguous grammar problem,
> predicated parsers. [snip]
> A predicated parser resolves the conflicts at compile (parser
> generation) time, by generating a parser that will backtrack over the
> conflicts in the correct order to generate the desired parse.


I don't know a lot about predicated parsers. If the predicates are
resolvable at parser-generation time, then they can only be over
properties of the grammar. For most of the grammars that we see, the
resolution of syntactically ambiguous parses usually requires
non-syntax information (i.e., to decide if the C statement "T*X;" is a
declaration or an expression, you have to know if T is actually
declared as a type). Predicates over the grammar won't help here.


Our GLR parsers can use parse-time semantic predicates, called when a
reduction occurs, to handle this. This fits very well into the GLR
parsing technology. A forked parser, proposing a reduction, calls the
semantic predicate. If the semantic predicate fails, that forked
parser simply disappears. The other parses continue.


This makes it straightforward to parse FORTRAN DO loops, using
semantic predicates to match the statement labels in DO headers with
corresponding statement-labeled CONTINUE statements and getting the
correct loop nesting directly from the parser.


I'd certainly be interested in pointers to predicated parsing. Might
make a useful addition in capability to the GLR parsing technology we
have.
--
Ira Baxter, Ph.D., CTO idbaxter@semdesigns.com 512-250-1018x140
Semantic Designs, Inc., www.semdesigns.com FAX 512-250-1191
12636 Research Blvd #C214, Austin, Texas 78759


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.