|splitting grammars in LL / LR / Backtrack / ... email@example.com (Michiel Koens) (1996-04-29)|
|Re: splitting grammars in LL / LR / Backtrack / ... firstname.lastname@example.org (1996-04-30)|
|Re: splitting grammars in LL / LR / Backtrack / ... email@example.com (1996-04-30)|
|Re: splitting grammars in LL / LR / Backtrack / ... firstname.lastname@example.org (Chris Clark USG) (1996-05-01)|
|Re: splitting grammars in LL / LR / Backtrack / ... email@example.com (1996-05-01)|
|Re: splitting grammars in LL / LR / Backtrack / ... firstname.lastname@example.org (1996-05-10)|
|Re: splitting grammars in LL / LR / Backtrack / ... email@example.com (1996-05-19)|
|From:||firstname.lastname@example.org (John Lilley)|
|Date:||19 May 1996 18:32:50 -0400|
On 04/29/96, Michiel Koens wrote:
> I wonder if it is possible to split a grammar into seperately parsable
> parts to 'isolate' the parts that are not parsable with an LL(1) or
> LR(1) parser. This way, a parse-function could parse a grammar mainly
> using a cheap parsing strategy and call more general (=expensive)
> parse-functions only for those parts that need the more general
> techniques. Are there any tools for this?
Following up on Terrance Parr's description of ANTLR's capabilities, I would
like to add that I am currently using it for a C++ parser project, and it
seems up to the task. ANTLR generates LL(k) parsers, which are weaker in a
strict sense than an LALR(k) parser. This is due to the fact that LL(k) must
make parsing decisions based on the next k tokens of input, whereas LALR(k)
can defer some decisions by placing tokens and productions on the stack until
a later disambiguating sequence is recognized.
That being said, ANTLR's semantic and syntactic predicates, IMHO, more than
make up for the difference, since they support arbitrary lookahead and
backtracking. I've compared Jim Roskind's public domain YACC grammar for C++
with the ANTLR grammar I'm working with, and find that mose of the nastiest
ambiguities that stump the YACC grammar are not a problem in ANTLR -- you
just sniff ahead a bit in the token stream (using a predicate) to help
disambiguate the nastier rules.
It should also be noted that for specific cases, you can make parts of an
LL(k) grammar act more like an LR(k) grammar by factoring out common prefixes
and maintaining your own stack disipline. This is an alternative to
backtracking that makes the grammar a bit harder to read, but avoids the
performance hit of backtracking. For example, you can manually transform a
a : b | c
b : d e
c : d f
where d,e,f are arbitrary subrules, into
a : d (b | c)
b : e
c : f
Where the actions for "a" must squirrel away the results of "d" somewhere
(typically on a stack) until either b or c is recognized. This is rather
crude, but sufficient to reduce backtracking in some typical C++ declaration
ambiguities. For example:
declaration: method_declaration | data_declaration | ...
method_declaration : complex_type identifier '(' argument_list ')' ...
data_declaration : complex_type declarator_list ...
(where complex_type can be arbitrarily long) can become:
declaration: declaration1 | ...
declaration1 : complex_type ( method_declaration | data_declaration)
method_declaration : identifier '(' argument_list ')' ...
data_declaration : declarator_list ...
As a final plug, I'm not a parser genius, and I have no problem understanding
LL(k) with predicates and backtracking. But confusion reigns when I attempt
to understand the root of a shift/reduce conflict in a grammar as large as
Return to the
Search the comp.compilers archives again.