Re: splitting grammars in LL / LR / Backtrack / ...

Chris Clark USG <clark@zk3.dec.com>
1 May 1996 23:16:27 -0400

          From comp.compilers

Related articles
splitting grammars in LL / LR / Backtrack / ... koens@natlab.research.philips.com (Michiel Koens) (1996-04-29)
Re: splitting grammars in LL / LR / Backtrack / ... solution@gate.net (1996-04-30)
Re: splitting grammars in LL / LR / Backtrack / ... k3is@unb.ca (1996-04-30)
Re: splitting grammars in LL / LR / Backtrack / ... clark@zk3.dec.com (Chris Clark USG) (1996-05-01)
Re: splitting grammars in LL / LR / Backtrack / ... dodd@csl.sri.com (1996-05-01)
Re: splitting grammars in LL / LR / Backtrack / ... parrt@lonewolf.parr-research.com (1996-05-10)
Re: splitting grammars in LL / LR / Backtrack / ... jlilley@ix.netcom.com (1996-05-19)
| List of all articles for this month |

From: Chris Clark USG <clark@zk3.dec.com>
Newsgroups: comp.compilers
Date: 1 May 1996 23:16:27 -0400
Organization: Compilers Central
References: 96-04-147
Keywords: parse

Michiel Koens <koens@natlab.research.philips.com> writes:


> I wonder if it is possible to split a grammar into seperately parsable
> parts to 'isolate' the parts that are not parsable with an LL(1) or
> LR(1) parser. This way, a parse-function could parse a grammar mainly
> using a cheap parsing strategy and call more general (=expensive)
> parse-functions only for those parts that need the more general
> techniques. Are there any tools for this?


Yes, it is possible. The Yacc++(r) parser generator "does it" for LL
and LR grammars. It would be possible to do it for grammar requiring
backtracking also.


However, the results are not at all what you would think. The reason
has more to do with the output format, than the resolving power of the
generator.


For example, many LL parser generators emit recursive descent parsers
(i.e. functions which match each grammar fragment). In contrast, LR
parser generators generate table driven parsers rather than code. The
parsing function generated by an LR parser generator is simply an
engine which executes the table. The cost of interpreting the table,
rather than running the recursive descent code directly is what can
make LR parsing (slightly) more expensive. If an LR generator also
generated code, rather than a table and an interpreter, I predict any
performance difference would vanish. [The performance difference is
not significant enough in most people's eyes to justify switching--the
readability of seeing the parsing code written as a set of functions
rather than encoded in tables is usually more of an issue for LL
supporters and the ease of grammar writing an issue for LR
supporters.]


Now, one of the reasons that LR generators produce tables and an
interpreter rather than direct code is related to the resolving power
of the generator. Although not normally viewed as such, the LR
technique is essentially running n LL parsers in parallel up to the
point where they reduce, at which point the "appropriate" parse is
selected. Instead, the normal concept of how LR parsing is
implemented is that an LR parser procedes bottom-up. This concept
along with a few implementation details hides the recursive descent
way of implementing LR parsers in code. [This is part of my earlier
comment that Yacc++ does it. Yacc++ actually puts information in its
parsing tables which marks the recursive descent transformation of the
LR grammar. However, Yacc++ does not generate direct code.]


The reason Yacc++ does not generate direct code is that tables and an
interpreter lend themselves to being used as call-backs. If a parser
is implemented in direct code, then unless the implementation language
supports coroutines, the parser is in control of the flow of control.
The lexer and parsers generated by Yacc++ are based around the
inverted flow of control model prevalent in most event driven
windowing systems. In fact, the "yy_parse" function is merely a loop
which causes the lowest level object (the input object) to feed
characters to the lexer object which groups them into tokens and feeds
them to the parser object. If Yacc++ output direct code and the
non-inverted flow of control, we would have to develop a second API
for all of the objects which supported that model. [Not something to
do lightly.]


A similar point can be made for backtracking. To implement
backtracking, you have to be able to undo certain parsing actions.
Once you have an execution model which allows parsing actions to be
undone, there is little point (beyond any potential performance
divots) in using a simpler model for parts of your grammar and only
using the backtracking execution model at specific points.


So, as you can see, while it is do-able, as far as I know it has not
been done. [I do remember that one Jovial compiler written by SofTech
used an LL and an LR parser in combination. The LL parser processed
the declarations and the LR parser processed the statements. The
grammar was partitioned by hand though and not by a tool. However, I
don't think the reason the grammar was split was because of
performance, but rather that they had two grammars for the relavent
parts (in different notations) lying around and they reused them
rather than rewriting a grammar which combined the two.]


-Chris Clark


Disclaimer: Having worked on Yacc++, my opinions are biased.
(r) -- Yacc++ is a registered trademark of:


Compiler Resources, Inc.
85 Main St, Ste 310
Hopkinton, MA 01748
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.