Re: Is LALR(1) or LL(k) parser better for C++

"Scott Stanchfield" <thetick@scruz.net>
26 Jan 1997 22:29:47 -0500

          From comp.compilers

Related articles
Is LALR(1) or LL(k) parser better for C++ kikonen@cs.joensuu.fi (1997-01-22)
Re: Is LALR(1) or LL(k) parser better for C++ jlilley@empathy.com (John Lilley) (1997-01-22)
Re: Is LALR(1) or LL(k) parser better for C++ dlmoore@ix.netcom.com (David L Moore) (1997-01-25)
Re: Is LALR(1) or LL(k) parser better for C++ thetick@scruz.net (Scott Stanchfield) (1997-01-26)
Re: Is LALR(1) or LL(k) parser better for C++ mrs@kithrup.com (1997-01-26)
Re: Is LALR(1) or LL(k) parser better for C++ dlmoore@ix.netcom.com (David L Moore) (1997-01-29)
| List of all articles for this month |

From: "Scott Stanchfield" <thetick@scruz.net>
Newsgroups: comp.compilers
Date: 26 Jan 1997 22:29:47 -0500
Organization: scruz-net
References: 97-01-163 97-01-181 97-01-206
Keywords: yacc, debug

> Several issues stand out. First, it is much harder to debug LALR
> grammars. You can get your average debugger to track through
> semantic code embedded in the grammar, but you cannot (at least I
> cannot) get it to print out those variables with names like
> $$. Also, single stepping will insist on stepping through the parser
> code as well. Some of this is a feature of YACC rather than LALR
> parsing in general.


Just a note here -- the "$$ variables" in yacc actually get turned into
references like yypvt[0]. Depends on the yacc you're using. In some, the
$1, $2, $3 are represented by (if you just reduced something like a: b c d)
    yypvt[0] -- most recent ($3)
    yypvt[-1] -- next back ($2)
    yypvt[-2] -- next back from that ($1)


and so on. Basically, yypvt is a pointer into the stack where all the
fun happens. But beware -- this can change based on the yacc you're
using...


That aside, I still think table-driven parsers (whether they are LL,
LR or LALR) are just plain hard to debug. Recursive-descent is much
easier and it's really possible to see a correlation between the code
and the grammar.


> Second, control of parsing in those situations where a symbol must
> be (for example), a type-id in one context or an undefined-id in
> another are hard to handle. Either you have weird productions like
> <var_decl> = <type_id> <type_id>; or you have globals flags flying
> back and forth between the lexer and parser, neither of which is a
> great solution.


This is handled well by a parser-generator such as PCCTS, where you
can add semantic predicates to choose alternatives based on a
condition. Rather than set up a test in the lexer and return
different tokens, you return a single type of token in either case and
predicate the grammar to choose rules based on the token AND a
condition. Glad to see someone who is as dead against parser-->lexer
backtalk as I...


> Third, I see ambiguities on productions that I think should not be
> ambiguous in a full LR grammar. I might be wrong about this since I
> have not tried to run the grammar through a full LR parser nor have
> I sat down and tried to grind out the states by hand but my
> suspicion is that the power you lose with the LALR simplification is
> significant.


> We shall probably replace this parser with a recursive descent
> parser at some time in the fairly near future ("real soon now"). If
> we proceed, it will be interesting to compare the ease of producing
> the two parsers.


Have a look at PCCTS -- look at
    http://java.magelang.com/antlr/entry.html


for some great starting info on it. PCCTS (ANTLR/DLG) generates
recursive-descent parsers that are really cool. I also have a tutorial in
progress
    http://www.scruz.net/~thetick/pcctstut


As for C++ with PCCTS, see John Lilley's work (via the entry page above).
--
Scott Stanchfield
Santa Cruz, CA
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.