|Re: Compiler Books? Parsers? cfc@shell01.TheWorld.com (Chris F Clark) (2003-12-03)|
|RE: Compiler Books? Parsers? firstname.lastname@example.org (Quinn Tyler Jackson) (2003-12-03)|
|Re: Compiler Books? Parsers? email@example.com (Rodney M. Bates) (2003-12-08)|
|Re: Compiler Books? Parsers? firstname.lastname@example.org (Nick Roberts) (2003-12-08)|
|From:||Quinn Tyler Jackson <email@example.com>|
|Date:||3 Dec 2003 22:32:29 -0500|
|Posted-Date:||03 Dec 2003 22:32:29 EST|
Mohd Hanafiah Abdullah asked:
> > I have never used parser generators such as yacc/bison or antlr.
> > Would be nice to know people's opinion on manual implementation vs
> > automation of parsers, especially from those who have done both; in
> > terms of efficiency, readability, maintainability,
> > ease-of-development, and so forth.
Jeff Kenton replied:
> > 1. Efficiency: someone I work with, who's often right, claims that you
> > can always write a parser by hand that's faster than what yacc and lex
> > produce. Other people claim exactly the opposite.
And then Chris Clark commented:
> This is trivially true, if one considers the fact that one can take the
> output of any given tool and then profile and modify it so that the
> resulting hand-written tool is faster.
> However, in contrast, if someone figures out how to write a faster
> output for a specific tool, you may have to rearchitect your
> hand-tweaked to take advantage of those improvements. Note, that
> that doesn't happen very often (in 15 years of working on a tool,
> there were approximately 4 times where we specifically worked on
> speeding the generated parsers up).
It has been my experience (as a parsing tool architect) that the tool route
is, overall, the better of the two routes for several reasons. (For any
value of "better"....)
1. Optimizations made to the parsing engine are reflected across all (or
most, anyway) generated parsers that use that engine. Time and time again
when I have improved the Meta-S engine, other parsers have benefited by
those same improvements, without any consideration for those other parsers
having been taken during optimization.
For instance, tweaking the engine to get 10% more speed out of a parser for
language A resulted in parser B gaining 50% more speed. What was a small
bottleneck for A turned out to be a big bottleneck for B. Had parser A and B
been hand-written, any optimizations made to A would have had to have been
applied to B specifically.
2. Generated parsers are generated from specifications that can be checked
in some automated way for "correctness."
For instance, left-recursion is always wrong in an LL(k) (for any k) parser.
Finding left-recursion in a grammar specification isn't particularly
difficult to do by hand, but people get tired eyes, miss a step here and
again, and might miss it in a large specification if the moon is in the
right phase. Finding left-recursion in a grammar specification using
automated means is guaranteed to never suffer from boredom-induced oversight
on the part of the parser generator. It's not enough to hand-write a parser
from a specification that is known to be left-recursion free, since the
"hand writing" part of the process can introduce human error. The only way
to *always* catch left-recursion errors in an LL(k) parser is to generate
that parser from a specification known to be left-recursion free.
Hand-written parsers cannot be checked in an automatic way for this error.
(Proof: Any algorithm that could find left-recursion in a hand-written
parser would imply a solution to the HP.)
And that's *just* left-recursion. Any number of other detectable errors can
be overlooked in a hand-written parser. Automated parser generators never
tire of running all grammar specifications through the standard series of
error checks. As long as the algorithms used to find the errors are correct,
the grammars that pass through these tests, and the parsers they
subsequently generate, are guaranteed to be correct. Hand-written parsers
are only as correct as they happen to be -- and the amount of brainpower to
determine just what that is ... well, anyway ... I think I've made point 2.
3. Parsers generated from a specification can be modified and maintained in
ways that hand-written ones typically aren't. (Chris said enough on this
4. Parsers generated from a specification can be written by domain
specialists. Hand-written parsers can be written by domain specialists who
also happen to be programmers in the host language. Anyone who knows (or
can read) the HTML specification can implement a correct HTML grammar using
a decent tool, and can get all of the efficiency of whatever host language
the tool was implemented in. It takes an HTML expert who also happens to be
a good C++ programmer to hand-code an HTML parser as a C++ class (or set of
classes) from scratch. Domain specialists are not always experienced
programmers, and even if they are, may or may not be inclined to spend time
learning the theory necessary to write efficient parsers for their domain
5. Parser generators can (more and more these days) produce parsers that can
be hosted by a multitude of language environments. One specification can be
used by a Delphi, C++, VB, C#, or Java host if the engine happens to support
interfaces to each of these. The grammar writer need not know how to program
in ANY of these environments for this to come about.
6. Parser generators encourage standardization and community. There are
well-known ways to write grammar subsections that parse function parameter
lists, white space, and dangling-else's. Most of these well known ways are
expressed in the syntax of parser generator specifications. Delphi, C++, VB,
C#, Java, Perl programmers can benefit from the knowledge of others who have
already solved the same parsing issues at a level of abstraction above the
implementation language. (Without parser generators, many of the questions
found in this newsgroup could not be concisely expressed, and answers
provided by experts would adhere to no standard.)
Quinn Tyler Jackson
Return to the
Search the comp.compilers archives again.