|Kannapinn in a Nutshell email@example.com (Joachim Durchholz) (2003-04-20)|
|Re: Kannapinn in a Nutshell firstname.lastname@example.org (Carl Cerecke) (2003-04-27)|
|Re: Kannapinn in a Nutshell email@example.com (Joachim Durchholz) (2003-04-27)|
|Re: Kannapinn in a Nutshell firstname.lastname@example.org (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-05-06)|
|Re: Kannapinn in a Nutshell email@example.com (Chris F Clark) (2003-05-06)|
|Re: Kannapinn in a Nutshell firstname.lastname@example.org (Carl Cerecke) (2003-05-13)|
|Re: Kannapinn in a Nutshell email@example.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-05-14)|
|[2 later articles]|
|From:||Joachim Durchholz <firstname.lastname@example.org>|
|Date:||20 Apr 2003 17:55:28 -0400|
|Posted-Date:||20 Apr 2003 17:55:28 EDT|
Since Sönke Kannapinn's parsing algorithm has caused some echo, here
are the results of his doctoral thesis, in a nutshell.
I found Kannapinn's thesis extremely well-written and easily
understandable. I have never done any serious parsing theory beyond
the Dragon book (and the Johnstone/??? paper on Tomita-style parsing),
and I found even the theoretic parts of the thesis readable. It was
even easy to trace which pieces of theory were relevant for which
piece of practical advice, something that's often woefully absent in
theoretic papers (if they dwell on practice at all).
Here are the results of the thesis, each in its own section:
1. Compact LR (CLR) parsers.
A standard LR parser uses "items" of the form (A -> a~ . b~, z), where
a~ and b~ are strings of grammar symbols, A -> a~ b~ is a grammar rule
and z is a lookahead.
CLR parser items drop the A and a~ and consist just of (b~, z).
2. CLR parsers can be minimized.
A minimized CLR(k) parser has the minimum number of conflicts for any
IOW if you are in conflict trouble and the conflict persists for CLR, a
stronger LR parser won't help: you'll have to rewrite the grammar, or
use a stronger parsing technology (is there any between LR and GLR/Tomita?).
3. CLR parsers are practical.
They can be implemented with reasonable effort.
Shifts work just as with normal LR parsers.
On a reduction, the parser cannot simply read off the grammar rule from
the state that it is in, so it will have to scan the stack downwards and
see which grammar rule fits. (Since carrying out the actual reduction
involved popping the same stack symbols that are being scanned, the
overhead compared to classical LR parsing should be negligible.)
4. CLR variants of SLR parsers are impractical.
The SLR algorithm can be adapted to CLR parsing, but the result isn't
"simple", so it's not worth the effort pursuing that combination.
5. LALR is applicable to CLR.
The LALR variation can be applied to CLR, and this *is* worth pursuing.
The item merge process will work unmodified, it will merge more states
since there's less information where items can differ. The usual
advantages and disadvantages of LALR carry over: reduced number of
states, but maybe a few extra reductions before the parser detects an error.
6. LR parsing for extended CFGs (ECFG)
An ECFG allows regular expressions on the right-hand part of a rule.
Actually this was asked a while ago on comp.compilers, and consensus was
"feasible but not worth the effort".
Kannapinn's answer is different: If the REs are "strongly unambiguous",
it is always possible to infer the parts of the RE that matched a
consistuent of a construct, which is necessary to find the semantic
actions associated with the parts of a RE.
It is the task of the grammar writer to make the REs strongly
unambiguous, but this isn't difficult: you don't usually write down
ambiguous REs. Remaining ambiguities tend to be of the form (a~b~c~...)*
where all of a~, b~, c~... can generate the empty string; the strongly
unambigous form would then be (a~|b~|c~|...)*.
6a. Most existing literature on ECFG parsing is worthless.
The thesis states that the underlying parsing theory either consists of
handwaving, or it has serious errors; this is exemplified with a handful
of typical publications in the area. Ironically, a detail in this part
of the thesis seems to be in error... the difference being that the
error is inconsequential for the statements made in the thesis ;-)
Currently looking for a new job.
Return to the
Search the comp.compilers archives again.