|Have I discovered something new? email@example.com (Stephen Horne) (2002-07-15)|
|Re: Have I discovered something new? firstname.lastname@example.org (Torben Ægidius Mogensen) (2002-07-21)|
|Re: Have I discovered something new? email@example.com (Joachim Durchholz) (2002-07-21)|
|Have I discovered something new? firstname.lastname@example.org (Chris F Clark) (2002-07-21)|
|Re: Have I discovered something new? email@example.com (2002-07-21)|
|Re: Have I discovered something new? firstname.lastname@example.org (Mark) (2002-07-24)|
|Re: Have I discovered something new? email@example.com (Steve Horne) (2002-07-25)|
|[4 later articles]|
|From:||"Stephen Horne" <firstname.lastname@example.org>|
|Date:||15 Jul 2002 23:52:07 -0400|
|Posted-Date:||15 Jul 2002 23:52:07 EDT|
I do not consider myself an expert in parsing techniques, but I have
take an interest over a considerable period of time and that interest
has approached obsessive levels recently as I have found some
interesting applications in unexpected places.
Most of my detailed knowledge has come from the books by Dick Grune -
'Modern Compiler Design' and 'Parsing Techniques, A Practical Guide'
(the latter generously made available for free download).
My main question refers to something I read in 'Parsing Techniques, A
Practical Guide'. This book contained a statement that the possibility
of an linear-time parsing algorithm for arbitrary context-free
grammars was still an open question - that there was no proof or
disproof, that every known context-free grammar can be parsed in
linear time using specially designed methods, but that no general
algorithm currently exists which works for all context-free grammars.
Is this still the state of play?
The reason I ask is that I believe I have found an algorithm to do
precisely this. It will certainly handle any grammar that a Tomita
parser will handle, and do so without needing stack duplication. If a
scentence is ambiguous it will give all possible parsings. If a
grammar is ambiguous that can be detected in the precalculated tables.
It is even possible to give a description of the ambiguity using a
list of the parts of the distinct parse trees required to understand
the ambiguity, and to find the context.
The reason I say 99.9% certain is that I haven't yet implemented it,
and the description I've written is not 100% fleshed out yet.
The biggest problem the idea has is that it will need large tables for
any grammar that would have conflicts in existing LR models. Though I
also have a kind of answer for that - if the precise parse tree is not
needed, a simple tweak can adapt the method to generate a semantic
AST-like structure, and this will dramatically reduce the size of the
tables required for typical grammars. The technique may allow grammars
to be written purely for convenience and readability, without the
usual tweaking to avoid conflicts, yet still have table sizes
equivalent to using a tweaked grammar and an LR(1), LALR(1) or even
LR(0) parser depending on the grammar.
The only area where I have a serious uncertainty is in the issue of
parsing infinitely ambiguous grammars. I need to think about this some
more. I'm pretty sure the technique can be adapted, and the final
output from parsing a grammar and scentence with infinite ambiguity
would be essentially a regular grammar (with branches instead of
alternatives) describing the structure of the parse tree.
So really, I want to know whether this is worth persuing (ie has it
already be done). If not, how should I go about releasing the idea
into the wild - is there a suitable journal, for instance.
Return to the
Search the comp.compilers archives again.