15 Jul 2002 23:52:07 -0400

Related articles |
---|

Have I discovered something new? steve.horne@iris.co.uk (Stephen Horne) (2002-07-15) |

Re: Have I discovered something new? torbenm@diku.dk (Torben Ægidius Mogensen) (2002-07-21) |

Re: Have I discovered something new? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21) |

Have I discovered something new? cfc@world.std.com (Chris F Clark) (2002-07-21) |

Re: Have I discovered something new? kgw-news@stiscan.com (2002-07-21) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-24) |

Re: Have I discovered something new? steve@lurking.demon.co.uk (Steve Horne) (2002-07-25) |

[4 later articles] |

From: | "Stephen Horne" <steve.horne@iris.co.uk> |

Newsgroups: | comp.compilers |

Date: | 15 Jul 2002 23:52:07 -0400 |

Organization: | Compilers Central |

Keywords: | parse |

Posted-Date: | 15 Jul 2002 23:52:07 EDT |

I do not consider myself an expert in parsing techniques, but I have

take an interest over a considerable period of time and that interest

has approached obsessive levels recently as I have found some

interesting applications in unexpected places.

Most of my detailed knowledge has come from the books by Dick Grune -

'Modern Compiler Design' and 'Parsing Techniques, A Practical Guide'

(the latter generously made available for free download).

My main question refers to something I read in 'Parsing Techniques, A

Practical Guide'. This book contained a statement that the possibility

of an linear-time parsing algorithm for arbitrary context-free

grammars was still an open question - that there was no proof or

disproof, that every known context-free grammar can be parsed in

linear time using specially designed methods, but that no general

algorithm currently exists which works for all context-free grammars.

Is this still the state of play?

The reason I ask is that I believe I have found an algorithm to do

precisely this. It will certainly handle any grammar that a Tomita

parser will handle, and do so without needing stack duplication. If a

scentence is ambiguous it will give all possible parsings. If a

grammar is ambiguous that can be detected in the precalculated tables.

It is even possible to give a description of the ambiguity using a

list of the parts of the distinct parse trees required to understand

the ambiguity, and to find the context.

The reason I say 99.9% certain is that I haven't yet implemented it,

and the description I've written is not 100% fleshed out yet.

The biggest problem the idea has is that it will need large tables for

any grammar that would have conflicts in existing LR models. Though I

also have a kind of answer for that - if the precise parse tree is not

needed, a simple tweak can adapt the method to generate a semantic

AST-like structure, and this will dramatically reduce the size of the

tables required for typical grammars. The technique may allow grammars

to be written purely for convenience and readability, without the

usual tweaking to avoid conflicts, yet still have table sizes

equivalent to using a tweaked grammar and an LR(1), LALR(1) or even

LR(0) parser depending on the grammar.

The only area where I have a serious uncertainty is in the issue of

parsing infinitely ambiguous grammars. I need to think about this some

more. I'm pretty sure the technique can be adapted, and the final

output from parsing a grammar and scentence with infinite ambiguity

would be essentially a regular grammar (with branches instead of

alternatives) describing the structure of the parse tree.

So really, I want to know whether this is worth persuing (ie has it

already be done). If not, how should I go about releasing the idea

into the wild - is there a suitable journal, for instance.

--

Steve Horne

steve.horne@iris.co.uk

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.