21 Jul 2002 02:09:33 -0400

Related articles |
---|

Have I discovered something new? steve.horne@iris.co.uk (Stephen Horne) (2002-07-15) |

Re: Have I discovered something new? torbenm@diku.dk (Torben Ægidius Mogensen) (2002-07-21) |

Re: Have I discovered something new? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21) |

Have I discovered something new? cfc@world.std.com (Chris F Clark) (2002-07-21) |

Re: Have I discovered something new? kgw-news@stiscan.com (2002-07-21) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-24) |

Re: Have I discovered something new? steve@lurking.demon.co.uk (Steve Horne) (2002-07-25) |

Re: Have I discovered something new? robert.corbett@sun.com (Robert Corbett) (2002-07-31) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-31) |

Re: Have I discovered something new? cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-04) |

[1 later articles] |

From: | "Chris F Clark" <cfc@world.std.com> |

Newsgroups: | comp.compilers |

Date: | 21 Jul 2002 02:09:33 -0400 |

Organization: | Compilers Central |

References: | 02-07-057 |

Keywords: | parse |

Posted-Date: | 21 Jul 2002 02:09:32 EDT |

Steve Horne wrote:

*> This book contained a statement that the possibility*

*> of an linear-time parsing algorithm for arbitrary context-free*

*> grammars was still an open question - that there was no proof or*

*> disproof, that every known context-free grammar can be parsed in*

*> linear time using specially designed methods, but that no general*

*> algorithm currently exists which works for all context-free grammars.*

*>*

*> Is this still the state of play?*

Yes. As far as I can tell there is no published algorithm the parses

every known context free grammar in linear time. Nor is there even

anyone who claims to have implemented an algorithm the does that (that

I am aware of).

*> The reason I ask is that I believe I have found an algorithm to do*

*> precisely this.*

Well, that would be something new and it is certainly worth pursuing

what you have done further, even if what you believe is not

*completely* true.

*> It will certainly handle any grammar that a Tomita parser will*

*> handle, and do so without needing stack duplication.*

At one level I believe your claim for this, because we have an

(unpublished) algorithm implemented in an experimental version of

Yacc++ that does roughly this. (We call our technique LR(infinity)

because it effectively extends the lookahead of the parser out in an

unbounded fashion.) The idea is that instead of duplicating the stack

at runtime as a Tomita parser does, one precomputes the stack

extensions at compile time (the same way that an LR parser does

relative to an LL one).

I believe for non-ambiguous grammar the process of doing so always

terminates. So, if that is what you have invented, I think the basic

concept is sound.

On the other hand, your ability to produce all possible pairings for

ambiguous grammars in linear time is not believable. I have seen a

proof that such pairings grow at an n**2 rate. Thus, writing them

will necessarily take n**2 time.

*> The only area where I have a serious uncertainty is in the issue of*

*> parsing infinitely ambiguous grammars. I need to think about this some*

*> more. I'm pretty sure the technique can be adapted, and the final*

*> output from parsing a grammar and scentence with infinite ambiguity*

*> would be essentially a regular grammar (with branches instead of*

*> alternatives) describing the structure of the parse tree.*

If your technique is similar to the one I described, this is a real

problem. For one thing, there is a proof that there is no algorithm

that can determine if an arbitrary cfg is unambiguous--one can use an

algorithm that determines if an arbitrary cfg is ambiguous to solve

the Post-correspondence problem, which is equivalent to solving the

halting problem. If your algorithm could detect these infinite

ambiguities (and halt), then it would have essentially solved the cfg

ambiguity problem.

Our technique founders on this very rock. Although there are certain

ambiguities it can detect, on other grammars the algorithm loops

forever attempting to resolve the lookahead.

That is not a completely hopeless situation. For every set of

grammars less than some finite length, there is a finite maximum

number of states that any unambiguous grammar has and all ambiguous

grammars are "detectable" by that point. The only problem is that we

cannot know (via an algorithm) what that number is in advance for some

unspecified finite length. That is, if I tell you in advance what the

length is, you can tell me a number, because it exists. However, if

you tell me that you can calculate the number for any length, I can

come up with a length for which your calculation fails.

In any case, we use that property to constrain our algorithm by

allowing the user to fix the maximum nuber of lookahead states.

Unambiguous grammars don't generally need very many lookahead states.

It is worth losing a few unambiguous grammars to have an algorithm

that always terminates.

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.