Fri, 07 Nov 2008 21:35:30 +0000

Related articles |
---|

How test if language is LL(k) or LR(k) ? a.moderacja@gmail.com (Borneq) (2008-10-23) |

Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-10-28) |

Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-10-29) |

Re: How test if language is LL(k) or LR(k) ? rpboland@gmail.com (Ralph Boland) (2008-10-31) |

Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-06) |

Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? ulf.schwekendiek@googlemail.com (ulf.schwekendiek@googlemail.com) (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-10) |

Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-10) |

Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-12) |

From: | Stephen Horne <sh006d3592@blueyonder.co.uk> |

Newsgroups: | comp.compilers |

Date: | Fri, 07 Nov 2008 21:35:30 +0000 |

Organization: | virginmedia.com |

References: | 08-10-044 08-10-055 08-11-033 |

Keywords: | parse, theory, comment |

Posted-Date: | 07 Nov 2008 18:58:40 EST |

On Fri, 07 Nov 2008 14:27:18 +0100, torbenm@pc-003.diku.dk (Torben AEgidius Mogensen) wrote:

*>Stephen Horne <sh006d3592@blueyonder.co.uk> writes:*

*>*

*>> For LR, *almost* all context-free grammars are LR(k < infinity), so if*

*>> you can prove that the grammar is context-free you are almost there.*

*>*

*>That's not very useful, and not even true. There are many useful*

*>grammars that are not LR(k) for any k and even useful languages for*

*>which there doesn't exits LR(k) grammars for any k.*

Context-free grammars? And for any finite k?

My understanding is that you'd need an infinite input. While there are

certainly applications where unbounded inputs occur, a genuinely

infinite input can never be completely parsed anyway.

I'm not saying I'm right here, as I'm certain you know more than me,

but I'd like to see an example.

*>> You can always construct an LR(1) - or even LR(0) - state model to*

*>> parse a context-free grammar, but if the grammar is not LR(1) that*

*>> state model will have shift/reduce or reduce/reduce conflicts.*

*>>*

*>> Almost all such conflicts can be resolved finitely. This is the logic*

*>> behind Tomita and other generalised LR parsing algorithms. The basic*

*>> principle is that conflicts are resolved by checking all possibilities*

*>> - the Tomita approach being to check them concurrently using stack*

*>> duplication, whereas another common approach is to use backtracking.*

*>*

*>This does, however, not give you any k, since the amount you have to*

*>read ahead before backtracking may depend on the input.*

My claim is that almost all conflicts can be resolved finitely. I

never claimed that this tells me the value of k. A method for

determining k was vaguely described later in the post.

Of course formally I'm wrong, because I'm counting in terms of what is

useful rather than the far more numerous grammars that aren't.

Anyway, the main limitation on finite resolution is that the input

itself must be finite. That's no cheat since any input that can be

parsed completely must be finite. Obviously, formally, its a

limitation - but if you're doing state-so-far parsing of infinite

input streams, large lookaheads (often any lookaheads) are

inappropriate anyway, so it didn't seem relevant.

I probably should have mentioned this, of course.

*>> I have designed an variant of GLR (though I have never implemented it)*

*>> which should theoretically give constant-time parsing of any context*

*>> free grammar*

*>*

*>I assume you mean "linear-time", as constant-time parsing sounds*

*>implausible.*

Yes - sorry - I meant constant time per token of input. For some

variants (depending on how the table is managed and evaluated) that's

amortised constant time per token of input.

*> But even linear-time parsing for any context-free*

*>grammar sounds suspect and would be a very big result (if you can*

*>prove it).*

1. As I said, there's a pretty big proviso in there, related to the

halting problem. The only improvement in the power of this

relative to backtracking/Tomita is that it can detect problems for

specific inputs at run-time.

Actually, there is a proviso to that proviso. My method doesn't

need reduce sizes to be encoded in the state model - they are

determined at run-time by the table. This makes it easy to support

EBNF definitions for nonterminals. There's no extra power in terms

of the set-of-strings definition of a grammar, but it's a little

more flexible in how you get to define the grammar.

That's not all that interesting, though, as EBNF rules could

already be handled by grammar transformations.

Early is certainly still more powerful. Where backtracking LR goes

into an infinite loop and tabular LR can give a run time error, my

understanding is that Early would just work.

2. As I said, at best I re-invented it ten years after the fact.

Formal papers have already been published for tabular LR parsing.

I also suggest some tricks such as a kind of sliding window for

the table for handling unbounded inputs, but that leads to further

provisos. Truth told, even ignoring the cycle issues, you can

only ensure linear time parsing of any CF grammar using my method

if you know the full input up-front, so that you can use a

preallocated table.

Actually, I'm not entirely confident that Nederhofs tabular LR is

the same as mine - I haven't read his papers properly, and can't

even get hold of all of them. But given what I have read, it seems

to be the same thing.

That said, I believe that a proof that general CF parsing is

linear time was described here some years back. I started a thread

in 2002 relating to linear time parsing (my idea back then was

broken), and one of the replies referred to Kolomogorov complexity

theory. 6 years later, and I still haven't done the reading to

understand that.

Search for "Have I discovered something new?" and "Stephen Horne"

in Google Groups.

3. A lot of the costs have big constant factors. The cycle checks for

a start. *Very* unlikely to be mainstream. Potentially useful for

niche applications, I suppose, but even then I have my doubts.

My notes are in a 12-page approx. 340K OpenDocument Text file. The

real explanation of the method takes about 3 pages, the rest is on

variations and other general notes.

I can e-mail them if you want.

[Yes, there are lots of context free grammars that are not LR(k). See, for

example http://compilers.iecc.com/comparch/article/99-10-178 when this

came up nine years ago. -John]

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.