Wed, 29 Oct 2008 14:17:38 +0000

Related articles |
---|

How test if language is LL(k) or LR(k) ? a.moderacja@gmail.com (Borneq) (2008-10-23) |

Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-10-28) |

Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-10-29) |

Re: How test if language is LL(k) or LR(k) ? rpboland@gmail.com (Ralph Boland) (2008-10-31) |

Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-06) |

Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? ulf.schwekendiek@googlemail.com (ulf.schwekendiek@googlemail.com) (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-07) |

Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-10) |

[2 later articles] |

From: | Stephen Horne <sh006d3592@blueyonder.co.uk> |

Newsgroups: | comp.compilers |

Date: | Wed, 29 Oct 2008 14:17:38 +0000 |

Organization: | virginmedia.com |

References: | 08-10-044 |

Keywords: | parse, theory |

Posted-Date: | 29 Oct 2008 19:27:12 EDT |

On Thu, 23 Oct 2008 13:13:12 -0700 (PDT), Borneq

<a.moderacja@gmail.com> wrote:

*>Where k is not specified.*

*>I would not testing if is LL(1),LL(2),LL(3) to infinity but test if*

*>language if LL(k) and would be nice if algorithm return k, find k for*

*>LL(k) LR(K) in finite time.*

For LL, I don't know much.

For LR, *almost* all context-free grammars are LR(k < infinity), so if

you can prove that the grammar is context-free you are almost there.

You can always construct an LR(1) - or even LR(0) - state model to

parse a context-free grammar, but if the grammar is not LR(1) that

state model will have shift/reduce or reduce/reduce conflicts.

Almost all such conflicts can be resolved finitely. This is the logic

behind Tomita and other generalised LR parsing algorithms. The basic

principle is that conflicts are resolved by checking all possibilities

- the Tomita approach being to check them concurrently using stack

duplication, whereas another common approach is to use backtracking.

The basic problem is that sometimes, there are infinite possibilities

to check. This can occur when the grammar contains empty rules - when

there are reductions that pop zero tokens from the stack. However, it

doesn't happen for *all* cases where empty rules are used - except in

the case of LR(0), I think - and it is not reasonable to ban such

rules.

In such cases, Tomita will give an infinite number of stack

duplicates, whereas the backtracking approach will simply give an

infinite loop.

Longer lookaheads can bypass most cases where these loops can happen,

and even LR(1) can handle most practical grammars with empty rules.

However, in general, it is not decidable for a particular grammar

whether such loops can be avoided using longer lookaheads. It is only

decidable if you also know the specific input that will be parsed.

I have designed an variant of GLR (though I have never implemented it)

which should theoretically give constant-time parsing of any context

free grammar that can be parsed, but which (unlike Tomita or

backtracking) can detect these infinite loops at run-time. I say "I

have designed", but it's not my invention - I was beaten to it by

about a decade. I was inspired by the packrat approach to LL parsers,

and designed a memoized/dynamic/tabular LR parsing algorithm, but once

I'd got the basic idea, it was also easy to find certain papers by

Mark-Jan Nederhof et al, published in the mid-90s.

Detecting the infinite loops in this tabular approach is not trivial,

but is constant-time. It involves a digraph cycle check for one column

of the table. This is constant-time because there is a maximum digraph

size, which is decided by the state model and thus the grammar. There

is no rule to say that constant-time must be fast ;-)

Basically, that's it - *almost* any context-free grammar is LR(k <

infinity) but there are a few that are not, and there are some

grammars which cannot be proven either LR(k) or otherwise except by

iterating through the infinite set of possible inputs.

For those grammars that are LR(k), k can be found by starting with an

LR(1) state model and identifying the states at which conflicts exist.

At each of these states, extend the model by adding "lookahead" states

until all the conflicts are resolved. That is, the actions for the

transitions are neither "shift" nor "reduce" but a kind of lookahead

pseudo-shift that adds nothing to the stack. In a sense, this is

building the stack duplication/backtracking into the state model

itself. The maximum number of lookahead steps needed to resolve all

conflicts determines k, and a genuine LR(k) model can then be derived.

This always has the possibility of an infinite loop growing an

infinite state model due to those undecided non-LR(k) cases, however.

My understanding is that this limitation cannot be overcome since

doing so would provide a solution to the halting problem, which is

proven non-decidable.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.