Re: How test if language is LL(k) or LR(k) ?

Stephen Horne <sh006d3592@blueyonder.co.uk>
Fri, 07 Nov 2008 21:35:30 +0000

          From comp.compilers

Related articles
How test if language is LL(k) or LR(k) ? a.moderacja@gmail.com (Borneq) (2008-10-23)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-10-28)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-10-29)
Re: How test if language is LL(k) or LR(k) ? rpboland@gmail.com (Ralph Boland) (2008-10-31)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-06)
Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? ulf.schwekendiek@googlemail.com (ulf.schwekendiek@googlemail.com) (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-07)
Re: How test if language is LL(k) or LR(k) ? torbenm@pc-003.diku.dk (2008-11-10)
Re: How test if language is LL(k) or LR(k) ? cfc@shell01.TheWorld.com (Chris F Clark) (2008-11-10)
Re: How test if language is LL(k) or LR(k) ? sh006d3592@blueyonder.co.uk (Stephen Horne) (2008-11-12)
| List of all articles for this month |

From: Stephen Horne <sh006d3592@blueyonder.co.uk>
Newsgroups: comp.compilers
Date: Fri, 07 Nov 2008 21:35:30 +0000
Organization: virginmedia.com
References: 08-10-044 08-10-055 08-11-033
Keywords: parse, theory, comment
Posted-Date: 07 Nov 2008 18:58:40 EST

On Fri, 07 Nov 2008 14:27:18 +0100, torbenm@pc-003.diku.dk (Torben AEgidius Mogensen) wrote:


>Stephen Horne <sh006d3592@blueyonder.co.uk> writes:
>
>> For LR, *almost* all context-free grammars are LR(k < infinity), so if
>> you can prove that the grammar is context-free you are almost there.
>
>That's not very useful, and not even true. There are many useful
>grammars that are not LR(k) for any k and even useful languages for
>which there doesn't exits LR(k) grammars for any k.


Context-free grammars? And for any finite k?


My understanding is that you'd need an infinite input. While there are
certainly applications where unbounded inputs occur, a genuinely
infinite input can never be completely parsed anyway.


I'm not saying I'm right here, as I'm certain you know more than me,
but I'd like to see an example.


>> You can always construct an LR(1) - or even LR(0) - state model to
>> parse a context-free grammar, but if the grammar is not LR(1) that
>> state model will have shift/reduce or reduce/reduce conflicts.
>>
>> Almost all such conflicts can be resolved finitely. This is the logic
>> behind Tomita and other generalised LR parsing algorithms. The basic
>> principle is that conflicts are resolved by checking all possibilities
>> - the Tomita approach being to check them concurrently using stack
>> duplication, whereas another common approach is to use backtracking.
>
>This does, however, not give you any k, since the amount you have to
>read ahead before backtracking may depend on the input.


My claim is that almost all conflicts can be resolved finitely. I
never claimed that this tells me the value of k. A method for
determining k was vaguely described later in the post.


Of course formally I'm wrong, because I'm counting in terms of what is
useful rather than the far more numerous grammars that aren't.


Anyway, the main limitation on finite resolution is that the input
itself must be finite. That's no cheat since any input that can be
parsed completely must be finite. Obviously, formally, its a
limitation - but if you're doing state-so-far parsing of infinite
input streams, large lookaheads (often any lookaheads) are
inappropriate anyway, so it didn't seem relevant.


I probably should have mentioned this, of course.


>> I have designed an variant of GLR (though I have never implemented it)
>> which should theoretically give constant-time parsing of any context
>> free grammar
>
>I assume you mean "linear-time", as constant-time parsing sounds
>implausible.


Yes - sorry - I meant constant time per token of input. For some
variants (depending on how the table is managed and evaluated) that's
amortised constant time per token of input.


> But even linear-time parsing for any context-free
>grammar sounds suspect and would be a very big result (if you can
>prove it).


1. As I said, there's a pretty big proviso in there, related to the
        halting problem. The only improvement in the power of this
        relative to backtracking/Tomita is that it can detect problems for
        specific inputs at run-time.


        Actually, there is a proviso to that proviso. My method doesn't
        need reduce sizes to be encoded in the state model - they are
        determined at run-time by the table. This makes it easy to support
        EBNF definitions for nonterminals. There's no extra power in terms
        of the set-of-strings definition of a grammar, but it's a little
        more flexible in how you get to define the grammar.


        That's not all that interesting, though, as EBNF rules could
        already be handled by grammar transformations.


        Early is certainly still more powerful. Where backtracking LR goes
        into an infinite loop and tabular LR can give a run time error, my
        understanding is that Early would just work.


2. As I said, at best I re-invented it ten years after the fact.
        Formal papers have already been published for tabular LR parsing.


        I also suggest some tricks such as a kind of sliding window for
        the table for handling unbounded inputs, but that leads to further
        provisos. Truth told, even ignoring the cycle issues, you can
        only ensure linear time parsing of any CF grammar using my method
        if you know the full input up-front, so that you can use a
        preallocated table.


        Actually, I'm not entirely confident that Nederhofs tabular LR is
        the same as mine - I haven't read his papers properly, and can't
        even get hold of all of them. But given what I have read, it seems
        to be the same thing.


        That said, I believe that a proof that general CF parsing is
        linear time was described here some years back. I started a thread
        in 2002 relating to linear time parsing (my idea back then was
        broken), and one of the replies referred to Kolomogorov complexity
        theory. 6 years later, and I still haven't done the reading to
        understand that.


        Search for "Have I discovered something new?" and "Stephen Horne"
        in Google Groups.


3. A lot of the costs have big constant factors. The cycle checks for
        a start. *Very* unlikely to be mainstream. Potentially useful for
        niche applications, I suppose, but even then I have my doubts.


My notes are in a 12-page approx. 340K OpenDocument Text file. The
real explanation of the method takes about 3 pages, the rest is on
variations and other general notes.


I can e-mail them if you want.
[Yes, there are lots of context free grammars that are not LR(k). See, for
example http://compilers.iecc.com/comparch/article/99-10-178 when this
came up nine years ago. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.