5 Jul 1996 12:00:42 -0400

Related articles |
---|

LL vs LR languages (not grammars) platon!adrian@uunet.uu.net (1996-06-24) |

Re: LL vs LR languages (not grammars) leichter@smarts.com (Jerry Leichter) (1996-06-26) |

Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-06-30) |

Re: LL vs LR languages (not grammars) fjh@mundook.cs.mu.OZ.AU (1996-06-30) |

Re: LL vs LR languages (not grammars) schoebel@minnie.informatik.uni-stuttgart.de (1996-07-01) |

Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-07-01) |

Re: LL vs LR languages (not grammars) ikastan@alumnae.caltech.edu (1996-07-02) |

Re: LL vs LR languages (not grammars) schoebel@minnie.informatik.uni-stuttgart.de (1996-07-05) |

From: | schoebel@minnie.informatik.uni-stuttgart.de (Thomas Schoebel-Theuer) |

Newsgroups: | comp.compilers,comp.compilers.tools.pccts |

Date: | 5 Jul 1996 12:00:42 -0400 |

Organization: | Department of Computer Science, University of Stuttgart, Germany |

References: | 96-06-103 96-06-109 96-07-012 96-07-027 |

Keywords: | parse |

|> Thomas Schoebel-Theuer <schoebel@minnie.informatik.uni-stuttgart.de> wrote:

|> >LL(oo) means to always detect the right alternative corresponding to a

|> >*uniquely determined* subtree of *the* syntax tree, if there is exactly one.

|> >In other words, if the grammar is unambiguous (or more precisely if the

|> >input word has exactly one parse), you will always choose the right

|> >alternatives corresponding to this one tree *in advance*, because you have

|> >infinite lookahead to detect this. Conversely, if there is more than one

|> >tree, you will always get a lookahead conflict.

|>

|>

|> With this definition, LL(oo) seems to be indulging in what one might

|> call 'focused nondeterminism'. It is possible to simply posit the defi-

|> nition and work with it... but since it is somewhat conclusory, an attempt

|> at PDA implementation is: allow nondeterminism, say using the backtracking

|> model, but make it exhaustive. That is, continue exploring choices even

|> after a successful computation, instead of announcing it. If it proves to

|> be the only one, then and only then accept.

|>

|> The case of the perfectionist backtracker.

|>

|> It is clearer under the replicant model, the PDA spawning clones at

|> every instance of a choice. Each clone reaches a conclusion (assuming

|> Greibach normal form, maybe) and the presence of two or more "accept"s

|> blows a fuse.

|>

|> Ignoring the details of LL or LR, does this properly describe

|> infinite lookahead?

Yes, very well.

You got the point.

You do not need exhaustive backtracking, you can use dynamic programming

additionally. By using the compression theorem, you can make the

configuration tree of the backtracking parser into a graph, which

corresponds to Earleys data structure nearly 1:1. Then you get O(n^2) time

and O(n) space for any unambiguous grammar. To test the unambiguity, you

just have to add a garbage collector to eliminate dead ends (which is

basically described in the Graham/Harrison/Ruzzo 1980 paper). You can even

get O(n) if the grammar is LR(k) for some k (which you don't need to know in

advance!) and if you use a right recursion elimination technique described

in my forthcoming thesis. The basic idea is in an article by Leo, but Leos

algorithm doesn't work in O(n) with certain LR(k) grammars having _hidden_

right recursion.

|> >In other words, since LL(oo) comprises the class of unambiguous grammars,

|> >it is strongly more powerful than deterministic methods.

|>

|> In a prior posting I suggested a simpler kind of lookahead -- and

|> weaker than the one above. Rephrasing it: the PDA enters a special state

|> and places a marker under the reading head (or: unconsumed input carries

|> a special starting symbol %). From then on it can read input without

|> consuming it, and can make transitions into subspecial states, but cannot

|> use the stack. There is a special state for returning to normal mode,

|> upon reading the marker (or the %) and with more than one transition (to

|> "report" the outcome of this foray). In short, 2-way FA operation. We

|> could call this LLL, lookie-loo lookahead.

|>

|> It can carry out some lookahead tasks beyond fixed k. I don't know

|> whether it corresponds to something known.

It is *regular* lookahead, as opposed to LL(oo) which uses full CF lookahead.

Suppose your stack has contents (A alpha $) where $ is bottom-of-stack. Say

you have to try two alternatives A -> beta and A -> gamma. You would have

to test whether the intersection of all prefixes of L(alpha beta $) with all

prefixes of L(alpha gamma $) is *regular*. This is undecidable in general.

You have to solve this problem in order to determine whether some word

ending with $ is in the intersection. If $ is in, the intersection of the

two languages is nonempty, and there is a lookahead conflict. To determine

with a FA that $ is not in, you basically have to precompute the

intersection language into a FA.

If you can decide the intersection problem in some cases (perhaps with a

probabilistic algorithm), you can build your FA to decide which alternative

is the right one.

So your problem is that membership in LLL is not statically decidable, as is

also the problem with LL(oo). You probably can look for useful subclasses

which are decidable, in order to allow precomputation of your FA.

Note that the full CF lookahead necessary for LL(oo) can be achieved nearly

without overhead, since the parsing algorithm for testing the lookahead at

runtime can be the same as is used for parsing anyway. You just have to

delay the final *parsing decision* as long as some ambiguities are not yet

removed, in worst case until the EOF is seen.

|> >An example: First look at the language L = { a^n b^n c^m } U { a^n b^m c^m },

|> [ . . . ]

|> >Now we modify the language by appending a single letter in both

|> >cases: L' = { a^n b^n c^m d } U { a^n b^m c^m e }.

|> LLL can do it, too, obviously enough. It only needs to look -- go

|> out and see the 'd' or the 'e'.

|> But LLL is less than LL(oo): one can change the example and foil

|> lookie-loos. Replace 'd' by f^i g^i, and 'e' by f^i g^(i+1), say.

|> Now it takes work to 'read' the criterion.

Good example!

So I'd suggest to use a backtracking parser with dynamic programming and

right recursion elimination, so you will get O(n) with most practical

grammars (close to LR(k)) and O(n^2) in rare worst cases, and you have the

full power of LL(oo). Trying to precompute everything to a deterministic

algorithm leads to statically undecidable classes very soon, so you would

have to restrict yourself to subclasses of LL(oo). As my opinion,

precomputing (for example the macro technique described in my paper) has a

very good use for improving algorithms by constant factors, but cannot solve

the general problem of statically decidable grammar classes, although it can

be used for characterizations of such classes (for example if a grammar can

be macroized by a particular packing strategy such that the resulting tree

generator is deterministic).

-- Thomas

P.S.: The reprint of my article is also available under

ftp://ftp.informatik.uni-stuttgart.de/pub/library/ncstrl.ustuttgart_fi/TR-1996-06/

Leo's article is

@article{Leo91,

author = {Joop M. I. M. Leo},

title = {A general context-free parsing algorithm running in linear time

on every LR(k) grammar without using lookahead},

journal = {Theoretical Computer Science},

volume = {82},

pages = {165--176},

year = {1991}

*}*

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.