4 Aug 2002 11:48:37 -0400

Related articles |
---|

[4 earlier articles] |

Re: Have I discovered something new? kgw-news@stiscan.com (2002-07-21) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-24) |

Re: Have I discovered something new? steve@lurking.demon.co.uk (Steve Horne) (2002-07-25) |

Re: Have I discovered something new? robert.corbett@sun.com (Robert Corbett) (2002-07-31) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-07-31) |

Re: Have I discovered something new? cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-04) |

Re: Have I discovered something new? whopkins@alpha2.csd.uwm.edu (Mark) (2002-08-04) |

From: | "Mark" <whopkins@alpha2.csd.uwm.edu> |

Newsgroups: | comp.compilers |

Date: | 4 Aug 2002 11:48:37 -0400 |

Organization: | University of Wisconsin - Milwaukee, Computing Services Division |

References: | 02-07-057 02-07-079 02-07-126 |

Keywords: | parse |

Posted-Date: | 04 Aug 2002 11:48:37 EDT |

"Robert Corbett" <robert.corbett@sun.com> writes:

*>> I believe for non-ambiguous grammar the process of doing so always*

*>> terminates.*

*>I doubt it. Consider the grammar*

*> S -> aSa | a*

*>This grammar is unambiguous, but it features unbounded nondeterminism.*

This is a good example to plug into the algebraic formalism I described in

previous articles. Through it you can see how the claim above can be

fulfilled.

First extended "canonically" to the bottom-up SSDTG for the SSDT L you'd have

the following (listed in algebraic form):

L >= Sz

S >= aSax + ay

with

input alphabet X = {a}

output alphabet Y = {x,y,z}

and 'standard' interpretation:

x = "reduce [S] <- [aSa]"

y = "reduce [S] <- [a]"

z = "reduce root <- [S]".

Defining SF as the least solution to:

SF >= |0>z + |1>ax SF

and SS by

SS = S SF

you get:

<0| SS = S <0| SF <= Sz

<1| SS = S <1| SF <= Sax SF

so that

L >= SZ >= <0| SS

SS = S SF >= aSax SF + ay SF >= a <1| SS + ay SF

thus yielding the following right-linear system:

L >= <0| A

A >= a <1| A + ay B

B >= |0>z + |1>ax B

whose least solution is:

B = (|1>ax)* |0>z

A = (a<1|)* ay (|1>ax)* |0>z

and

L = <0| (a<1|)* ay (|1>ax)* |0>z.

This is the context-free expression that represents the

SSDT in question:

L = { vw in X* x Y*: v in X*, w in Y*, w encodes a parse tree for v }

The relevant question at hand is to determine what is:

L[v] = { w in Y*: vw is in L }, for any v in X*

which provides a representation of the set of parses for the given

word v.

This, in fact, can be read directly off the expression itself, since:

L = <0| (a<1|)* ay (|1>az)* |0>z

= sum (<0| (a<1|)^m ay (|1>az)^n |0>z: m, n >= 0)

= sum (a^{m+1+n} <0|<1|^m y (|1>x)^n |0>z: m, n >= 0)

= sum (a^{m+1+n} <0| <1|^m |1>^n |0> y x^n z: m, n >= 0)

So that the expression L[v] corresponding to v = a^p is just:

L[a^p] = sum (<0| <1|^m |1>^n |0> y x^n z: m, n >= 0; m+n+1 = p)

The operator expression <0| <1|^m |1>^n |0> is what the "precomputed

stack expression" would correspond to; under the interpretation

<0| = "create new stack"

|0> = "test for empty stack and clear stack"

<i| = "push i onto stack", i = 1, 2, 3, ...

|i> = "pop and test for equality to i", i = 1, 2, 3, ...

In the application at hand, this can be further optimized since

<0| <1|^m |1>^n |0> = 1 if m = n, 0 else

so that

L[a^{2n+1}] = y x^n z; L[a^{2n}] = 0

*>A problem with parsing ambiguous grammars is choosing a suitable*

*>representation for the derivations. Consider the grammar*

*> S -> S | a*

An example even worse than this was actually treated in my previous article.

*>The single string generated by this grammar has an infinite number of*

*>derivations.*

In fact, the derivation words form a CONTEXT-FREE language in their

own right (which, to reiterate the point made previously, is why any

means of representing parse-sets must amount to a theory and notation

for context-free expressions).

If you write the canonical extension to an SSDT as:

L >= Sz; S >= Sx + ay

then this reduces to the context-free (actually, regular) expression

L = ay x* z

so that

L[a] = y x* z.

Your example's not the worst possible since the parse set is ONLY a regular

language, albeit one that produces an infinite of parse words. The worst

example is if you add in the extra items

S -> SS; and S -> 1 (empty word).

Then we're talking business.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.