12 Sep 2002 00:21:54 -0400

Related articles |
---|

[17 earlier articles] |

Re: regular expression operators in CF grammars cfc@world.std.com (Chris F Clark) (2002-07-15) |

Re: regular expression operators in CF grammars kgw-news@stiscan.com (2002-07-21) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars nospam@snafu.de (Sönke Kannapinn) (2002-08-10) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-08-23) |

Re: regular expression operators in CF grammars tbandrow@unitedsoftworks.com (tj bandrowsky) (2002-09-03) |

Re: regular expression operators in CF grammars cfc@shell01.TheWorld.com (Chris F Clark) (2002-09-12) |

Re: regular expression operators in CF grammars cfc@TheWorld.com (Chris F Clark) (2002-09-14) |

RE: regular expression operators in CF grammars qjackson@shaw.ca (Quinn Tyler Jackson) (2002-09-14) |

From: | "Chris F Clark" <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 12 Sep 2002 00:21:54 -0400 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 02-05-142 02-05-151 02-06-024 02-06-056 02-06-090 02-07-010 02-07-025 02-07-043 02-08-035 02-08-078 02-09-018 |

Keywords: | parse |

Posted-Date: | 12 Sep 2002 00:21:54 EDT |

tj bandrowsky wrote:

*> If so, I was able to parse it with diplodicus. The trace for the*

*> parse is below. Diplodicus tries to solve this sort of ambiguity*

*> problem by never reducing until there is one and only one unambiguous*

*> reduction to take. It keeps track of all the rules that are possibly*

*> ambiguous as each token is shifted. The trick I used to make this*

*> work is sensing the condition where you go from not reducing a*

*> non-ambiguous rule because another rule is ambiguous with to have no*

*> reductions to make at all, in which case you have to have backtrack at*

*> most one token, take the unambiguous reduction you previously blew*

*> off, and then proceed.*

From, what I can tell, you have the correct interpretation, and it is

not surprising that you can find it by "parsing ahead" until there is

no ambiguity and then making the unresolved reductions to match the

now unambiguous context.

I am a little suprised that you can always backup at most one token.

My experience suggests that there are temporary/local ambiguities

(i.e. conflicts) that require an infinite amount of lookahead to

resolve, although the lookahead can always be parsed by a CFG.

Consider:

goal: a | b;

a: x y z;

b: r s t;

x: "0" x | "0";

y: "1" y | "1";

z: "2";

r: "0" r | "0";

s: "1" s | "1";

t: "3";

In this gramar, the non-terminal x is the same as r and y is the same

as s but z and t are distinct. Thus, the language is unambiguous.

However, to distinguish between x and r, one must look past y and s,

requiring unbounded lookahead, but with lookahead that is parse-able

with a CFG. Do you think you can parse this by backing up only one

token? What if the ambiguity in z versus t is k tokens in? My point

is not that you cannot resolve this conflict with diplodicus (I htink

you can), but that you may have to back up more than one token to do

so.

BTW, this is exactly the type of problem that we tried to solve with

our "LR(infinity)" parsing technology.

Note, this gramar is GLR, but not LR(k) for any k. Note, that the

language itself is LR(k), even though the grammar given isn't. To

make an LR(k) grammar, simply substitute x for r and y for s in the b

rule and viola an LR(k) grammar--thus the language is LR(k) despite

the conflicts in the original grammar.

The LR(infinity) model could be stated simply as:

if at any conflict in the LR machine the two conflicting rules are the

result of distinct items, the parser generator attempts to resolve the

conflict by parsing the distinct right context grammars implied by

those two items until they can be distinguished.

The right context grammar for the rule x: "0" x | "0" .;

is:

a: x .y z;

y: "1" y | "1";

z: "2";

The right context grammar for the rule r: "0" r | "0" .;

is:

b: r .s t;

s: "1" s | "1";

t: "3";

Resolving this conflict may introduce new conflicts in the resulting

LR machine and these are resolved the same way. One can think of this

as "taking the limit" of the conflict resolution process. If the

conflicts are eventually all resolved, the limit converges and the

grammar is LR(infinity). If the conflicts continue generating new

conflicts, the limit diverges and the grammar is not LR(infinity).

By appealing to the theorem which states that it is undecidable

whether a given cfg is unambiguous or not, we can tell that there is

no algorithmic way to decide whether the limit will converge or not

(for any arbitrary) grammar. (That is, if someone gives an ominpotent

being an algorithm that purports to solve the ambiguity problem, that

omnipotent being can give them a grammar their algorithm will

misclassify.) However, it is my belief that for a large class of

grammars, the limits do converge. I also believe that most common

ambiguities can also be detected.

(My apologies in being so tardy in replying. I've been on vacation

and then got caught up in another interesting discussion via email on

predicated grammars.)

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

3 Proctor Street voice : (508) 435-5016

Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.