Sun, 26 Sep 1993 11:50:36 GMT

Related articles |
---|

disambiguating transformations MATT@waikato.ac.nz (Matt Melchert) (1993-09-22) |

Disambiguating an arbitrary CFL (was: disambiguating transformations) markh@csd4.csd.uwm.edu (1993-09-25) |

How to... (was: disambiguating transformations) markh@csd4.csd.uwm.edu (1993-09-26) |

Newsgroups: | comp.compilers |

From: | markh@csd4.csd.uwm.edu (Mark) |

Keywords: | parse, theory |

Organization: | Computing Services Division, University of Wisconsin - Milwaukee |

References: | 93-09-099 93-09-120 |

Date: | Sun, 26 Sep 1993 11:50:36 GMT |

Matt Melchert <MATT@waikato.ac.nz> writes:

*>I am looking for an algorithm which will take a (possibly) ambiguous context-*

*>free grammar and perform transformations on it to yield an equivalent*

*>non-ambiguous context-free grammar. Does such a thing exist?*

Yes, sorta. Referring to a conjecture I made a short while back in the

article "Disambiguating an arbitrary CFL":

Every C_n language has an unambiguous C_m grammar for some m >= n.

(context free languages are C_1 languages, as indicated there), I think

that the way you do this is to generalize the Iterated Subset algorithm

for Finite Automata to C_n languages. As indicated before, every C_n

language will have an Index Grammar, which represents an infinite state

diagram of a particular topological structure, basically, the cross

product of n infinite trees. For example, the diagram presented in the

article, representing a C[1, 1] language, is basically the cross product

of 2 1-ary trees (i.e. a 2-dimensional grid).

The subset algorithm can be applied directly to either, but the looming

question is whether or not there is a guarantee that this process will be

finite and will eventually close off. Experience indicates that it will.

The problem is that you have to mess around with the indices when forming

subsets of states and this may lead to a proliferation of new index

dimensions, which is why the dimension, m, of the deterministic C_m

grammar may be larger than the dimension, n, of the C_n language.

In contrast, if you try to go further and find the equivalence sets of

states, you are guaranteed that there will be at least one language out

there which will break your equivalence-finding algorithm, no matter what

algorithm you use. You're working on infinite state diagrams, and

equivalence testing can lead to intractible infinite regresses.

So ... finding the MINIMAL deterministic C_m grammar for a C_n language is

not an algorithmic process. But then, as they say, so what? Any

deterministic C_m grammar will be good enough.

By the way, if this conjecture is true, then since parsing with a

deterministic C_m grammar is linear, that means Context Free Language

Recognition is O(n).

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.