Fri, 7 Nov 2014 19:43:44 +0000 (UTC)

Related articles |
---|

Can this be proven for LL(1) grammars? lhp+news@toft-hp.dk (Lasse =?iso-8859-1?q?Hiller=F8e?= Petersen) (2014-11-06) |

Re: Can this be proven for LL(1) grammars? kaz@kylheku.com (Kaz Kylheku) (2014-11-07) |

Re: Can this be proven for LL(1) grammars? slkpg4@gmail.com (SLK Mail) (2014-11-08) |

From: | Kaz Kylheku <kaz@kylheku.com> |

Newsgroups: | comp.compilers |

Date: | Fri, 7 Nov 2014 19:43:44 +0000 (UTC) |

Organization: | Aioe.org NNTP Server |

References: | 14-11-006 |

Keywords: | theory, parse |

Posted-Date: | 07 Nov 2014 15:55:25 EST |

On 2014-11-06, Lasse HillerC8e Petersen <lhp+news@toft-hp.dk> wrote:

*> Given a context free grammar G that is LL(1), is the answer to the*

*> following questions "yes"? I intuitively believe it is, except for (2),*

*> but I am not sufficiently confident of my skills in proving this in full*

*> generality, so I would much appreciate expert advice.*

*>*

*> 1. From G, for each nonterminal symbol R define a grammar GR, where the*

*> start rule is changed to R. Rules that are no longer reachable are of*

*> course useless and discarded. Is each GR also LL(1)?*

This is true, and the informal proof is to work backwards: you cannot

take some grammar GR which is not LL(1) and embed it in a larger

context free grammar G which fixes the problem.

*> 2. From all the GR grammars from (1), define G' with a new start rule S:*

*> GR1 | GR2 ... |GRn. Is G' also LL(1)? Given that there could be rules Gi*

*> and Gj such that their FIRST sets intersect, I suppose the answer is no.*

Indeed:

G1 -> blah blah blah 'x'

G2 -> blah blah blah 'y'

Oops! Of course, the language is LL(1) parseable, but requires

a refactored grammar in which we merge the similar left parts:

G1_G2 -> blah blah blah G1_G2'

G1_G2' -> 'x' | 'y'

If we are building a syntax tree as a semantic action, we can handle this

language by encoding the difference between the 'x' and 'y' variant in the

tree. We don't have G1 and G2 phrase structures, but a G1_G2 that

generates multiple forms.

*> What if instead, S is defined only from a selection of rules with*

*> disjoint FIRST sets?*

This might not help because of the threat of FIRST/FOLLOW conflicts.

The problem is that G1 and G2 might share some constituent, call it C.

The G1 and G2 rules contribute different material to FOLLOW(C):

G1 -> 'x' C Z

G2 -> 'y' C W

G1 and G2 nicely derive a starting nonterminal, so their FIRST sets are clearly

disjoint (in compliance with your stated constraint on S). But the combintion

of G1 and G2 causes a merging of FOLLOW(C) from G1, and FOLLOW(C) from G2.

Moreover, if C can derive empty (C -> N5), FIRST(C) is also being affected. In

general, conflicts can arise from this. Even though there is no

FIRST(C)/FOLLOW(C) conflict in the G1 and G2 rules in isolation, possibly the

FIRST(C) from G1 could conflict with FOLLOW(C) in G2 or vice versa,

under the merge of these sets.

*> 3. From G, define a new grammar G', such that for each nonterminal R, a*

*> rule R: "R" is added, where "R" is a new terminal symbol. Is G' also LL*

*> (1)?*

This lets you replace any phrase structure in the language with

a "super token" which represents that phrase structure like

3 + MULTIPLICATIVE_EXPR.

This seems to fit in with sentential form representations that you

would use in generation or derivation of sentences.

This introduction cannot introduce a new FIRST/FOLLOW conflict.

R -> "R" means that the terminal "R" is in FIRST(R). Can it also

appaer in FOLLOW(R)? Only if grammar derives R R somewhere, and R happens to

derive N5. But in that case, there is already an existing FIRST/FOLLOW

conflict, unless perhaps the *only* rule for R was previously R -> N5.

If we have:

S -> R R

R -> N5

Then we add:

R -> 'R'

We have a FIRST/FOLLOW conflict. Given the input 'R' it is ambiguous

whether it is S -> N5 'R' or S -> 'R' N5.

What's not clear to me is that the original before the new rule doesn't already

have a conflict. I think not because the symbol N5 only appears in FIRST sets,

not FOLLOW sets. In the original unaugmented grammar, we have FIRST(R) = { N5 }

and FOLLOW(R) = { }. The grammar clearly matches the input in exactly

one way, by deriving R -> N5 twice.

Suppose that R actually derives something in addition to N5, and is embedded

in a correct grammar with no conflicts. Since there is no FIRST/FOLLOW

conflict that means that either R does not derive N5, or FIRST(R)

and FOLLOW(R) do not intersect. If they do not intersect, that means that

FOLLOW(R) was not derived from R: there is nothing in the grammar of

the form R R or R X R where symbols X potentially disappear by deriving N5.

Then if we add R -> 'R', it is harmless.

Informally, since R already derives *something* non-empty without conflicts,

then making R derive a new nonterminal, unique to R, will not create

a conflict. (R already succesfully derives tokens that are not necessarily

unique to the R rule, so why wouldn't it handle an additional unique one.)

*> 4. Do (1) and (3) hold for other types of CFG?*

I suspect (1) may pop out from what it means for a grammar to be "context

free", but since I already hand-waved through the first comment about it, I'm

going to leave it at that. :) I look forward to other responses.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.