Conversion to LL(1)

ebcrmb@ebc.ericsson.se (BO/EBC/KX/ZGE Michael Bergman 682 4958)
Thu, 18 Feb 1993 16:07:38 GMT

          From comp.compilers

Related articles
Conversion to LL(1) ebcrmb@ebc.ericsson.se (1993-02-18)
| List of all articles for this month |

Newsgroups: comp.compilers
From: ebcrmb@ebc.ericsson.se (BO/EBC/KX/ZGE Michael Bergman 682 4958)
Keywords: LL(1), parse, question
Organization: Compilers Central
Date: Thu, 18 Feb 1993 16:07:38 GMT

At the risk of being repetitive about this, I would *really* like someone
to to tell me (Jan..?) the algorithms for


a) Eliminating all epsilon productions from a CFG
b) Eliminating all cycles from a CFG


The left-recursion removal (and I suspect left-factoring as well, but this
is only a suspicion!) is only guaranteed to work if the grammar is
epsilon-free and cycle free.


I've read through a couple of old articles (Jan Schramp seems to have done
quite a lot of thinking in the area) and the subject interests me.


The left-factoring as described in the Dragon Book is IMO rather limited
as it only tries to get rid of "direct" FIRST overlap of the type:


A -> BC | BC | BC | ... | gamma
              1 2 3


where gamma represents all alternatives not beginning with B. However,
there may still be FIRST-set overlap even if we do not have this kind of
"direct" production. Substitutions may give such a production, but then
one might run into a loop as Jan Schramp pointed out in an article dated
May 18 1992.


His example was:
S -> Aa | Bb
A -> cA | eps
B -> cB | eps


To get the common leading c in the same production, we substitute for A
and B on S's right side and get


S -> cAa | cBb | a | b
A -> cA | eps
B -> cB | eps


After factoring S, we run into a the same situation again with A and B
both having c in their FIRST sets. Then Jan tries another approach and
first eliminates the epsilon productions:


S -> A | B
A -> cA | a
B -> cB | b


Then he wants to get the "direct" form of the common prefix situation:


S -> cA | a | cB | b
A -> cA | a
B -> cB | b


Factoring now gives:


S -> cD | a | b
D -> A | B


But D is equivalent to S, and we get S -> cS | a | b which is LL(1).




I'm wondering if the elimination of the epsilon productions was
significant. It would be nice if we knew the left-factoring was guranteed
to work (without getting into a loop) if all epsilon productions are first
eliminted. Comments?


On the other hand, this might not lead anywhere in practice since after
the left-recursion removal, the grammar may again contain epsilon
pruductions. These would then have to be removed using the algorithm I
was asking for, before we can apply the left-factoring. But the epsilon
elimination might very well re-introduce left-recursion.... :-(, I don't
know. On the other hand - maybe it does not!


I'm hoping someone out there is an expert on this.
--
Michael Bergman Email: ebcrmb@ebc.ericsson.se
KX/ZGE Tfn: +46 8 6824958
Ericsson Business Networks
S-135 83 Tyreso, Sweden
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.