Re: Computing first sets on a left-recursive grammar

"Paul Mann" <paul@parsetec.com>
12 Apr 2001 02:43:58 -0400

          From comp.compilers

Related articles
Computing first sets on a left-recursive grammar cyrus@stwing.upenn.edu (Cyrus Najmabadi) (2001-04-10)
Re: Computing first sets on a left-recursive grammar paul@parsetec.com (Paul Mann) (2001-04-12)
Re: Computing first sets on a left-recursive grammar soenke.kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2001-04-12)
Re: Computing first sets on a left-recursive grammar max@max.mcs.gac.edu (Max Hailperin) (2001-04-14)
Re: Computing first sets on a left-recursive grammar parag@pinhead.parag.codegen.com (2001-04-14)
| List of all articles for this month |

From: "Paul Mann" <paul@parsetec.com>
Newsgroups: comp.compilers
Date: 12 Apr 2001 02:43:58 -0400
Organization: Compilers Central
References: 01-04-062
Keywords: parse, LL(1)
Posted-Date: 12 Apr 2001 02:43:58 EDT

"Cyrus Najmabadi" <cyrus@stwing.upenn.edu> wrote:


> The fault seems to lie in left-recursive
> grammar productions. Because "First" calls itself on the individual
> elements of the yield of the production, a left-recursive production
> will cause "first" to loop forever.
>
> Is the correct solution to eliminate left-recursion? Or,
> if it's not a limitation, can someone post (or link to) an algorithm
> that can calculate the "first" set even on a left-recursive
> production.
>


----------------------------------------------------------------------------


Definition:


The First set is a relationship between nonterminals and
terminals of a grammar, such that: First (X) gives all the
"first" terminal symbols in the rules defining X.


X -> a
        -> Y


Y -> b
      -> Z


Z -> c


First (X) is the set { a, b, c }
First (Y) is the set { b, c }
First (Z) is the set { c }


Left recursion is not a problem, because you just let
First () call itself when it finds a nonterminal as the first
symbol of a rule.


--------------------------------------------------------------------------


The first problem occurs when cycles are found:


X -> a
        -> Y


Y -> b
      -> Z


Z -> c
      -> X


First (X) is the set { a, b, c }
First (Y) is the set { a, b, c }
First (Z) is the set { a, b, c }


In this case, First (Z) requires First (X) which is unknown
because of the cycle: X -> Y -> Z -> X.


The way to solve this is to keep a boolean array
WorkingOn [number_of_nonterminals].


In the main interation loop thru the nonterminals:


for (nt = 0; nt < n_nonterminals; nt++)
{
                CycleFound = 0;
                for (i = 0; i < n_nonterminals; i++) WorkingOn [i] = 0;
                ComputeFirst (nt);
                if (CycleFound) CopyFirstSet (nt);
}


For each nonterminal, set the whole WorkingOn array
to zeros.


Have ComputeFirst (nt) check WorkingOn [nt].
If zero, set WorkingOn [nt] to 1, and continue.
If not zero, set CycleFound to 1, then return.


After getting back to the main loop check CycleFound,
for potential copy propagation.


CopyFirstSet (nt) should copy the First (nt) set to all
the other nonterminals indicated in the WorkingOn
array, by their non-zero entries.


--------------------------------------------------------------------------


The second problem is that this is not optimal. The
results are fine, but many calls are made over and
over to compute the First sets of nonterminals,
especially those that are at the leaf-nodes of the
grammar.


This can done in an optimal way by first populating
a graph (or bit matrix: NT by T) with just the known
first terminals for each nonterminal, by examining
the grammar (rules).


Then compute a dependency relationship between
nonterminals, another graph (or bit matrix: NT by NT),
by examining the grammar, very similar to the previous
step.


Now use Warshall's algorithm or DeRemer and
Pennello's Digraph algorithm (about 3 times faster
for sparsely populated graphs) to compute transitive
closure of the NT by NT matrix and while doing
this adjust the NT by T matrix in an appropriate
way to give you the desired result.


-----------------------------------------------------------------------


The third problem is that you can have nullable
nonterminal symbols in a rule. If a NT is
nullable (X -> /* nothing */) and it is the first NT
of a rule in the grammar, then you have to use
both this NT and the next symbol in the rule for
computing a First set. For each NT that is
nullable you have to check the next symbol in
the rule.


That's all I can think of. Maybe I left something out.
Anybody else can feel free to check this or add to it.


Paul Mann
www.parsetec.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.