12 Apr 2001 02:43:58 -0400

Related articles |
---|

Computing first sets on a left-recursive grammar cyrus@stwing.upenn.edu (Cyrus Najmabadi) (2001-04-10) |

Re: Computing first sets on a left-recursive grammar paul@parsetec.com (Paul Mann) (2001-04-12) |

Re: Computing first sets on a left-recursive grammar soenke.kannapinn@wincor-nixdorf.com (Sönke Kannapinn) (2001-04-12) |

Re: Computing first sets on a left-recursive grammar max@max.mcs.gac.edu (Max Hailperin) (2001-04-14) |

Re: Computing first sets on a left-recursive grammar parag@pinhead.parag.codegen.com (2001-04-14) |

From: | "Paul Mann" <paul@parsetec.com> |

Newsgroups: | comp.compilers |

Date: | 12 Apr 2001 02:43:58 -0400 |

Organization: | Compilers Central |

References: | 01-04-062 |

Keywords: | parse, LL(1) |

Posted-Date: | 12 Apr 2001 02:43:58 EDT |

"Cyrus Najmabadi" <cyrus@stwing.upenn.edu> wrote:

*> The fault seems to lie in left-recursive*

*> grammar productions. Because "First" calls itself on the individual*

*> elements of the yield of the production, a left-recursive production*

*> will cause "first" to loop forever.*

*>*

*> Is the correct solution to eliminate left-recursion? Or,*

*> if it's not a limitation, can someone post (or link to) an algorithm*

*> that can calculate the "first" set even on a left-recursive*

*> production.*

*>*

----------------------------------------------------------------------------

Definition:

The First set is a relationship between nonterminals and

terminals of a grammar, such that: First (X) gives all the

"first" terminal symbols in the rules defining X.

X -> a

-> Y

Y -> b

-> Z

Z -> c

First (X) is the set { a, b, c }

First (Y) is the set { b, c }

First (Z) is the set { c }

Left recursion is not a problem, because you just let

First () call itself when it finds a nonterminal as the first

symbol of a rule.

--------------------------------------------------------------------------

The first problem occurs when cycles are found:

X -> a

-> Y

Y -> b

-> Z

Z -> c

-> X

First (X) is the set { a, b, c }

First (Y) is the set { a, b, c }

First (Z) is the set { a, b, c }

In this case, First (Z) requires First (X) which is unknown

because of the cycle: X -> Y -> Z -> X.

The way to solve this is to keep a boolean array

WorkingOn [number_of_nonterminals].

In the main interation loop thru the nonterminals:

for (nt = 0; nt < n_nonterminals; nt++)

{

CycleFound = 0;

for (i = 0; i < n_nonterminals; i++) WorkingOn [i] = 0;

ComputeFirst (nt);

if (CycleFound) CopyFirstSet (nt);

*}*

For each nonterminal, set the whole WorkingOn array

to zeros.

Have ComputeFirst (nt) check WorkingOn [nt].

If zero, set WorkingOn [nt] to 1, and continue.

If not zero, set CycleFound to 1, then return.

After getting back to the main loop check CycleFound,

for potential copy propagation.

CopyFirstSet (nt) should copy the First (nt) set to all

the other nonterminals indicated in the WorkingOn

array, by their non-zero entries.

--------------------------------------------------------------------------

The second problem is that this is not optimal. The

results are fine, but many calls are made over and

over to compute the First sets of nonterminals,

especially those that are at the leaf-nodes of the

grammar.

This can done in an optimal way by first populating

a graph (or bit matrix: NT by T) with just the known

first terminals for each nonterminal, by examining

the grammar (rules).

Then compute a dependency relationship between

nonterminals, another graph (or bit matrix: NT by NT),

by examining the grammar, very similar to the previous

step.

Now use Warshall's algorithm or DeRemer and

Pennello's Digraph algorithm (about 3 times faster

for sparsely populated graphs) to compute transitive

closure of the NT by NT matrix and while doing

this adjust the NT by T matrix in an appropriate

way to give you the desired result.

-----------------------------------------------------------------------

The third problem is that you can have nullable

nonterminal symbols in a rule. If a NT is

nullable (X -> /* nothing */) and it is the first NT

of a rule in the grammar, then you have to use

both this NT and the next symbol in the rule for

computing a First set. For each NT that is

nullable you have to check the next symbol in

the rule.

That's all I can think of. Maybe I left something out.

Anybody else can feel free to check this or add to it.

Paul Mann

www.parsetec.com

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.