Fri, 07 Mar 2008 07:09:32 -0600

Related articles |
---|

Computing Follow set pavan.mail@gmail.com (pavan) (2008-03-06) |

Re: Computing Follow set rsc@swtch.com (Russ Cox) (2008-03-06) |

Re: Computing Follow set torbenm@app-2.diku.dk (2008-03-07) |

Re: Computing Follow set max@gustavus.edu (Max Hailperin) (2008-03-07) |

Re: Computing Follow set haberg_20080207@math.su.se (Hans Aberg) (2008-03-08) |

From: | Max Hailperin <max@gustavus.edu> |

Newsgroups: | comp.compilers |

Date: | Fri, 07 Mar 2008 07:09:32 -0600 |

Organization: | Compilers Central |

References: | 08-03-033 |

Keywords: | LR(1), comment |

Posted-Date: | 08 Mar 2008 10:57:27 EST |

pavan <pavan.mail@gmail.com> writes:

*> I have a question regarding the computation of FOLLOW sets.*

*> Consider the following grammar:*

*>*

*> A -> aB | a*

*> B -> bA | b*

*>*

*> From the production A -> aB, we have FOLLOW(B) contains FOLLOW(A).*

*> From the production B -> bA, we have FOLLOW(A) contains FOLLOW(B).*

*>*

*> This ends up being an infinite loop when I code it. I would appreciate*

*> your suggestions on this.*

*>*

*> Thank you.*

Great question. In fact, beyond the issue of code going into an

infinite loop, there is even a question of what the FOLLOW sets should

be once the code gets done (somehow) computing them. For example, is

b in FOLLOW(A)? Well, it is if it is in FOLLOW(B). But is it in

FOLLOW(B)? Well, it is if it is in FOLLOW(A). So, it would be

consistent to conclude that it is in both FOLLOW sets -- or neither.

Of course, only one of those two consistent answers is the *correct*

one -- the one that describes the terminals that can actually come

after an A or B in a derivation. And the correct answer, as I hope

you'll see, is the one without b -- we want the smaller, junk-free

solution to the system of equations that defines the FOLLOW sets.

This is becauase in any derivation starting from A, the A or B will be

followed only by the end of input (generally symbolized as $).

This turns out to be an example of a general phenomenon that recurs

over and over again in compiler-related problems (as well as in some

other areas). Other examples would be the epsilong-closure process

used in translating an NFA to a DFA, the computation of FIRST sets,

and monotone dataflow analysis. In all these cases, we have a cyclic

system of constraints that describe some values, but which do not (in

general) uniquely specify them. The solution we want is the least

solution. We can find this least solution using the following

algorithmic approach (which is what you were asking for).

Initially approximate the values by the least element of their type;

for FOLLOW sets, this would mean to start by assuming they are all

empty. Then look for violated constraints. In your example, you would

notice that $ needs to be in FOLLOW(A), but is not yet, because

FOLLOW(A) is empty. So, increase the too-small value so as to correct

this constraint violation; that is, add $ into FOLLOW(A). Now repeat

this process of finding and repairing constraint violations so long as

there are any. In each case, the repair would consist of increasing

some value -- in the case of computing FOLLOW sets, adding something

into a FOLLOW set. In your example, your second check for a

constraint violation would find that FOLLOW(A) is not a subset of

FOLLOW(B), as you correctly observed it must be. To repair this, you

would add $ into FOLLOW(B). At this point, there would be no more

constraint violations, which would be the algorithm's sign that it

should terminate. So, to answer your question about the code looping,

the loop test should be whether there is any further motivation to add

any symbols to any of the FOLLOW sets.

The reason why this algorithm is sure to terminate is because we only

ever increase the FOLLOW sets and there is a limit to how large they

can grow; at most each contains all the terminals and $. Translating

this argument to a more general setting, the algorithm would terminate

not only for increasing sets that have some finite bound, but in any

case where we are increasing values in some partial order that doesn't

have infinitely ascending chains.

The somewhat more subtle question is how we know that the solution we

find upon termination is the one we wanted -- the junk-free one. In

fact, we want to be sure that we arrive at this same, correct,

solution no matter what choices we make along the way at any point

where there is more than one constraint violation we could choose to

repair.

Intuitively, the reason why we know we reach the correct, junk-free,

solution is that we start with empty sets and only add anything to

them when we had a good reason to. For a more rigorous argument, we

could recognize the constraint repair process as the application of

monotonic functions and use their monotonicity to construct an

inductive proof that their least fixed point is reached. To see how

this is done, I suggest you take a look at a pedagogic paper I wrote

some years back about teaching the computation of FIRST sets along

these lines: http://gustavus.edu/+max/ffp-sigcse.html

-Max Hailperin

Professor of Computer Science

Gustavus Adolphus College

[Thanks to several other people who sent in shorter messages with similar

advice. -John]

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.