Fri, 7 Feb 2020 16:12:43 +0200

Related articles |
---|

FIRST_k, FOLLOW_k, k>1 borucki.andrzej@gmail.com (Andy) (2020-02-06) |

FIRST_k, FOLLOW_k, k>1 christopher.f.clark@compiler-resources.com (Christopher F Clark) (2020-02-07) |

From: | Christopher F Clark <christopher.f.clark@compiler-resources.com> |

Newsgroups: | comp.compilers |

Date: | Fri, 7 Feb 2020 16:12:43 +0200 |

Organization: | Compilers Central |

References: | 20-02-004 |

Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="78562"; mail-complaints-to="abuse@iecc.com" |

Keywords: | parse |

Posted-Date: | 07 Feb 2020 22:17:31 EST |

*> If LL(l) and LR(k) need sets FIRST_k, FOLLOW_k, k>1, for example LR(3) need this sets of degree 3?*

*> How make it? How is the best structure for these sets? I think about pyramid:*

...

*> For example, not bit set but set of trees or DFA's ?*

I've been meaning to write a paper on this for awhile, so I'll give

you a very rough draft here.

We implemented a version of it in Yacc++, although my understanding of

it was much more

crude at that time.

Everything you need to know is in the LR(0) machine, which is

essentially a DFA (with a stack).

The problem is you need to disentangle it.

Let's take a reduce-reduce conflict. You are in some state. I diagram

states by showing their

dotted item, using Yacc++ notation for the rules.

That state looks something like this:

state r-r:

x: a b c .; // the dotted item says we will reduce x in this state

y: z c.; // we also reduce a y in this state.

So, now we need to know where the goto-function will take us if we

reduce an x and where it

will take us if we reduce a y.

Now, in Yacc++ we know where we started the productions for x and y,

we call them dot-states

(as they are the states with the dotted item before an x or a y).

There is often more than one dot state for a given non-terminal. But,

we will assume just one

for ease of exposition.

state dot-x: // There will often be a set of these, one for every

context in which x was used.

x1: a .x b; // if see an x, goto state goto-x;

state goto-x:

x1: a x .b c; // b is the lookahead if you see x.

state dot-y:

y1: a b .y b; // if see a y, goto state goto-y;

state goto-y:

y1: a b y .b d; // b is the lookahead for y.

If you look carefully, you see that the goto states have dots before

(transitions on) exactly the

tokens that are lookaheads for the reductions that conflict.

And we have a conflict because the token b is in both states (sets).

If the two states are disjoint, have no tokens in common, then there

will be no conflict the tokens

in goto-x will be the lookaheads for reducing x and the tokens in

goto-y will be the lookaheads

for reducing y.

Now, if both states goto-x and goto-y are the same, it is an ambiguity

not just a conflict.

There will be two ways to parse some sentence (with an x or a y),

because once you get to the

shared goto state, the parse will be the same for both.

However, if there is a conflict (some tokens appear in both goto-x and

goto-y and they are not

the same state), then you need to look for another token of lookahead.

You do that by taking the transition(s) on the common tokens and

seeing what that state that

takes you to.

state goto-x-b:

x1: a x b .c; // bc is the lookahead if you see x.

state goto-y-b:

y1: a b y b .d; // bd is the lookahead if you see y

You continue doing this until you have disambiguated the lookahead or

you have proven the

grammar is truly ambiguous for some conflict.

There are a few more considerations.

Now, if one of those goto states is also a reduce state, you have to

apply the rules to backtrack

through calling rule.

e.g.

state dot-y;

y1: a b .y;

state dot-y1;:

y2: c .y1 b;

state goto-y1:

y2: c y1 .b;

If you merged states (ala LALR), you may need to un-merge (split) them

to resolve conflicts,

that are LALR only and not-LR.

You also have to deal with loops in the states.

If there is a finite k lookahead that solves the conflict, this is trivial.

You simply are doing an induction on k as you step through.

However, there are grammars that would be GLR parsable (and not

ambiguous) that include

loops.

That takes more work to recognize, although I am convinced that you

can compute a closure

that resolves it (and when I have proven that I will do the formal

paper on this).

What you are building could be called a lookahead automaton,

especially if you build something

that accepts regular expressions (or even more powerfully LR-type grammars).

But, this is the basic outline.

You can even use this to construct an LL (or recursive descent)

parser. You simply need to remember to do "calls" when you get to

states where you have computed a closure and you are processing one of

the non-terminals you have expanded in doing so. We do this in Yacc++

so we can handle variable length productions in ELR grammars without

backtracking, counting, or lookbacks.

And, you need one more trick if you have nullable productions (that's

another paper I need to write).

--

******************************************************************************

Chris Clark email: christopher.f.clark@compiler-resources.com

Compiler Resources, Inc. Web Site: http://world.std.com/~compres

23 Bailey Rd voice: (508) 435-5016

Berlin, MA 01503 USA twitter: @intel_chris

------------------------------------------------------------------------------

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.