Sat, 10 Sep 1994 00:34:13 GMT

Related articles |
---|

How to get this to an lalr(1) grammar? Mark_Tarrabain@mindlink.bc.ca (1994-08-22) |

Re: How to get this to an lalr(1) grammar? jholland@world.std.com (1994-08-22) |

Re: How to get this to an lalr(1) grammar? pjj@cs.man.ac.uk (1994-08-28) |

How to get this to an lalr(1) grammar? mph@zdenka.demon.co.uk (1994-08-28) |

Re: How to get this to an lalr(1) grammar? mickunas@mickunas.cs.uiuc.edu (1994-09-10) |

Re: How to get this to an lalr(1) grammar? dekker@dutiag.twi.tudelft.nl (1994-09-14) |

Re: How to get this to an lalr(1) grammar? mark@omnifest.uwm.edu (Mark Hopkins) (1994-10-09) |

Newsgroups: | comp.compilers |

From: | mickunas@mickunas.cs.uiuc.edu (Dennis Mickunas) |

Keywords: | parse, LALR |

Organization: | University of Illinois at Urbana |

References: | 94-08-137 |

Date: | Sat, 10 Sep 1994 00:34:13 GMT |

Mark_Tarrabain@mindlink.bc.ca (Mark Tarrabain) writes:

*> Hi. I've been trying to figure out how to make a certain grammar*

*>lalr(1). Or at least create an equivalent lalr(1) grammar that parses the*

*>same language. I'm not having much luck with it, so I thought I'd see if*

*>anybody else could figure it out. Here's the grammar as it presently*

*>exists (21 rules):*

*> s -> c*

*> s -> s t c*

*> c -> V n p*

*> a -> A*

*> a -> C o*

*> o -> /* empty */*

*> o -> A*

*> t -> T*

*> t -> D*

*> t -> a u*

*> u -> /* empty */*

*> u -> T*

*> n -> /* empty */*

*> n -> L b*

*> n -> q*

*> q -> N*

*> q -> q a N*

*> b -> /* empty */*

*> b -> B q*

*> p -> /* empty */*

*> p -> P N*

*> Capital letters denote terminals, lower case letters nonterminals. The*

*>language and grammar are very evidently LR(2). If my understanding is*

*>correct, it should be possible to simplify it to LR(1), perhaps even*

*>LALR(1).*

It's little wonder that you're having difficulty transforming the

grammar from LR(2) to LR(1), since it is not LR(2) to begin with. To

see this, note that the strings "V N C A V" and "V N C A N" have the

following parses:

^VNCAV shift

V^NCAV shift

VN^CAV reduce q->N

Vq^CAV reduce n->q

Vn^CAV reduce p->/* empty */

Vnp^CAV reduce c->Vnp

c^CAV reduce s->c

s^CAV shift

sC^AV shift

sCA^V reduce o->A

sCo^V reduce a->Co

sa^V reduce u->/* empty */

sau^V reduce t->au

st^V shift

stV^ reduce n->/* empty */

stVn^ reduce p->/* empty */

stVnp^ reduce c->Vnp

stc^ reduce s->stc

s^ accept

and

^VNCAN shift

V^NCAN shift

VN^CAN reduce q->N

Vq^CAN shift

VqC^AN shift

VqCA^N reduce o->A

VqCo^N reduce a->Co

Vqa^N shift

VqaN^ reduce q->qaN

Vq^ reduce n->q

Vn^ reduce p->/* empty */

Vnp^ reduce c->Vnp

c^ reduce s->c

s^ accept

Notice that with the sentential forms

Vq^CAV (reduce n->q)

and

Vq^CAN (shift)

it is not possible to resolve the shift-reduce conflict with merely 2

symbol look-ahead. Thus the grammar is at best LR(3).

I was able to discover this by first attempting (by hand) a mechanical

transformation from LR(k) to LR(k-1), using a hybrid algorithm derived

from the paper [5] that Michael Harrison mentioned in a previous

response, and a subsequent paper [6]. (The reason for the hybrid is

somewhat technical, having to do with whether the semantics of parses

involving /*empty*/ rules can be preserved.)

I believe that your intention was to preserve the structure imposed by

the grammar. (Otherwise, it's dead easy to construct a grammar that

works. Note that the language is regular. Simple back-substitution

and recursion-to-Kleene-closure provides a regular expression.) So I

wanted to avoid mangling your grammatical structure. Thus I was

attempting to obtain a "complete cover" so as to permit the semantics

to remain unchanged. Even with the hybrid technique (which uses as

many shortcuts as I can imagine), the resulting grammar contains 143

rules. Berkeley yacc subsequently shows a shift-reduce conflict,

indicating that the new grammar is not LR(1), and that the original

is therefore not LR(2). Analyzing the shift-reduce conflict led to

the non-LR(2) counterexample above. (If anyone is interested, I can

post the 143-rule grammar.)

I do believe from examining the conflicting structures that the

original grammar *is* LR(3). I could verify that by performing the

LR(k)-to-LR(k-1) transformation a second time; but that's something

I'm not prepared to do by hand in this lifetime, since the resulting

grammar would probably have about 1000 rules. It would be possible to

perform a hybrid transformation from LR(k)-to-LR(k-2) but I believe

that again a very large grammar would result. This is not due to

inherent inefficiency in the transformation, but rather is necessary

in order to preserve the structure imposed by the original grammar.

Judging by some responses concerning the equivalence of LR(k) and

LR(1), it is still apparently a well-kept secret that once you have

obtained an LR(k) grammar, G, it is possible to obtain *mechanically*

an LR(1) grammar, G' (or even an LR(0) grammar, provided that the

language is prefix-free). In fact, the resulting grammar "completely

covers" the original LR(k) grammar (meaning that using only table

look-up, a G'-parse may be transformed into the original G-parse).

Those who are interested should first understand various

characterizations of LR(0) languages (see for example, Michael

Harrison's text [1], Section 13.3; also [1] cites other relevant

papers, especially by Harrison & Havel, and Geller & Harrison). For a

treatment of cover grammars, see papers by James Gray & Michael

Harrison [2], or the work by Anton Nijholt [3,4]. The formal LR(k) to

LR(0) algorithms are presented in [5,6]. If you'd like just a brief

tutorial, look at the file /pub/mickunas/pascal.descr available via

anonymous ftp from ftp.cs.uiuc.edu, which describes how to obtain

both an LR(0) version of the conventional expression grammar, as well

as an LR(0) grammar for Pascal.

[1] Harrison, M. A., _Introduction to Formal Language Theory_,

Addison-Wesley, Reading, MA. (1978).

[2] Gray, J.N., and M.A. Harrison, "On the covering and reduction

problems for context-free grammars," _Journal of the ACM 19_,

pp. 675-698 (1972).

[3] Nijholt, A., "On the Covering of Parsable Grammars," _Journal

of Computer and System Sciences 15_, pp.99-110 (1977).

[4] Nijholt, A., "A Survey of Normal Form Covers for Context Free

Grammars," Informatica Rapport 49, Vrije Universiteit (1979).

[5] Mickunas, M.D., R.L. Lancaster, and V.B. Schneider, "Transforming

LR(k) Grammars to LR(1), SLR(1), and (1,1) Bounded Right-Context

Grammars," _Journal of the ACM 23_, pp.511-533 (1976).

[6] Mickunas, M.D., "On the Complete Covering Problem for LR(k)

Grammars," _Journal of the ACM 23_, pp.17-30 (1976).

---

M. Dennis Mickunas

Department of Computer Science 1304 W. Springfield Ave.

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.