LR-grammar differences


From: weberwu@tfh-berlin.de (Debora Weber-Wulff)
Newsgroups: comp.compilers
Date: 22 Dec 1995 11:14:18 -0500
Organization: TFH-Berlin (Berlin, Germany)
Keywords: LR(1), parse

The LL vs. LR thread has been helpful, but I still want to understand
the differences between the various LR schemes.


I'm trying to get the differences between the LR-parser types sorted
out in my head and fit them onto one sheet of paper. After perusing a
number of books and papers: Aho, ([Sethi,] Ullman) + Johnson; Sippu,
Soisalon-Soininen; Mayer; DeRemer; Knuth; Pager; Bermudez; Geller,
Harrison; I am a tad confused. Each uses its own sort of definitions,
and I am not really sure of the differences. This is what I hope I've
understood:


canonical LR(k) (k > 0) grammars:
A parsing table can be created from such grammars that pre-computes
the k-symbol lookahead into the states by a seemingly magic process.
This makes life easy on the parser, which doesn't have to worry about
lookahead or lookback schemes, but it tends to have "enormous"-sized
tables (although these days, since Microsoft has conditioned us to
accept bloated software and our machines have a good few MB on board,
the spectre of a 20,000-entry table doesn't really scare me - should
it?)
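To make "pre-computing the lookahead into the states" concrete, here
is a minimal sketch in Python (the grammar encoding and names are my
own illustration, not taken from any particular tool): an LR(1) item
is just an LR(0) item plus a lookahead terminal, and items that share
a core but differ in lookahead end up in different states, which is
exactly where the table blow-up comes from.

# A hedged illustration: LR(1) items are LR(0) items plus a lookahead.
# Toy grammar: S -> C C ; C -> c C | d  (the classic LR(1) example)
grammar = {
    "S": [["C", "C"]],
    "C": [["c", "C"], ["d"]],
}

# An LR(1) item: (lhs, rhs, dot position, lookahead terminal).
# Two items with the same core (lhs, rhs, dot) but different lookaheads
# land in *different* states, so the canonical LR(1) automaton can be
# much larger than the LR(0) one.
item1 = ("C", ("c", "C"), 1, "c")
item2 = ("C", ("c", "C"), 1, "$")   # same core, different lookahead

print(item1[:3] == item2[:3])   # True: same core
print(item1 == item2)           # False: distinct LR(1) items/states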


canonical LR(0) grammars:
Nice 'n easy to make the table here because there's no lookahead. One
constructs a non-deterministic finite state automaton whose states are
the items built from the productions, with the appropriate
transitions. Then an application of the subset construction (taking
closures of item sets) squeezes an equivalent deterministic automaton
out of the NDFSA. Unfortunately, interesting constructs for
programming languages tend to provoke reduce-reduce or shift-reduce
conflicts, rendering this type of table useless.
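For what it's worth, the whole LR(0) construction fits in a few lines.
The sketch below (my own toy encoding, no epsilon productions, and
hypothetical names like goto_) builds the canonical collection of
LR(0) item sets for a tiny expression grammar: closure() pulls in
items for the nonterminal after the dot, goto_() advances the dot, and
the worklist loop is the subset construction over item sets.

# Items are (lhs, rhs, dot). Grammar encoding is illustrative only.
GRAMMAR = {
    "S'": [("E",)],                      # augmented start symbol
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("id",), ("(", "E", ")")],
}
NONTERMS = set(GRAMMAR)

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0)
                    if new not in items:
                        items.add(new)
                        changed = True
    return frozenset(items)

def goto_(items, symbol):
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

# Build the canonical collection of LR(0) item sets (the DFA states).
start = closure({("S'", ("E",), 0)})
states, todo = {start}, [start]
while todo:
    state = todo.pop()
    symbols = {rhs[dot] for (_, rhs, dot) in state if dot < len(rhs)}
    for sym in symbols:
        nxt = goto_(state, sym)
        if nxt and nxt not in states:
            states.add(nxt)
            todo.append(nxt)

print(len(states), "LR(0) states")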


Simple LR(1) grammars:
Take the canonical LR(0) table and slap on an easy-to-calculate FOLLOW
set to untangle the conflicts, based on the next symbol in the input
(lookahead of 1). Unfortunately, FOLLOW doesn't carry enough
information about left context. It can handle most interesting
programming language constructs, but differentiating procedure calls
from identifiers in the last position of an expression in a language
with statement terminators is impossible because the FOLLOW set is too
big.
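The SLR(1) recipe is literally "reduce by A -> alpha only on terminals
in FOLLOW(A)", and FOLLOW is computed from the grammar alone, with no
reference to the state the parser is in. Here is a small sketch of
that computation for the same toy grammar (again my own encoding; the
grammar has no epsilon productions, which keeps FIRST short):

GRAMMAR = {
    "S'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("id",), ("(", "E", ")")],
}
NONTERMS = set(GRAMMAR)

def first_of(symbol, first):
    return first[symbol] if symbol in NONTERMS else {symbol}

def compute_first():
    first = {nt: set() for nt in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                # No epsilon productions here, so FIRST of a production
                # is just FIRST of its first symbol.
                add = first_of(rhs[0], first)
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in NONTERMS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in NONTERMS:
                        continue
                    trailer = (first_of(rhs[i + 1], first)
                               if i + 1 < len(rhs) else follow[lhs])
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
    return follow

follow = compute_follow(compute_first())
print(follow)   # e.g. FOLLOW(E) = {'+', ')', '$'}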


LALR(1) grammars:
For these grammars we can save the day by calculating a somewhat more
refined FOLLOW: a lookahead set per state rather than one global set
per nonterminal. This covers just about everything we need, which is
why yacc likes LALR(1) grammars. The only problem is with errors: an
error may not be detected at the earliest possible point (i.e. the
point at which the canonical LR(1) table would scream); a sequence of
reductions may take place first. However, since no extra shifting will
take place, it doesn't really matter much. There are some grammars,
however, that are LR(1) but not LALR(1), because the merging of LR(1)
states with the same LR(0) core will introduce a conflict.
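The merging is easy to picture: take the canonical LR(1) states, throw
away the lookaheads to get the LR(0) cores, and union the items of all
states sharing a core. The sketch below uses two hand-written states
(the classic textbook situation, not the output of a real run) to show
how that union can create a reduce-reduce conflict that neither
original state had:

from collections import defaultdict

# Each state is a set of LR(1) items: (lhs, rhs, dot, lookahead).
state_a = {("A", ("c",), 1, "d"), ("B", ("c",), 1, "e")}
state_b = {("A", ("c",), 1, "e"), ("B", ("c",), 1, "d")}

def core(state):
    return frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)

merged = defaultdict(set)
for state in (state_a, state_b):
    merged[core(state)] |= state          # union states with equal cores

for items in merged.values():
    # After merging, both A -> c. and B -> c. can reduce on 'd' and on
    # 'e': a reduce-reduce conflict, although neither original state
    # (and hence no canonical LR(1) table) had one.
    lookaheads = defaultdict(set)
    for (lhs, rhs, dot, la) in items:
        if dot == len(rhs):
            lookaheads[(lhs, rhs)].add(la)
    print(dict(lookaheads))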


Is this correct? It ignores all the compaction schemes I've only ever
found mentioned in the developers' papers... Maybe I shouldn't bother
and just teach LL?


Does anyone have a pointer to reports on the use of modern compiler-
compilers? What is cheap and easy for beginners to understand? I'm
teaching compiler construction for the first time, and I can't stomach
teaching yacc...
--
Debora Weber-Wulff (Professor of Software Engineering and Programming Languages)
Technische Fachhochschule Berlin, FB Informatik, Luxemburger Str. 10,
13353 Berlin, Germany email: weberwu@tfh-berlin.de
<http://www.tfh-berlin.de/usr1/name/weberwu/public_html/>