Re: LR(n) parsers

Thomas Schoebel <schoebel@bs5.informatik.uni-stuttgart.de>
25 Oct 91 10:11:28 GMT

From comp.compilers

Related articles
[9 earlier articles]
Re: LR(n) parsers rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell) (1991-10-19)
Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) (1991-10-19)
Re: LR(n) parsers drw@riesz.mit.edu (1991-10-22)
Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) (1991-10-24)
Re: LR(n) parsers sankar@Neon.Stanford.EDU (Sriram Sankar) (1991-10-24)
Re: LR(n) parsers ge@sci.kun.nl (1991-10-25)
*Re: LR(n) parsers schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel)* (1991-10-25)**
Re: LR(n) parsers markh@csd4.csd.uwm.edu (1991-11-06)

| List of all articles for this month |

Newsgroups:	comp.compilers,comp.theory
From:	Thomas Schoebel <schoebel@bs5.informatik.uni-stuttgart.de>
Keywords:	parse, theory, LR(1)
Organization:	IfI, Univ. Stuttgart, W Germany
References:	91-10-036 91-10-093
Date:	25 Oct 91 10:11:28 GMT

In article 91-10-093 schoebel@bs5.informatik.uni-stuttgart.de (Thomas Schoebel) writes:

> No, it's even the same for k=0. See Hopcroft/Ullman page 256, where it is
> shown that any deterministic pushdown automata (DPDA) can be converted to
> an LR(0) grammar. Since any LR(k) (k>0) parser is in fact a DPDA, the
> *language classes* are equal for any k.

Sorry, I have to correct myself: The language class LR(0) as defined in
Hopcroft/Ullman is a strict subset of the LR(1) class. This is because of
the prefix property. In Hopcroft/Ullman, any deterministic context-free
language (even without the prefix property) can be converted to an LR(1)
grammar. LR(0) grammars cannot generate languages without prefix property.

However, this distinction is only of theoretical interest to a compiler
writer. In practice, *ANY* parser uses a special end-of-file symbol, and
*ANY* scanner must be designed so that distinction of "an input symbol is
present" and "no more input symbol is present" is carried out. This is
simply because a "non-existant symbol" is something we have to deal with
and therefore must be explicitly or implicitely stated somewhere. If it
were possible to omit this distinction, *ANY* parser (even LR(1) when
using the lookahead) could not decide at a point where it is possible to
accept, whether to continue (and hopefully find a matching input) or to
accept in this situation.

So in *practice* the language classes LR(0) and LR(1) are equal. The
theoretical difference comes only from the fact that "acceptance"
of a pushdown automaton is defined in Hopcroft/Ullman by two conditions:
1) The automaton has reached an accepting state
2) The input word is fully processed
This definition inherently contains a nondeterminism, so IMHO the
Hopcroft/Umman definiton of "deterministic pushdown automaton" (DPDA)
is in some sense not fully correct.

This can be seen if the DPDA has to be simulated by a deterministic turing
machine (DTM): With turing machines, acceptance is usually defined by
simply reaching an accepting state. If the turing machine has to determine
the above condition 2), it has to look "after" the input word whether
there is a blank. The turing machine *MUST* contain a transition for blank
in its \delta function if it has to decide this deterministically. But
augmenting the alphabet by a new symbol (call it either "EOF" or "blank")
which has to follow the original word and testing for its presence is
nothing but changing the language so that it has the prefix property. The
DTM has to *touch* the blank, and this is something not belonging to the
input. In other words, simulation of DPDA's by DTM's is in general
impossible without transforming the input language.

To me it is questionable where a DPDA as defined in Hopcroft/Ullman is
really *deterministic* in full sense. That the transition relation \delta
must be a function rather than a true relation is not enough; it is also
important that leaving an accepting state by comsuming the next symbol is
not possible, because this is a HIDDEN NONDETERMINISM. Even in
Hopcroft/Ullman, a DPA is supposed to always look only at one input symbol
at a time, so consequently it has no chance to determine situation 2) of
above. The test of situation 2) is not done by the DPDA itself, rather
than by a "watching instance" of the computing process.

Suppose there are transitions out from an accepting state. Without
lookahead, the automaton now has to *decide* whether to accept or to
continue. If it decides to continue at the last symbol, then it dies and
hence would reject the input word. So this inherent nondeterministic
choice leads to acceptance in one case and to rejection in the other one.
Something is not well-defined here!!

It appears to me that deterministic context-free languages without prefix
property as defined in Hopcroft/Ullman are something which is not really
deterministic. IMHO, determinism should be better defined to imply the
prefix property automatically.

-- Thomas
[The LR(0)/LR(1) distinction was also noted by Sriram Sankar
<sankar@Neon.Stanford.EDU> -John]
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: LR(n) parsers

Thomas Schoebel <schoebel@bs5.informatik.uni-stuttgart.de>25 Oct 91 10:11:28 GMT

Thomas Schoebel <schoebel@bs5.informatik.uni-stuttgart.de>
25 Oct 91 10:11:28 GMT