Re: regular expression operators in CF grammars

"=?Windows-1252?Q?S=F6nke_Kannapinn?=" <>
28 Jun 2002 18:39:19 -0400

          From comp.compilers

Related articles
[4 earlier articles]
Re: regular expression operators in CF grammars (Ralph Boland) (2002-06-17)
Re: regular expression operators in CF grammars (Chris F Clark) (2002-06-17)
Re: regular expression operators in CF grammars (2002-06-20)
Re: regular expression operators in CF grammars (2002-06-28)
Re: regular expression operators in CF grammars (SLK Parsers) (2002-06-28)
Re: regular expression operators in CF grammars (SLK Parsers) (2002-06-28)
Re: regular expression operators in CF grammars (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2002-06-28)
Re: regular expression operators in CF grammars (Chris F Clark) (2002-07-02)
Re: regular expression operators in CF grammars (Chris F Clark) (2002-07-02)
Re: regular expression operators in CF grammars (Joachim Durchholz) (2002-07-02)
Re: regular expression operators in CF grammars (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2002-07-04)
Re: regular expression operators in CF grammars (SLK Parsers) (2002-07-04)
Re: regular expression operators in CF grammars (Ralph Boland) (2002-07-04)
[9 later articles]
| List of all articles for this month |

From: "=?Windows-1252?Q?S=F6nke_Kannapinn?=" <>
Newsgroups: comp.compilers
Date: 28 Jun 2002 18:39:19 -0400
Organization: Siemens Inc.
References: 02-05-142 02-05-151 02-06-024 02-06-056
Keywords: parse, bibliography
Posted-Date: 28 Jun 2002 18:39:19 EDT

Chris F Clark writes:
> That being said, there are some fairly straight-forward algorithms for
> implementing EBNF translations directly as part of the parser
> generation.
> (...)
> It "is" more difficult to do so correctly in the LR case, than the LL
> case because of the way reductions are handled in LR. However, there
> is even a complete set of published algorithms that cover implementing
> EBNF in an LR parser written by Nakata and Sassa--not that those
> articles are easy to track down nor easy to read.

Unfortunately, closer analysis of the publications that deal with
extended context-free grammars (ecfgs) or regular right part grammars
(rrpgs) reveals some severe errors, and in my humble opinion, most of
these publications are more or less disappointing. (The only solid
contributions are from Madsen and Kristensen (1976) and Heilbrunner
(1981).) A common problem is a sloppy treatment of theory; and without
a solid theoretical foundation, parser correctness cannot be proved.

Nearly all contributions to ELR parser construction -- namely LaLonde
(1977, 1979), Purdom and Brown (1971), Chapman (1987), Nakata and
Sassa (1984), Shin and Choe (1993), Fortes Gálvez (1994), and Lee and
Kim (1997) -- have a concept of ELR(k)-derivation step in which a lhs
nonterminal is replaced by some expansion of the corresponding rhs
(i.e., a sentence described by the rhs finite automaton or regular
expression) IGNORING how that rhs expansion had been achieved. In
other words: all these authors abstract from the structure of rhs

Not only do I personally believe that this abstraction is inadequate
for the purpose of compiler construction. The above mentioned authors
also do not treat that abstraction correctly in their parser
constructions in the light of the results of Heilbrunner (1979). Let
me explain this point briefly.

Instead of using regular expressions or finite automata, Heilbrunner
(1979) uses unambiguous right-linear subgrammars to describe the rhs
expansions so that classical LR theory for cfgs can be applied. He
proves that whenever SOME ecfg with unambiguous right-linear
subgrammars is LR(k) ANY equivalent such ecfg will also be LR(k).

Because of that, construction of provably correct (LA)LR(k) parsers
can be done by simply replacing rhs regular expressions or finite
automata by equivalent unambigous right-linear subgrammars. For
ambiguous rhs regular expressions or finite automata this requires
some sort of subset construction to remove ambiguity. In contrast, the
above mentioned authors all rely on LR(k) item set definitions that
use the unmodified original grammar rules -- and they run into
problems (except Purdom and Brown (1981) and Nakata and Sassa (1986)
who -- unnecessarily -- restrict their approach to deterministic rhs
finite automata; and they have other problems).

Note that subset constructions to remove unambiguity imply, of course,
that a derivation using the transformed grammar cannot be uniquely
interpreted in terms of the structures of the original grammar,
especially in terms of their (ambiguous) rhss.

I personally prefer to call an rrpg ELR(k) iff its straightforward
cfg-transcription using (unambiguous) right-linear subgrammars is
LR(k). And I call an ecfg ELR(k) iff the cfg with (unambiguous)
right-linear subgrammars achieved by transforming the rhs regular
expressions into finite automata with epsilon-moves and then into
right-linear subgrammars is LR(k). (This requires rhs regular
expressions to be "strongly unambiguous" (Brüggemann-Klein 1993).) An
ELR(k) parser theory (such as that of Madsen and Kristensen (1976))
that uses that transformational approach does NOT abstract from rhs
structures and is able to reconstruct the structure of a complex
handle in terms of the corresponding structural rhs ingredients. I
strongly feel that THIS is what ELR parsers should accomplish.

You especially mentioned the approach of Nakata and Sassa (1986) to
ELALR(k) parsing. Their article in Acta Informatica does not even
define a derivation relation; it does not explain where look- aheads
come from; their characterization of kernel items is not
well-defined. And no attempt is made there to prove missing important
basic results they rely on (e.g.: correctness of the item set
construction for the new type of "item"!).

Sönke Kannapinn.

Brüggemann-Klein, A. (1993): Regular expressions into finite
        automata. TCS 120, pp. 197--213.
Chapman, N.P. (1987): LALR(1,1) parser generation for regular right
        part grammars. Acta Inf. 21, pp. 29--45.
Fortes Gálvez, J. (1994): A note on a proposed LALR parser for
        extended context-free grammars. Inf. Proc. Letters 50,
        pp. 303--305.
Heilbrunner S. (1979): On the definition of ELR(k) and ELL(k)
        grammars. Acta Inf. 11, pp. 169--176.
LaLonde, W.R. (1977): Regular right part grammars and their parsers.
        CACM 20(10), pp. 731--741.
LaLonde, W.R. (1979): Constructing LR parsers for regular right part
        grammars. Acta Inf. 11, pp. 173--193.
Lee, G.-O. and D.-H. Kim (1997): Characterization of extended LR(k)
        grammars. Inf. Proc. Letters 64, pp. 75--82.
Madsen, O.L. and B.B. Kristensen (1976): LR-parsing of extended
        context free grammars. Acta Inf. 7, pp. 61--73.
Nakata, I. and M. Sassa (1986): Generation of efficient LALR(1)
        parsers for regular right part grammars. Acta Inf. 23,
        pp. 149--162.
Purdom, P.W. and C.A. Brown (1981): Parsing extended LR(k) grammars.
        Acta Inf. 15, pp. 115--127.
Shin, H.-C. and K.-M. Choe (1993): An improved LALR(k) parser
        generation for regular right part grammars. Inf. Proc. Letters
        47, pp. 123--129.

For a thorough analysis of ELR parsing see my doctoral dissertation
(in German)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.