16 Sep 1996 14:45:26 -0400

Related articles |
---|

some basic questions about parsing mherstin@bbn.com (Michael Herstine) (1996-09-15) |

Re: some basic questions about parsing salomon@silver.cs.umanitoba.ca (1996-09-16) |

Re: some basic questions about parsing torbenm@diku.dk (1996-09-16) |

From: | torbenm@diku.dk (Torben AEgidius Mogensen) |

Newsgroups: | comp.compilers |

Date: | 16 Sep 1996 14:45:26 -0400 |

Organization: | Department of Computer Science, U of Copenhagen |

References: | 96-09-053 |

Keywords: | parse |

Michael Herstine <mherstin@bbn.com> writes:

*>I'm reading Jim Holme's "Object Oriented Compiler Construction". In*

*>the chapter on parsing, he refers to the theorem that to every*

*>language generated by a context free grammar there corresponds a*

*>(possibly non-deterministic) pda that solves the recognition problem*

*>for that language. He notes that we don't want to deal with*

*>non-deterministic automata, and states that instead, we should find*

*>the most powerful (deterministic, I suppose) automaton that we can,*

*>and restrict ourselves to languages that can be recognized by such*

*>automata. He then introduces the shift-reduce PDA, and states without*

*>proof that the set of languages that shift-reduce PDAs can recognize*

*>is called the LR(1) languages (and intimates that the LR(1) languages*

*>live somewhere between regular languages and context free languages).*

*>I have several questions:*

*>1. If we seek the most powerful deterministic automaton possible,*

*>wouldn't that be a Turing machine? Since they can recognize unrestricted*

*>phrase structure grammars, a much larger set of grammars than context*

*>free, why don't we use them as parsers?*

John Holme may have forgotten to mention the reason he doesn't want to

handle non-deterministic automata: it takes (almost) cubic time to

simulate these on a sequential machine (using the currently best known

techniques). Turing machines can run for any amount of time, even

forever, so they are not generally useful for parsing unless some

extra restrictions are put on them to guarantee fast termination.

Deterministic PDAs run in linear time, which makes them useful for

parsing. Deterministic 2-way PDA's recognize a larger class of

languages than deterministic 1-way PDA's, including some

non-contextfree languages (e.g. a^n b^n c^n) and can also be simulated

in linear time. They are, however, not used for general parsing

(i.e. as the target of parser generators). There are several reasons

for this: 1) there is no simple way of generating a 2DPDA from a

non-LR grammar. 2) the simulation time, though linear in input, also

depends (linearly) on the size of the automaton. This makes it

generally slower than 1DPDA simulation which is (more or less)

independent of the size of the automaton.

*>2. What sort of grammars generate LR(1) languages?*

LR(1) grammars, obviously. ;-)

Seriously, it appears that the easiest way to recognize a LR(1)

grammar is to generate a LR(1) parsing table and look for conflicts.

Note, though, that ambigous grammars are never LR(1).

*>3. The production rules that he writes down for Pascal are clearly*

*>context free, but seem to include rules that allow non-terminals to go to*

*>the null string - doesn't that put them in the class of unrestricted*

*>phrase structure grammars (they're not allowed in context free grammars,*

*>are they)?*

For context free grammars it is easy to show that adding empty

productions only extends the class of languages to languages that are

the unions of context free languages and the set containing the empty

string. In other words, you can eliminate empty productions by

transforming a grammar and obtain a grammar generating the same

language bar the empty string. Hence, empty productions are pretty

harmless in comtext free grammars, and they are quite convenient. So

most parser generators and compiler textbooks allow these.

Adding empty productions to context sensitive grammars (type 2

grammars in the Chomsky-hierarchy) gives them the power of

unrestricted grammars (type 1 grammars), though.

Torben Mogensen (torbenm@diku.dk)

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.