Sun, 9 Oct 1994 19:47:30 GMT

Related articles |
---|

How to get this to an lalr(1) grammar? Mark_Tarrabain@mindlink.bc.ca (1994-08-22) |

Re: How to get this to an lalr(1) grammar? jholland@world.std.com (1994-08-22) |

Re: How to get this to an lalr(1) grammar? pjj@cs.man.ac.uk (1994-08-28) |

How to get this to an lalr(1) grammar? mph@zdenka.demon.co.uk (1994-08-28) |

Re: How to get this to an lalr(1) grammar? mickunas@mickunas.cs.uiuc.edu (1994-09-10) |

Re: How to get this to an lalr(1) grammar? dekker@dutiag.twi.tudelft.nl (1994-09-14) |

Re: How to get this to an lalr(1) grammar? mark@omnifest.uwm.edu (Mark Hopkins) (1994-10-09) |

Newsgroups: | comp.compilers |

From: | "Mark Hopkins" <mark@omnifest.uwm.edu> |

Keywords: | parse, LALR |

Organization: | Compilers Central |

References: | 94-08-137 94-09-052 |

Date: | Sun, 9 Oct 1994 19:47:30 GMT |

The grammar (fixed up by Rene Dekker (dekker@dutiag.twi.tudelft.nl)):

*> s -> c | s t c*

*> c -> V n p*

*> a -> A | C o*

*> o -> /* empty */ | A*

*> t -> T | D | a u*

*> u -> /* empty */ | T*

*> n -> /* empty */ | L b | q*

*> q -> N | q a N*

*> b -> /* empty */ | B q*

*> p -> /* empty */ | P N*

was noted to be LR(2), and it was asked if it could be converted to LR(1)

or LALR(1).

The question is a little vague because what you really want is an LALR(1)

*parser* not LALR(1) *recognizer*. There's a big difference there. In fact

its possible to have a grammar that has a finite state recognizer, but an

inherently context free parser. In fact in this case, the language above is

given by the regular expression:

c (t c)*

where

c = V [L | [L B] N (a N)*] [P N]

t = T | D | a [T]

a = A | C [A]

But might not even have a finite state parser (but it does and we'll prove it).

The initial question has to be answered by making the output actions

explicit. It's only then that one can determine the complexity of the

language. Even there, it all depends on what actions are done where. If

no actions are done, for instance, you have a recognizer and the language is

just regular.

To specify languages with explicit actions, you need two alphabets:

X = input alphabet = { V, A, C, T, D, L, N, B, P }

Y = output alphabet = { y0, y1, y2, ... }

It is axiomatically asserted that

x y = y x for all x in X, y in Y.

Given this, the set

F = { w_in w_out: w_in in X*, w_out in Y*,

w_out is a valid output action for w_in }

defines the parser's language, and can be specified by a context free

grammar. A parse of w_in, then, represents the task of finding all the

possible output words, w_out such that w_in w_out is in the set F. An

unambiguous language is one where there is only one w_out corresponding

to each w_in.

The set of languages specifiable by these means are one and the same as the

Simple Syntax Directed Translation languages.

Let's take the actions to stand for the production rules, e.g.

y0: s -> c, y1: s -> s t c, etc.

then the language can be given by the following context free grammar (listed

as a series of equations):

s = c y0 | s t c y1

c = V n p y2

a = A y3 | C o y4

o = y5 | A y6

t = T y7 | D y8 | a u y9

u = y10 | T y11

n = y12 | L b y13 | q y14

q = N y15 | q a N y16

b = y17 | B q y18

p = y19 | P N y20

You can eliminate the recursions and make some substitutions to yield:

o = y5 | A y6,

p = y19 | P N y20,

u = y10 | T y11,

a = A y3 | C o y4,

q = N y15 (a N y16)*,

t = T y7 | D y8 | a u y9,

b = y17 | B q y18,

n = y12 | L b y13 | q y14,

c = V n p y2,

s = c y0 (t c y1)*

This is non-recursive and so it specifies a regular language and thus a

(possibly non-deterministic) finite state machine. With the help of my

RE -> DFA converter (a copy of it and related software is at iecc.com

in /pub/files/regx.tar.gz last I checked), after back-substituting all the

states representing transitions on output symbols, the following results (Q1

is the start state):

Q1 = V Q2

Q2 = L Q3 | N y15 Q8 |

y12 (P Q9 | y19 y1 y0 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q3 = B Q6 | y17 y13 (P Q9 |

y19 y1 y0 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q6 = N y15 Q16

Q8 = A y3 Q17 | C Q13 |

y14 (P Q9 | y19 y1 y0 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q9 = N y20 y2 y0 Q20

Q13 = A y6 y4 Q17 | y5 y4 N y16 Q8

Q16 = A y3 Q28 | C Q22 |

y18 y13 (P Q9 | y19 y1 y0 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q17 = N y16 Q8

Q20 = 1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34

Q22 = A y6 y4 Q28 | y5 y4 N y16 Q16

Q25 = A y6 y4 Q31 | y5 y4 T y11 y9 Q34 | y5 y4 y10 y9 V Q38

Q28 = N y16 Q16

Q31 = T y11 y9 Q34 | y10 y9 V Q38

Q34 = V Q38

Q38 = L Q39 | N y15 Q44 | y12 (P Q45 |

y19 y2 y1 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q39 = B Q42 | y17 y13 (P Q45 |

y19 y2 y1 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q42 = N y15 Q52

Q44 = A y3 Q53 | C Q49 |

y14 (P Q45 | y19 y2 y1 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q45 = N y20 y2 y1 Q20

Q49 = A y6 y4 Q53 | y5 y4 N y16 Q44

Q52 = A Q56 | C Q57 |

y18 y13 (P Q45 | y19 y2 y1 (1 | A y3 Q31 | C Q25 | D y8 Q34 | T y7 Q34))

Q53 = N y16 Q44

Q56 = y3 N y16 Q52

Q57 = A y6 y4 Q59 | y5 y4 N y16 Q52

Q59 = N y16 Q52

Factor out the input symbols (this is where the commutativity axtiom comes

into play) and use the notation:

{n1-n2-...-nk} for yn1 yn2 ... ynk

Then the result is the following:

Q1 = V Q2

Q2 = L Q3 | N {15} Q8 | P {12} Q9 | {12-19-1-0} | A {12-19-1-0-3} Q31 |

C {12-19-1-0} Q25 | D {12-19-1-0-8} Q34 | T {12-19-1-0-7} Q34

Q3 = B Q6 | P {17-13} Q9 | {17-13-19-1-0} | A {17-13-19-1-0-3} Q31 |

C {17-13-19-1-0} Q25 | D {17-13-19-1-0-8} Q34 | T {17-13-19-1-0-7} Q34

Q6 = N {15} Q16

Q8 = A ({3} Q17 | {14-19-1-0-3} Q31) | P {14} Q9 | {14-19-1-0} |

C (Q13 | {14-19-1-0} Q25) | D {14-19-1-0-8} Q34 | T {14-19-1-0-7} Q34

Q9 = N {20-2-0} Q20

Q13 = A {6-4} Q17 | N {5-4-16} Q8

Q16 = A ({3} Q28 | {18-13-19-1-0-3} Q31) | P {18-13} Q9 | {18-13-19-1-0} |

C (Q22 | {18-13-19-1-0} Q25) | D {18-13-19-1-0-8} Q34 |

T {18-13-19-1-0-7} Q34

Q17 = N {16} Q8

Q20 = 1 | A {3} Q31 | C Q25 | D {8} Q34 | T {7} Q34

Q22 = A {6-4} Q28 | N {5-4-16} Q16

Q25 = A {6-4} Q31 | T {5-4-11-9} Q34 | V {5-4-10-9} Q38

Q28 = N {16} Q16

Q31 = T {11-9} Q34 | V {10-9} Q38

Q34 = V Q38

Q38 = L Q39 | N {15} Q44 | P {12} Q45 | {12-19-2-1} | A {12-19-2-1-3} Q31 |

C {12-19-2-1} Q25 | D {12-19-2-1-8} Q34 | T {12-19-2-1-7} Q34

Q39 = B Q42 | P {17-13} Q45 | {17-13-19-2-1} | A {17-13-19-2-1-3} Q31 |

C {17-13-19-2-1} Q25 | D {17-13-19-2-1-8} Q34 | T {17-13-19-2-1-7} Q34

Q42 = N {15} Q52

Q44 = A ({3} Q53 | {14-19-2-1-3} Q31) | C (Q49 | {14-19-2-1} Q25) |

P {14} Q45 | {14-19-2-1} | D {14-19-2-1-8} Q34 | T {14-19-2-1-7} Q34

Q45 = N {20-2-1} Q20

Q49 = A {6-4} Q53 | N {5-4-16} Q44

Q52 = A (Q56 | {18-13-19-2-1-3} Q31) | C (Q57 | {18-13-19-2-1} Q25) |

P {18-13} Q45 | {18-13-19-2-1} | D {18-13-19-2-1-8} Q34 |

T {18-13-19-2-1-7} Q34

Q53 = N {16} Q44

Q56 = N {3-16} Q52

Q57 = A {6-4} Q59 | N {5-4-16} Q52

Q59 = N {16} Q52

The only apparent non-determinisms are the transitions on symbols A

and C in states Q8, Q16, Q44 and Q52.

Q8 = A ({3} Q17 | {14-19-1-0-3} Q31) | C (Q13 | {14-19-1-0} Q25) |...

Q16 = A ({3} Q28 | {18-13-19-1-0-3} Q31) | C (Q22 | {18-13-19-1-0} Q25) |...

Q44 = A ({3} Q53 | {14-19-2-1-3} Q31) | C (Q49 | {14-19-2-1} Q25) |...

Q52 = A (Q56 | {18-13-19-2-1-3} Q31) | C (Q57 | {18-13-19-2-1} Q25) |...

The input symbols for Q31 are T and V, and for Q17, Q28, Q53, and Q56 is

N. So transitions on A are deterministic here. Similarily, the inputs for

Q25 are A, T and V, and for states Q13, Q22, Q49 and Q57 are A and N. This

time there is an apparent non-determinism on a transition on symbol A.

Factoring out the A from each of the 4 subexpressions results in:

Q13 | {14-19-1-0} Q25 = A ({6-4} Q17 | {14-19-1-0-6-4} Q31) | ...

Q22 | {18-13-19-1-0} Q25 = A ({6-4} Q28 | {18-13-19-1-0-6-4} Q31) | ...

Q49 | {14-19-2-1} Q25 = A ({6-4} Q53 | {14-19-2-1-6-4} Q31) | ...

Q57 | {18-13-19-2-1} Q25 = A ({6-4} Q59 | {18-13-19-2-1-6-4} Q31) | ...

Since Q17, Q28, Q53 and Q59 have transitions only on N, and Q31 only on

T and V, everything is deterministic. So the whole finite state machine is

deterministic.

The two layers of factorings done here represent two-lookaheads, so the

input grammar is LR(2). Since the syntax reduces to a form suitable for

a deterministic finite state machine the *parser* language is regular,

not just the *recognizer* language.

There's a lot of redundancy engendered by the repeated substitutions of

subexpressions, but there's ways to deal with that up front...

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.