Minding my Ps and Qs

"Peter R. Wilson" <peter.r.wilson@boeing.com>
15 Feb 1999 23:23:46 -0500

          From comp.compilers

Related articles
Minding my Ps and Qs peter.r.wilson@boeing.com (Peter R. Wilson) (1999-02-15)
Re: Minding my Ps and Qs torbenm@diku.dk (Torben Mogensen) (1999-02-16)
Re: Minding my Ps and Qs paul.janssens@skynet.be (JPA) (1999-02-21)
| List of all articles for this month |

From: "Peter R. Wilson" <peter.r.wilson@boeing.com>
Newsgroups: comp.compilers
Date: 15 Feb 1999 23:23:46 -0500
Organization: The Boeing Company
Keywords: parse

I am trying to massage an ambiguous grammar for the EXPRESS-X
language into a form suitable for LL(1) parsing. I use Wirth's syntax
notation (WSN) (i.e., {} means zero or more repetitions and [] means
zero or one repetitions). After some work I got stuck on a term in the
grammar that was of the form:
1 t = { P | Q } P .
That is, sentences consisting of any number of P and/or Q, followed by
P. This is LL(oo) as written. In trying to massage this I made the
assumption that an equivalent production is one that could be used to
generate an identical set of sentences (I have a tool that does sentence
generation from a WSN grammar). After some fiddling around I came up
with what turned out to be a near equivalent, namely
2 t = P | Q { Q } P | t P .
This generates strings of any number of Ps and/or Qs, followed by one or
two Ps. Production 2 is directly left recursive, but the compiler books
I have read say that you can eliminate left recursion, as per the
following (if I have written it correctly in WSN).

        Given a left recursive production
a = b | a c .
Then an equivalent non left recursive grammar is
a = b aa .
aa = [ c aa ] .

        So, I applied this to (2) to produce the equivalent:
3 t = ( P | Q { Q } P ) tt .
4 tt = [ P tt ] .
BUT this does not produce the same set of sentences as does (2)
(basically once a sequence of Qs followed by a P has been generated then
there will be no more Qs --- production (4) generates a string of Ps).
On the other hand, using (5) instead of (4) appears to make (3)
equivalent to (2), where
5 tt = [ P t ] .

        I must have gone wrong somewhere, but I can't figure out where. Can
anyone explain it to me?

        By the way, I think that the equivalent LL(1) production to (1) is:
6 t = ( P | Q { Q } P ) [ t ] .
Anyone willing to confirm this?

        I haven't got around yet to writing an actual parser, but have put
the grammar through a tool that checks if a WSN grammar is LL(1). As I
coded both of the tools, they might be in error somewhere.

Thanks for any help.
Peter W.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.