Re: A new? table compression technique for LR parsers

"Paul Mann" <paulmann@thegrid.net>
15 May 1998 22:38:24 -0400

From comp.compilers

Related articles
A new? table compression technique for LR parsers torbenm@diku.dk (Torben Mogensen) (1998-05-07)
Re: A new? table compression technique for LR parsers vmakarov@cygnus.com (Vladimir N. Makarov) (1998-05-12)
*Re: A new? table compression technique for LR parsers paulmann@thegrid.net (Paul Mann)* (1998-05-15)**

| List of all articles for this month |

From:	"Paul Mann" <paulmann@thegrid.net>
Newsgroups:	comp.compilers
Date:	15 May 1998 22:38:24 -0400
Organization:	Call America Internet Services +1 (800) 563-3271
References:	98-05-031
Keywords:	parse

Torben Mogensen wrote in message 98-05-031...

>While toying with LR parsing, I stumbled on a simple idea for table
>compression in LR parsers. A brief search of literature did not reveal
>any previous mention of the methods, so it may be new. If you have
>heard about it before, please tell me.

Very interesting idea. OK, you save space by compressing the parser
matrix, but ... you add space by storing all the productions in the
parser tables. You slow down the apply action for reductions and if
you have and error, and want to do error recovery, now you have a
mess, right?, because you have to backtrack.

The theoretical proof for the correctness of this technique would be
very challenging. I think it might be possible to end up in a state
that does a reduction and checks the symbols on the parse stack and
continues parsing WITHOUT realizing that the wrong reduction was made.

Sometimes the right sides of a grammar rule are IDENTICAL but the head
symbol (nonterminal) is not the same. The nonterminal would not be on
the parse stack and, uh oh, you've got a problem. Not you have to
figure out if your grammar is ambiguous or your compressed parser
matrix has introduced ambiguity into the parser.

I implemented LALR rather than LR(1) as follows.

I used the compression technique from the book, "Compiler
Construction" by Waite and Goos, Springer-Verlag, which does not give
enough informa- tion for complete implementation. So, I also
recommend a paper on Optimization of Parser Tables for Portable
Compilers, by Dencker and 3 others.

You make a bit matrix with a one for a transition and a zero for error
and a zero for a reduction. Most of the states in an LALR parser can
make a default reduction anyway, so I have a separate reduction
matrix, which is small and only necessary for those state that have
more than production being recognized.

So from this original parser matrix with the reductions removed:

                    | id + * ( ) $ | E
                --+------------------------+---
                0 | s3 s2 | 1
                1 | s4 s5 |
                2 | s3 s2 | 6
                3 | |
                4 | s3 s2 | 7
                5 | s3 s2 | 8
                6 | s4 s5 s9 |
                7 | s5 |
                8 | |
                9 | |
                --+------------------------+---

You get this bit matrix:

                    | id + * ( ) $ | E
                --+------------------------+---
                0 | 1 0 0 1 0 0 |
                1 | 0 1 1 0 0 0 |
                2 | 1 0 0 1 0 0 |
                3 | 0 0 0 0 0 0 |
                4 | 1 0 0 1 0 0 |
                5 | 1 0 0 1 0 0 |
                6 | 0 1 1 0 1 0 |
                7 | 0 0 1 0 0 0 |
                8 | 0 0 0 0 0 0 |
                9 | 0 0 0 0 0 0 |
                --+------------------------+---

Then all ther error cells in the parser matrix become "don't care".
Now you can go back to the parser matrix and merge all states with
non-conflicting transitions. You can also merge all columns. You can
also compress the bit matrix the same way, merging all rows that are
similar and then all column that are similar.

You still don't have to store the production symbols in the tables.
You still stop at the first error, and you still have your quick
reductions.

And ... if you want to see how small the tables can be ... you can use
the free demo version of an LALR parser generator from LALR Systems.
There are 18 BNF grammars available for download ranging from small to
huge. It's at: http://www.lalr.com/

Paul Mann
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: A new? table compression technique for LR parsers

"Paul Mann" <paulmann@thegrid.net>15 May 1998 22:38:24 -0400

"Paul Mann" <paulmann@thegrid.net>
15 May 1998 22:38:24 -0400