Re: Table compression

"Paul Mann" <paul@parsetec.com>
2 Oct 2005 02:53:05 -0400

          From comp.compilers

Related articles
[4 earlier articles]
Re: table compression mickunas@cs.uiuc.edu (Dennis Mickunas) (2001-11-08)
Table compression leonardo@dcc.ufmg.br (Leonardo Teixeira Passos) (2005-09-27)
Re: Table compression anton@mips.complang.tuwien.ac.at (2005-09-30)
Re: Table compression hannah@schlund.de (2005-09-30)
Re: Table compression cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-30)
Re: Table compression Peter_Flass@Yahoo.com (Peter Flass) (2005-10-02)
Re: Table compression paul@parsetec.com (Paul Mann) (2005-10-02)
Re: Table compression cfc@shell01.TheWorld.com (Chris F Clark) (2005-10-03)
Re: Table compression henry@spsystems.net (2005-10-13)
| List of all articles for this month |

From: "Paul Mann" <paul@parsetec.com>
Newsgroups: comp.compilers
Date: 2 Oct 2005 02:53:05 -0400
Organization: Parsetec
References: 05-09-130
Keywords: parse
Posted-Date: 02 Oct 2005 02:53:05 EDT

> I'm looking foward to developing a study on table compression techniques.
> When browsing the Web, I haven't found any recent articles (at least from
> the last decade). Does anyone have a clue as where as to begin?


I found this to be very helpful, but not complete in implementation
details. I had to finish the job when putting it into practice.


"Optimization of Parser Tables for Portable Compilers",
Paper from TOPLAS Oct 1984, by Dencker, Durre and Heuft.


> [Since computer memories have gotten so big, does anyone care about
> table compression any more? Now, a typical Windows program has a
> megabyte of unused libraries that aren't worth stripping out, so
> why waste time with the tables? -John]


If you are writing a parser generator for all languages, including
experimental parsing, such as all XML dialects in one parser table,
you can end up with as many as 30,000 rules in a grammar.


The parser-table size should be computed with the formula:


S x (N+T) x 2 bytes/entry


Where S is the number of states,
and N is the number of nonterminals,
and T is the number of terminals.


For COBOL-85 the numbers would be :


1200 x (450+350) x 2 = 1,920,000 bytes = 1,875 K = 1.83 MB.


This is assuming the use of a matrix style array indexed
by the state number and symbol of the grammar to get
either the next state number (+number) or the reduction to
make (-number). This is for uncompressed parser tables.


This also assumes that no shift-reduce parsing actions are
used. Using these will eliminate about 30% of the states.


LRgen uses shift-reduce actions and has about 900 states. After
compressing the tables, they are 67,621 bytes = 66 K.


The parsers are nearly as fast as using uncompressed tables.
The difference in speed would be unnoticeable.


Some other parser generators may give smaller sizes
depending on which table compression technique is used.


Of course, a list style of parser tables would be much
smaller, but not likely smaller than the compressed matrix
tables. And the list format is slower and speed may
decreases with the increase in size of the grammar as the
number of actions per state increases.


I'm in favor of compressing the parser tables. Some users might
want to have one parser and many parser tables in an application.
One of LRgen's users is doing that with dialects of SQL which are
larger than COBOL.


Paul Mann
http://parsetec.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.