8 Nov 2001 01:07:49 -0500

Related articles |
---|

table compression rboland@unb.ca (Ralph Boland) (2001-11-04) |

Re: table compression Olivier.Ridoux@irisa.fr (Olivier Ridoux) (2001-11-08) |

Re: table compression hannah@schlund.de (2001-11-08) |

Re: table compression heng@Ag.arizona.edu (Heng Yuan) (2001-11-08) |

Re: table compression mickunas@cs.uiuc.edu (Dennis Mickunas) (2001-11-08) |

Table compression leonardo@dcc.ufmg.br (Leonardo Teixeira Passos) (2005-09-27) |

Re: Table compression anton@mips.complang.tuwien.ac.at (2005-09-30) |

Re: Table compression hannah@schlund.de (2005-09-30) |

Re: Table compression cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-30) |

Re: Table compression Peter_Flass@Yahoo.com (Peter Flass) (2005-10-02) |

[3 later articles] |

From: | Heng Yuan <heng@Ag.arizona.edu> |

Newsgroups: | comp.compilers |

Date: | 8 Nov 2001 01:07:49 -0500 |

Organization: | The University of Arizona |

References: | 01-11-006 |

Keywords: | parse |

Posted-Date: | 08 Nov 2001 01:07:46 EST |

I happened to look into the table compression when I was started writing

YooLex (a Flex-like tool, yoolex.sourceforge.net). Lex and parser tables

are about the same, so I will use Flex as the example.

Flex implements a compression algorithm that was described in

the following paper:

P.Dencker, et al,

"Optimization of parser tables for portable compilers,"

ACM Transaction on Programming Languages and Systems, 6(4):546-572,

October 1984.

http://citeseer.nj.nec.com/context/27540/0

There are three components to it.

1. Equivalent class. This is used to reduce the character set. This is

more useful compression Lex tables due to large character set size. EC is

also used to in error states transitions to find identical columns.

Dunno about the LR tables.

2. Block fitting. Basically, find the most common state in a row. If it

is repeated quite frequently, then the state is compressable. Then one

finds the block size and holes. For example, a row containing

5 5 5 4 4 5 3 4 5 5 5

would have the repeated state 5, block size of 5 (which is 4 4 5 3 4),

and a hole of size 1 at position 6. If you go through all the states,

then you could try to fit them together. Note, that is NP-complete.

So, one could use some simpler approach to it. Flex, if I interpreted

the code correctly, basically layout all the states containing holes

first tightly packed one next to the other, then fill the holes with

blocks that do not contain holes. This approach generally works well

since there will be much more fillers than the holes (YooLex uses a

variant trying to fit them better, but the speed is slower).

However, if you just implement the above, the resulting table is only 40

or 30% of the original Equivalent Class table. So there are additional

things to take care of.

3. If you look at the Equivalent Class table (for Flex, use -Ce -Cf to

generate the table), you will notice that there are lots of negative

values (the values themselves are not important; the negative sign is.

In YooLex, I just give them 0 instead of negative #). Usually, these

negative values tend to occur on one side of the table a lot and thus

create very big holes. This is not good since the block fitting approach

above does not like holes. Also, there tends to be a lot of error states

in the table. If one could somehow remove the error states (assuming that

they are just the same as the next most repeated value in the table), one

could suddently compress the table >>10 times (depending the # of eqvalent

classes and # of error states). The problem is how to compress the error

matrix. The paper listed above used the following stepwise approach:

a. negative the matrix (since error matrix is not sparse, then

the negative of it must be).

b. compress the similar rows and columns (i.e. if there are

several rows and columns having the same value, just store

one copy and use some other scheme to index them. Flex uses

yy_meta to index the columns, the row eqvalent indexing is not

necesssary if tricks are used).

c. instead of storing the error matrix table as binary numbers

(since the matrix is binary), we could use the same procedure

in step2 to compress it. (Using the binary number don't

necessarily buy us some space, and it has some other things

must be taken cared of).

YooLex currently has the first 2 steps implemented and I am still writing

the code for the third part. So until I get it fully implemented, my

intepretation could be false. Go read the paper first.

Good Luck.

Heng Yuan

On 4 Nov 2001, Ralph Boland wrote:

*> I've been searching the literature for methods of compressing LR(1)*

*> (or LALR(1)) parse tables but I haven't found very much.*

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.