4 Mar 1999 12:11:54 -0500

Related articles |
---|

Compress the finite state machine felixmish@usa.net (Felix Mish) (1999-03-02) |

Re: Compress the finite state machine cfc@world.std.com (Chris F Clark) (1999-03-04) |

Re: Compress the finite state machine torbenm@diku.dk (Torben Mogensen) (1999-03-04) |

Re: Compress the finite state machine jes6@eecs.lehigh.edu (James E. Stine) (1999-03-04) |

Re: Compress the finite state machine allanmac@blueprint.com (Allan MacKinnon) (1999-03-04) |

Re: Compress the finite state machine hunk@alpha1.csd.uwm.edu (1999-03-04) |

Re: Compress the finite state machine stans@lucent.com (Sicco Tans) (1999-03-10) |

Re: Compress the finite state machine torbenm@diku.dk (Torben Mogensen) (1999-03-22) |

Re: Compress the finite state machine sassth@unx.sas.com (Steven Hand) (1999-03-22) |

From: | Torben Mogensen <torbenm@diku.dk> |

Newsgroups: | comp.compilers |

Date: | 4 Mar 1999 12:11:54 -0500 |

Organization: | Compilers Central |

References: | 99-03-010 |

Keywords: | DFA |

Felix Mish <felixmish@usa.net> wrote:

*>I am evaluating the ways to compress the finite state machine.*

*>Is there anyone be kind enough to provide some possible solutions?*

You don't say whether your machine is deterministic or

nondeterministic. I'll assume the first.

The first step is, obviously, to minimize the number of states in the

DFA. You can find a method for this in Aho, Sethi and Ullman's

compiler book, though the version described there isn't particularly

efficient.

The next step is to reduce the space used by each state. There are

several ways to do this.

One is to divide the alphabet into equivalence classes: If it for all

states is the case that the transition made for two characters is the

same, these are equivalent. For example, it is often the case that all

digits are equivalent in DFA's used for lexical analysis, and often

most letters have the same property (some may be used for keywords,

though). When you have found the equivalence classes, make a single

global table that maps a character to its equivalence class and make

the state transition tables use the equivalence class numbers

instead. If you can reduce 256 different characters to, say, 50

equivalence classes that is a factor 5 savings.

Next, you can try to make several states share the same row in the

table. Two states may have transitions on disjoint subsets of the

alphabet. In this case, they can share a row in the transition table

if you also for each state provide a bitmap that masks out the

irrelevant parts of the row. Since a bitmap takes up far less space

than a row, you can save quite a bit here. You can relax the sharing

criterium a bit: Two states need not have disjoint domains. It is O.K

that they both have a defined transition on a character if the

trasitionsa re identical. Finding an optimal sharing of rows between

states is equivalent to graph colouring: The states form nodes in an

undirected graph. An edge is drawn between two states if there exist a

character such that they both have defined transitions on this

character and these go to different states. Find the minimal number of

colours for the nodes such that states connected by edges have

different colours. Graph colouring is also used for register

allocation and the same heuristics should work fine for row sharing.

If these measures are insufficient, you can try alphabet splitting:

Replace each character from the alphabet by two adjacent characters

from a smaller set. If, for example, your alphabet has 256 characters,

you can split each character into two characters from an alphabet of

16 characters. When you in a DFA replace each transition by two

sequential transitions on the smaller characters, you might get an

NFA, as symbols that are distinct in the original alphabet may not be

distinguished by the first character in their encodings. Hence, you

must convert this NFA to a DFA. If the original DFA was constructed

from a regular expression or NFA, it is easier to modify these before

conversion to a DFA instead of modifying the DFA and re-do NFA to DFA

conversion. While the number of states in your DFA will (on average)

double by this transformation, your alphabet will be far less than

half the size (it may, e.g., go from 256 to 16 characters). Hence,

there is a substantial space saving.

All three methods can be combined: First alphabet splitting, then

equivalence classing and finally row sharing. However, the combined

time overhead may be substantial.

Torben Mogensen (torbenm@diku.dk)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.