Wed, 15 Jun 2011 12:28:07 GMT

Related articles |
---|

Inverse grep gah@ugcs.caltech.edu (glen herrmannsfeldt) (2011-06-08) |

Re: Inverse grep cfc@shell01.TheWorld.com (Chris F Clark) (2011-06-12) |

Re: Inverse grep dot@dotat.at (Tony Finch) (2011-06-13) |

Re: Inverse grep torbenm@diku.dk (2011-06-14) |

Re: Inverse grep anton@mips.complang.tuwien.ac.at (2011-06-15) |

Re: Inverse grep junyer@gmail.com (Paul Wankadia) (2012-05-28) |

From: | anton@mips.complang.tuwien.ac.at (Anton Ertl) |

Newsgroups: | comp.compilers |

Date: | Wed, 15 Jun 2011 12:28:07 GMT |

Organization: | Institut fuer Computersprachen, Technische Universitaet Wien |

References: | 11-06-015 11-06-022 |

Keywords: | lex, performance |

Posted-Date: | 16 Jun 2011 20:04:01 EDT |

torbenm@diku.dk (Torben Fgidius Mogensen) writes:

*>glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:*

*>*

*>> I suppose this is a strange question, but I was wondering if*

*>> there was ever something like an inverse grep. That is,*

*>> match a string against a file full of regular expressions. ...*

*>[How serious is the explosion problem in practice, now that compilers*

*>no longer have to fit into 64k? We use lex scanners with a thousand*

*>tokens, and spamassassin feeds about 2000 patterns to re2c if you use*

*>the compiled patterns feature. None of them seem to strain our storage*

*>systems. -John]*

Even if it was a problem, the solution has been around a long time

(although I am not aware of an academic paper about it): Generate the

DFA on-demand, only for those transitions that are actually used.

Then the worst-case memory demands are O(n) where n is the length of

the input string (and in the usual case the memory usage grows much

slower).

In tree-parsing automata (i.e., not for regexps) I have had cases

where the size of the statically generated automaton was a problem

<https://www.complang.tuwien.ac.at/anton/lazyburg/gforth9.burg>: We

could not generate the automata for the largest tree grammars in

practical time and even had problems compiling some of the code that

we could generate. OTOH, on-demand generation of automaton states

worked even for the largest tree grammars, and memory consumption was

on the order of 2MB IIRC. The numbers will be different for regexps,

but I expect the effect to be the same: on-demand automata will work

in many cases where static automata take too long to generate or are

too big to run. The paper about this work is

<http://www.complang.tuwien.ac.at/papers/ertl+06pldi.ps.gz>

- anton

--

M. Anton Ertl

anton@mips.complang.tuwien.ac.at

http://www.complang.tuwien.ac.at/anton/

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.