Re: How to do this odd kind of regex match?

"Martin Ward" <Martin.Ward@durham.ac.uk>
21 Jul 2002 02:08:14 -0400

          From comp.compilers

Related articles
How to do this odd kind of regex match? dot@dotat.at (Tony Finch) (2002-07-15)
Re: How to do this odd kind of regex match? michaelparker@earthlink.net (Michael Parker) (2002-07-21)
Re: How to do this odd kind of regex match? joachim_d@gmx.de (Joachim Durchholz) (2002-07-21)
Re: How to do this odd kind of regex match? Martin.Ward@durham.ac.uk (Martin Ward) (2002-07-21)
Re: How to do this odd kind of regex match? simon.cozens@computing-services.oxford.ac.uk (Simon Cozens) (2002-07-24)

From: "Martin Ward" <Martin.Ward@durham.ac.uk>
Newsgroups: comp.compilers
Date: 21 Jul 2002 02:08:14 -0400
Organization: Compilers Central
Keywords: lex
Posted-Date: 21 Jul 2002 02:08:14 EDT

"Tony Finch" <dot@dotat.at> writes:
> I'd also like to be able to match several regexes against the same
> text in parallel,
...


> (The aim is to speed up heuristic spam detection such as SpamAssassin.)


If you are matching a text against a huge number of regexps,
most of which contain words or phrases, then you might get
more benefit from preprocessing the text. Build a hash table
with the locations of all the 2-, 3- and 4-letter (or longer) sequences.
Then, to match against a regexp containing the word "porn"
(say), you look up "porn" in the table and get the list of
character offsets at which that 4-character string occurs in the text.
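Here is a minimal Python sketch of the idea. It rests on a few assumptions of my own. The index covers every character substring of length 2 to 4, including spaces and punctuation, and it is case-sensitive. Longer words are found by looking up their first 4 characters and then checking the rest against the text. Each rule is a hypothetical (name, regexp, required word) triple, with the required word picked by hand. The table is used only as a cheap prefilter: a regexp is run only if the word it requires occurs somewhere in the text.

```python
import re
from collections import defaultdict


def build_ngram_index(text, min_len=2, max_len=4):
    """Map each substring of length min_len..max_len to its start offsets."""
    index = defaultdict(list)
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            index[text[i:i + n]].append(i)
    return index


def find_word(index, text, word, max_len=4):
    """Return the start offsets of word in text, using the index.

    Assumes len(word) >= the index's min_len; shorter words are not indexed.
    max_len must match the max_len the index was built with.
    """
    if len(word) <= max_len:
        # Direct lookup; .get() avoids adding empty entries to the defaultdict.
        return list(index.get(word, []))
    # Longer word: look up its first max_len characters, then verify the rest.
    return [i for i in index.get(word[:max_len], []) if text.startswith(word, i)]


def scan(text, rules):
    """Return the names of the rules whose regexp matches text.

    Each rule is a hypothetical (name, pattern, required_word) triple, where
    every match of pattern is assumed to contain required_word literally.
    """
    index = build_ngram_index(text)  # built once, shared by all rules
    hits = []
    for name, pattern, required_word in rules:
        # Cheap prefilter: skip the regexp unless its required word occurs.
        if find_word(index, text, required_word) and re.search(pattern, text):
            hits.append(name)
    return hits


if __name__ == "__main__":
    message = "cheap porn here, more porn"
    idx = build_ngram_index(message)
    print(find_word(idx, message, "porn"))  # [6, 22]
    rules = [("porn", r"\bporn\b", "porn"), ("viagra", r"viagra", "viagra")]
    print(scan(message, rules))  # ['porn']
```

With lengths 2 to 4, the index stores about three offsets per character of text. It is built once per message and shared by all the rules, so with thousands of rules most of them can be rejected by a single hash lookup instead of a full regexp scan. This sketch uses the offsets only to decide *whether* to run each regexp. Using them to decide *where* to try it would also need bounds on match length, and that is not shown.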


Martin


Martin.Ward@durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4

