Re: Trie algorithms?

tmcd@crl.com (Timothy A. McDaniel)
6 May 1996 23:19:05 -0400

          From comp.compilers

Related articles
Trie algorithms? mcdaniel@portcullis.cpm.com (Tim McDaniel) (1996-04-29)
Re: Trie algorithms? grosch@cocolab.sub.com (1996-05-04)
Re: Trie algorithms? tmcd@crl.com (1996-05-06)
| List of all articles for this month |

From: tmcd@crl.com (Timothy A. McDaniel)
Newsgroups: comp.compilers
Date: 6 May 1996 23:19:05 -0400
Organization: CRL Network Services (415) 705-6060 [Login: guest]
References: 96-04-152
Keywords: lex

Tim McDaniel <mcdaniel@portcullis.cpm.com> wrote:
>I'm looking for algorithms for quick matching of one of a set of
>fixed strings. ...
>
>[I'd just use flex, following some of the hints in the documentation
>to make lexers work faster. -John]


Oh, dear,, I was oversimplifying and writing badly.


- I'm really not trying to match strings of characters. They're a
sequence of numeric patterns, not guaranteed to be in the range 0..127
or 0.256. The problem is *isomorphic* to string matching, just with
an integer alphabet. The application is rule-based matching, with
rules like
        1 1 7 6 : 23 23 3 33


(To be even more detailed: the LHS numbers are token types as returned
by lex. 1 1 7 6 might be "T_alphastring T_alphastring T_twochar
T_digitstring", and 23 23 3 33 might then represent "CITY CITY
STATE_PROVINCE POSTAL_CODE".)


- They're not a "fixed" set of patterns. Gad, that was bad wording!


There is a control file that has the list of rules to apply to the
input. That control file is parsed once and all control structures
are set up. Then the rules are applied to each record of the input.
They're only "fixed" in the sense that they don't change while reading
the main input. The rules may change from run to run. Even if flex
could be modified to allow a larger "alphabet", we can't depend on our
customers having it at their sites.


So, with this correction (and I am sorry!), what matching algorithms
would look reasonable? Trie structures look nice, and the property
that an unmatched alphabet character is passed thru unchanged is
actually what we want. However, the algorithms from the 2nd
ed. dragon book don't consider output.


I've received several replies already -- many thanks! Any other
ideas? I would like something easy to find and implement, as I don't
have much time at all to spend on this project.


--
                                                          Tim McDaniel
                                                Reply-To: tmcd@crl.com
(Work is mcdaniel@cpm.com.)
                            Never use mcdaniel@mcdaniel.dallas.tx.us.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.