Re: Regular expression string searching & matching

Ben Hanson <jamin.hanson@googlemail.com>
Tue, 13 Mar 2018 11:30:30 -0700 (PDT)

          From comp.compilers

Related articles
[4 earlier articles]
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-10)
Re: Regular expression string searching & matching jamin.hanson@googlemail.com (Ben Hanson) (2018-03-10)
Re: Regular expression string searching & matching jamin.hanson@googlemail.com (Ben Hanson) (2018-03-11)
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-12)
Re: Regular expression string searching & matching jamin.hanson@googlemail.com (Ben Hanson) (2018-03-12)
Re: Regular expression string searching & matching DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2018-03-13)
Re: Regular expression string searching & matching jamin.hanson@googlemail.com (Ben Hanson) (2018-03-13)
Re: Regular expression string searching & matching jamin.hanson@googlemail.com (Ben Hanson) (2018-03-13)
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-17)
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-18)
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-20)
Re: Regular expression string searching & matching clint.olsen@gmail.com (Clint O) (2018-03-22)
| List of all articles for this month |

From: Ben Hanson <jamin.hanson@googlemail.com>
Newsgroups: comp.compilers
Date: Tue, 13 Mar 2018 11:30:30 -0700 (PDT)
Organization: Compilers Central
References: 18-03-016 18-03-032 18-03-034 18-03-035 18-03-041 18-03-045
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="60596"; mail-complaints-to="abuse@iecc.com"
Keywords: lex
Posted-Date: 13 Mar 2018 15:36:30 EDT

> It also looks like you are running a DFA minimizer (like Hopcroft) on your
> result since I am not producing a minimal DFA. That also may help me figure
> out if I'm producing the right automaton because they'd match...


Actually, although I have a minimising routine I'm not applying it here.


I intersect every character class in the regex to produce non-overlapping sets
and then these are grouped in equivalence sets for each state in the DFA. This
is the part that I didn't find in the Dragon Book (or any other compiler book
I looked in) and it was a message on this group from Vern Paxon that tipped me
off: https://compilers.iecc.com/comparch/article/94-10-091


(I only paid attention to the equivalence classes bit by the way!)


I intersect the equivalence sets as I build the DFA and I have found with this
technique you kind of have to go out of your way to get a non minimal DFA by
default. I've imagined an even more rigorous approach where character classes
are intersected as they are added to the syntax tree up front, but I've not
got around to trying to implement it.


I'd be interested to see your DFA output once you've cleaned it up a bit.
Using derivatives is interesting too - I've seen it discussed but never tried
to get into that.


Regards,


Ben



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.