Sun, 29 Dec 2019 20:56:21 -0800 (PST)

Related articles |
---|

[3 earlier articles] |

Re: How make multifinished DFA for merged regexps? borucki.andrzej@gmail.com (Andy) (2019-12-20) |

Re: How make multifinished DFA for merged regexps? 493-878-3164@kylheku.com (Kaz Kylheku) (2019-12-21) |

How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-23) |

Re: How make multifinished DFA for merged regexps? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-12-24) |

Re: How make multifinished DFA for merged regexps? matt.timmermans@gmail.com (Matt Timmermans) (2019-12-23) |

Re: How make multifinished DFA for merged regexps? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-12-24) |

Re: How make multifinished DFA for merged regexps? rockbrentwood@gmail.com (2019-12-29) |

From: | rockbrentwood@gmail.com |

Newsgroups: | comp.compilers |

Date: | Sun, 29 Dec 2019 20:56:21 -0800 (PST) |

Organization: | Compilers Central |

References: | 19-12-005 |

Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7825"; mail-complaints-to="abuse@iecc.com" |

Keywords: | lex, DFA |

Posted-Date: | 30 Dec 2019 11:30:03 EST |

In-Reply-To: | 19-12-005 |

On Thursday, December 19, 2019 at 10:01:24 PM UTC-6, Andy wrote:

*> I can create DFA direct from regexp.*

*> But for language lexer I must have DFA for couple regexp.*

*> One solution is crating DFA with multi finished states.*

*> For example*

*> r0 = ab*

*> r1 = ac*

*>*

*> | 0 | 1*

*> a | 1 |*

*> b | | 2(F)*

*> c | | 3(F)*

*>*

*> How to check if r0 and r1 are disjoint?*

In a day or so, I'll put back up on GitHub our regex utilities that had once

been up on the comp.compilers archive. The "DFA" program can process boolean

operations - including intersection and relative compliment. (The "NFA"

program produces efficient linear-sized near-deterministic FA, "REX" is like

GREP except it can process boolean combinations too, and "FSC" is a "finite

state classifier"). I'll post a link when it's up on GitHub.

Denote intersection by &. The regular expressions ab, ac are the least

fixed-point solutions to the following systems

[0] = ab >= a[1]

[1] = b >= b[2]

[2] = 1

[3] = ac >= a[4]

[4] = c >= c[5]

[5] = 1

The system for the intersection is obtained by distributivity:

[0]&[3] >= a([1]&[4])

[1]&[4] >= (b[2] | c0) & (b0 | c[2]) = b([2]&0) | c(0&[2]) = b0 | c0 = 0

The least fixed-point solution is [1]&[4] = 0, [0]&[3] = a0 = 0.

So, there's 0 intersection.

A more interesting example with relative compliment "-":

(a|b)* - (a|b)*(aa|bb)(a|b)*

The first term (a|b)* has the following right-linear system

[0] = (a|b)* = 1 | (a|b)(a|b)* = 1 | a(a|b)* | b(a|b)* >= 1 | a[0] | b[0]

The second term (a|b)*(aa|bb)(a|b)* has the following:

[1] = (a|b)*(aa|bb)(a|b)* = (aa|bb)(a|b)* | a[1] | b[1]

= a(a(a|b)* | [1]) | b(b(a|b)* | [1])

>= a[2] | b[3]

[2] = a(a|b)* | [1] = a(a|b)* | 1 | a[1] | b[1] = 1 | a((a|b)* | [1]) | b[1]

>= 1 | a[4] | b[1]

[3] = b(a|b)* | [1] = b(a|b)* | 1 | a[1] | b[1] = 1 | a[1] | b((a|b)* | [1])

>= 1 | a[1] | b[4]

[4] = (a|b)* | [1] = 1 | (a|b)(a|b)* | 1 | a[1] | b[1]

= 1 | 1 | a((a|b)* | [1]) | b((a|b)* | [1])

>= 1 | a[4] | b[4]

The system for the relative compliment can be found by distributivity:

[0]-[1] >= (1|a[0]|b[0]) - (0|a[2]|b[3])

= 1-0 | a([0]-[2]) | b([0]-[3])

= 1 | a([0]-[2]) | b([0]-[3])

[0]-[2] >= (1-1) | a([0]-[4]) | b([0]-[1]) = a([0]-[4]) | b([0]-[1])

[0]-[3] >= (1-1) | a([0]-[1]) | b([0]-[4]) = a([0]-[1]) | b([0]-[4])

[0]-[4] >= (1-1) | a([0]-[4]) | b([0]-[4]) = a([0]-[4]) | b([0]-[4])

The least fixed-point solution x >= ax | bx is x = 0, so for the last item, it

is [0]-[4] = 0. For the remaining items, the system therefore reduces to

[0]-[1] >= 1 | a([0]-[2]) | b([0]-[3])

[0]-[2] >= a0 | b([0]-[1]) = b([0]-[1])

[0]-[3] >= a([0]-[1]) | b0 = a([0]-[1])

which upon substitution reduces to a single inequality

[0]-[1] >= 1 | ab([0]-[1]) | ba([0]-[1]) = 1 | (ab|ba)([0]-[1]).

The least fixed point solution to x >= 1 | cx is x = c*. Thus, for [0]-[1], it

is

[0]-[1] = (ab|ba)*

Therefore, the relative compliment evaluates to:

(a|b)* - (a|b)*(aa|bb)(a|b)* = (ab|ba)*.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.