Re: Limits of regular languages

Kaz Kylheku <>
Sat, 13 Feb 2010 20:06:01 +0000 (UTC)

          From comp.compilers

Related articles
Limits of regular languages (e0726905) (2010-02-11)
Re: Limits of regular languages (Matthias-Christian Ott) (2010-02-13)
Re: Limits of regular languages (Kaz Kylheku) (2010-02-13)
Re: Limits of regular languages (Chris F Clark) (2010-02-13)
Re: Limits of regular languages (Max Hailperin) (2010-02-13)
Re: Limits of regular languages (e0726905) (2010-02-27)
| List of all articles for this month |

From: Kaz Kylheku <>
Newsgroups: comp.compilers
Date: Sat, 13 Feb 2010 20:06:01 +0000 (UTC)
Organization: A noiseless patient Spider
References: 10-02-052
Keywords: lex, DFA
Posted-Date: 13 Feb 2010 19:31:20 EST

On 2010-02-11, e0726905 <> wrote:
> Hi,
> I have been playing around with regular expressions for the last few
> weeks, implementing the automaton generation algorithms from the dragon
> book and taking at the existing tools.
> Now a few questions have sprung, that I haven't been able to answer:
> Is there a way implement the usual longest-match behaviour of a scanner
> with a DFA without any backtracking mechanisms, meaning is that
> behaviour strictly regualr or not?

You mean without recording the last good match, and reverting to that
piece of memory when reaching a state transition impasse?

Okay, first of all, it should be obvious that longest extraction
is not set membership testing!

The longest extraction scanner does something other than report ``does
this particular input string S lie within the language L''. Namely,
the scanner reports that ``a length N prefix of the input sequence
lies in the language L (and there is no such prefix longer than N)''.
Therefore, the scanner needs to be able to count, and it needs to be
able to count indefinitely high, since in general there is no upper
bound on N, which means that it's not mathematically a finite state

(In practice, scanners limit token length; but so do parsers
limit recursion depth too.)

The longest-match behavior is achieved in practice by counting the
input, and at every acceptance state, recording a copy of the counter
as a ``longest match seen so far'' variable. When the matching
terminates (there is no more input, or the machine indicates that no
transitions are possible), the value of that variable is reported.

> I also have two rather unrelated questions that have been bugging me for
> quite some time:
> Is every regular language LL(1)?


> What class of language do regexes with backreferences fall into?
> Membership is NP-Complete so they are not in CFG (or are they? assuming
> P=NP?).

In backreferences, prior input defines the grammar which matches new
input. So this is quite a different animal from the context free
grammar, which is purely a static structure, not influenced by input.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.