|Limits of regular languages email@example.com (e0726905) (2010-02-11)|
|Re: Limits of regular languages firstname.lastname@example.org (Matthias-Christian Ott) (2010-02-13)|
|Re: Limits of regular languages email@example.com (Kaz Kylheku) (2010-02-13)|
|Re: Limits of regular languages cfc@shell01.TheWorld.com (Chris F Clark) (2010-02-13)|
|Re: Limits of regular languages firstname.lastname@example.org (Max Hailperin) (2010-02-13)|
|Re: Limits of regular languages email@example.com (e0726905) (2010-02-27)|
|From:||Kaz Kylheku <firstname.lastname@example.org>|
|Date:||Sat, 13 Feb 2010 20:06:01 +0000 (UTC)|
|Organization:||A noiseless patient Spider|
|Posted-Date:||13 Feb 2010 19:31:20 EST|
On 2010-02-11, e0726905 <email@example.com> wrote:
> I have been playing around with regular expressions for the last few
> weeks, implementing the automaton generation algorithms from the dragon
> book and taking at the existing tools.
> Now a few questions have sprung, that I haven't been able to answer:
> Is there a way implement the usual longest-match behaviour of a scanner
> with a DFA without any backtracking mechanisms, meaning is that
> behaviour strictly regualr or not?
You mean without recording the last good match, and reverting to that
piece of memory when reaching a state transition impasse?
Okay, first of all, it should be obvious that longest extraction
is not set membership testing!
The longest extraction scanner does something other than report ``does
this particular input string S lie within the language L''. Namely,
the scanner reports that ``a length N prefix of the input sequence
lies in the language L (and there is no such prefix longer than N)''.
Therefore, the scanner needs to be able to count, and it needs to be
able to count indefinitely high, since in general there is no upper
bound on N, which means that it's not mathematically a finite state
(In practice, scanners limit token length; but so do parsers
limit recursion depth too.)
The longest-match behavior is achieved in practice by counting the
input, and at every acceptance state, recording a copy of the counter
as a ``longest match seen so far'' variable. When the matching
terminates (there is no more input, or the machine indicates that no
transitions are possible), the value of that variable is reported.
> I also have two rather unrelated questions that have been bugging me for
> quite some time:
> Is every regular language LL(1)?
> What class of language do regexes with backreferences fall into?
> Membership is NP-Complete so they are not in CFG (or are they? assuming
In backreferences, prior input defines the grammar which matches new
input. So this is quite a different animal from the context free
grammar, which is purely a static structure, not influenced by input.
Return to the
Search the comp.compilers archives again.