Re: Possible bug in lex with trailing context expressions?

vern@daffy.ee.lbl.gov (Vern Paxson)
Wed, 19 Jan 1994 20:13:55 GMT

          From comp.compilers

Related articles
Possible bug in lex with trailing context expressions? greyham@research.canon.oz.au (1994-01-19)
Re: Possible bug in lex with trailing context expressions? vern@daffy.ee.lbl.gov (1994-01-19)
| List of all articles for this month |

Newsgroups: comp.compilers
From: vern@daffy.ee.lbl.gov (Vern Paxson)
Keywords: lex, flex, errors
Organization: Lawrence Berkeley Laboratory, Berkeley CA
References: 94-01-066
Date: Wed, 19 Jan 1994 20:13:55 GMT

greyham@research.canon.oz.au (Graham Stoney) writes:
> While attempting to construct some lex rules to extract the contents of C
> style comments, I appear to have found a problem/bug with lex regarding
> trailing context expressions.
>
> <INITIAL>"/*""*"+/[^/] { putchar('`'); BEGIN COMMENT; }


The algorithm given in the Dragon Book (Aho, Sethi & Ullman) for matching
trailing context contains a bug. The algorithm searches for the *last*
"crossing" between the main pattern and the trailing context and declares
that the trailing context started just after there. With a pattern like


xy*/y


(which is basically what you have in your comment rule), on the input "xy"
the scanner was last in the "crossing" state after scanning "xy", so the
algorithm decides that the trailing context starts after that point, i.e.,
it matches nothing. Note that after scanning "x" the scanner already was
in a "crossing" state, and in this case that was the correct state to
identify as ending the main part of the match. But in general the state
is neither the first crossing nor the last, and the problem of getting the
trailing context straight is almost the same as matching subpatterns
("\(...\)"), which as far as I know can't be done in linear time (someone
more knowledgeable feel free to correct me here, as it would be nice to
add subpatterns to flex). Actually, I think trailing context can be done
correctly and in linear time by constructing the "reverse" DFA and running
it to find the first point that it enters a crossing state coincident with
the original DFA entering a crossing state; but this is a big hassle.


Flex deals with this problem in two ways. First, if either the main rule
or the trailing context has a fixed size, as is the case in the example
above, then it's simple to figure out where the trailing context must
start. If neither does, it inspects each DFA state to see whether it is
both a crossing state and includes NFA states from the trailing context;
if so, the trailing context is reported as "dangerous" and will not be
properly matched. If the main rule and the trailing context both have
variable size but the combination is not "dangerous", then flex uses the
AS&U algorithm.


>One workaround is to include the trailing context in the main expression, and
>use yyless() to push it back again; this works OK, but should be unnecessary.


That's essentially what flex does automatically for you if the main rule
or the trailing context has a fixed size.


By the way, the flexdoc man page gives an example of a set of rules for
quickly matching C comments without using trailing context.


Vern


Vern Paxson vern@ee.lbl.gov
Information and Computing Sciences ucbvax!ee.lbl.gov!vern
Lawrence Berkeley Laboratory (510) 486-7504
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.