Possible bug in lex with trailing context expressions?

greyham@research.canon.oz.au (Graham Stoney)
Wed, 19 Jan 1994 01:56:27 GMT

          From comp.compilers

Related articles
Possible bug in lex with trailing context expressions? greyham@research.canon.oz.au (1994-01-19)
Re: Possible bug in lex with trailing context expressions? vern@daffy.ee.lbl.gov (1994-01-19)

Newsgroups: comp.compilers
From: greyham@research.canon.oz.au (Graham Stoney)
Summary: lex appears to mishandle trailing context rules.
Keywords: lex, flex, errors, comment
Organization: Canon Information Systems Research Australia
Date: Wed, 19 Jan 1994 01:56:27 GMT

While attempting to construct some lex rules to extract the contents of
C-style comments, I appear to have found a bug in how lex handles trailing
context expressions. The following lexer is intended to extract the contents
of any C-style comments in its input, sending them to the standard output
with leading and trailing *'s and spaces stripped:


/* lexbug.l: Rudimentary comment-contents lexer. */
/* This appears to show a bug with lex matching trailing context correctly. */
WS [ \t]


%S COMMENT


%%
<INITIAL>.|\n ;
<INITIAL>"/*""*"*{WS}+ { putchar('`'); BEGIN COMMENT; }
<INITIAL>"/*""*"+/[^/] { putchar('`'); BEGIN COMMENT; }


<COMMENT>{WS}*"*"+"/" { puts("'"); BEGIN INITIAL; }
<COMMENT>[^*\n \t]* |
<COMMENT>{WS}* |
<COMMENT>"*"+[^*/\n]* ECHO;
<COMMENT>{WS}*\n putchar('\n');


The problem is that the trailing context in the third rule does not appear to
match correctly when presented with degenerate input like `/***/'. I expected
it to match `/**', enter the COMMENT state, and leave `*/' to match the first
COMMENT rule, so that this input would lex correctly and produce the empty
output `'. Instead, lex seems to ignore the presence of the trailing context
`[^/]' altogether and matches `/***'. That fouls things up once it enters the
COMMENT state, since it no longer recognises the remaining `/' as the end of
the comment.


flex version 2.3 handles the situation as I'd expected, making me think that
perhaps there is a bug in lex that I've stumbled upon.


One workaround is to include the trailing context in the main expression, and
use yyless() to push it back again; this works OK, but should be unnecessary.
It's like this:


/* lexok.l: Rudimentary comment-contents lexer. */
/* This one works around a lex bug regarding trailing context matching. */
WS [ \t]


%S COMMENT


%%
<INITIAL>.|\n ;
<INITIAL>"/*""*"*{WS}+ { putchar('`'); BEGIN COMMENT; }
<INITIAL>"/*""*"+[^/] { yyless(yyleng-1); putchar('`'); BEGIN COMMENT; }


<COMMENT>{WS}*"*"+"/" { puts("'"); BEGIN INITIAL; }
<COMMENT>[^*\n \t]* |
<COMMENT>{WS}* |
<COMMENT>"*"+[^*/\n]* ECHO;
<COMMENT>{WS}*\n putchar('\n');


Does anyone know why lex might be acting this way? Judging from the paper
"Lex - A Lexical Analyzer Generator" by M. E. Lesk and E. Schmidt, trailing
context is the preferred form and ought to work. Can anyone shed some light
on what I might be doing wrong?


And finally, here's a transcript of running the above analysers:


greyham@jaco% lex lexbug.l; cc lex.yy.c -ll
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`/ # thinks it's still in COMMENT state!
^D
greyham@jaco% flex lexbug.l; cc lex.yy.c -lfl
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D
greyham@jaco% lex lexok.l; cc lex.yy.c -ll
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D
greyham@jaco% flex lexok.l; cc lex.yy.c -lfl
greyham@jaco% a.out
/********* hi mom ********/
`hi mom'
/*************************/
`'
^D


Thanks,
Graham
--
Graham Stoney, Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: + 61 2 805 2909 Fax: + 61 2 805 2929
[There are dozens of bugs in AT&T lex, so many that I've rarely seen a
usefully complex lexer that worked with it. Since flex is better than
AT&T lex in every way, and is legally unencumbered (no, it's not
copylefted), I see no reason ever to use lex. -John]
--

