Re: 2 word token as one in lex

James Carlson <james.d.carlson@sun.com>
23 Jul 2000 16:57:11 -0400

          From comp.compilers


From: James Carlson <james.d.carlson@sun.com>
Newsgroups: comp.compilers
Date: 23 Jul 2000 16:57:11 -0400
Organization: Sun Microsystems Inc. - BDC
References: 00-07-034
Keywords: parse

makotosu@my-deja.com writes:
> if the lexer sees UNION and the next token (after any # of whitespaces,
> tabs and newlines) is JOIN it should return UNION_JOIN


Why? The parser should be able to handle this, shouldn't it?


> I want to do this in the lexical analyzer, not the parser. Is this
> possible? I was thinking of using exclusive states but I could not get
> it working, I did something like


Here's something that's equivalent to what you wrote and does what you
ask. Note that it doesn't handle word ends correctly (UNIONS is still
matched as UNION followed by a stray S), but neither did the previous
example.


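%{
#include <stdio.h>
/* lex picks the longest match, so the two-word rule beats plain "UNION". */
%}
%%
"UNION"                 { puts("union-alone"); }
"UNION"[ \t\n]+"JOIN"   { puts("union-join"); }
"JOIN"                  { puts("join-alone"); }
%%
/* Link with -ll (or -lfl for flex) to supply yywrap(). */
int
main(int argc, char **argv)
{
    yyin = stdin;
    yylex();
    return 0;
}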
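
To make the word-end caveat concrete, here is a rough hand-written C sketch of the same three cases with an explicit word-boundary check. It is not lex output and not necessarily how one would fix the lex rules; the names is_word_char, match_word and classify are made up for illustration, and "word" is assumed to mean letters, digits and underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Nonzero if c may appear inside a word (assumed: letters, digits, underscore). */
static int is_word_char(int c)
{
    return isalnum(c) || c == '_';
}

/* If the complete word kw starts at s, return its length; otherwise 0.
   A keyword followed by another word character (UNIONS, JOINT) is rejected. */
static size_t match_word(const char *s, const char *kw)
{
    size_t n = strlen(kw);
    if (strncmp(s, kw, n) == 0 && !is_word_char((unsigned char)s[n]))
        return n;
    return 0;
}

/* Hypothetical scanner: writes one label per word into out, separated by
   single spaces. Output is silently truncated if out is too small. */
void classify(const char *s, char *out, size_t outsz)
{
    assert(s != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    while (*s != '\0') {
        if (!is_word_char((unsigned char)*s)) {   /* skip separators */
            s++;
            continue;
        }
        const char *label = "other";
        size_t len = 0, n;
        if ((n = match_word(s, "UNION")) != 0) {
            /* Look past spaces, tabs and newlines for a complete JOIN. */
            const char *p = s + n;
            while (*p == ' ' || *p == '\t' || *p == '\n')
                p++;
            size_t m = match_word(p, "JOIN");
            if (m != 0) {
                label = "union-join";
                len = (size_t)(p - s) + m;
            } else {
                label = "union-alone";
                len = n;
            }
        } else if ((n = match_word(s, "JOIN")) != 0) {
            label = "join-alone";
            len = n;
        } else {
            /* Any other word is consumed whole. */
            while (is_word_char((unsigned char)s[len]))
                len++;
        }
        if (out[0] != '\0')
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, label, outsz - strlen(out) - 1);
        s += len;
    }
}
```

With this check, UNIONS and JOINT are treated as ordinary words instead of being split after the keyword prefix.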


--
James Carlson, Internet Engineering <james.d.carlson@east.sun.com>
SUN Microsystems / 1 Network Drive 71.234W Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757 42.497N Fax +1 781 442 1677
[Still doesn't handle comments. -John]
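
One possible way around the comment problem is to keep UNION and JOIN as separate single-word tokens and merge them in a thin layer between the scanner and the parser. Since the scanner would already have thrown away whitespace and comments, anything it skips between the two words is handled for free. The following is only a sketch under assumptions: the token codes, set_input, raw_lex (a stand-in for the generated yylex()) and merged_lex are all hypothetical names, not part of lex or yacc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real parser would take these from its own header. */
enum { T_EOF = 0, T_UNION = 256, T_JOIN, T_UNION_JOIN, T_OTHER };

/* Stand-in for the generated yylex(): replays a fixed token array that ends
   with T_EOF. A real scanner would already have skipped whitespace and
   comments before returning each token. */
static const int *raw_tokens;
static size_t raw_pos;
static int pending = -1;          /* one-token pushback; -1 means empty */

void set_input(const int *tokens)
{
    assert(tokens != NULL);
    raw_tokens = tokens;
    raw_pos = 0;
    pending = -1;
}

static int raw_lex(void)
{
    int t = raw_tokens[raw_pos];
    if (t != T_EOF)               /* keep returning T_EOF at end of input */
        raw_pos++;
    return t;
}

/* What the parser would call instead of the raw scanner: UNION immediately
   followed by JOIN becomes T_UNION_JOIN, everything else passes through. */
int merged_lex(void)
{
    int t;
    if (pending != -1) {          /* deliver a token read ahead earlier */
        t = pending;
        pending = -1;
    } else {
        t = raw_lex();
    }
    if (t != T_UNION)
        return t;

    int next = raw_lex();         /* one token of lookahead */
    if (next == T_JOIN)
        return T_UNION_JOIN;
    pending = next;               /* not JOIN: save it for the next call */
    return T_UNION;
}
```

The cost of this design is one token of lookahead in the wrapper; any semantic value attached to the saved token would need to be saved along with it.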






