From: email@example.com (James Kanze US/ESC 60/3/141 #40763)
Date: 20 Aug 1996 23:08:44 -0400
Organization: GABI Software, Sarl.
Mark Thiehatten <firstname.lastname@example.org> writes:
> I am working on a parser for a language that allows keywords to
> be used as identifiers. This causes all kinds of problems.
> I would like to know if somebody has already solved this problem,
> and, of course, how.
> I am using flex and bison to build the parser.
I had a paper in SIGPLAN Notices about this some time back. (About 1990, I
think. Most of my reference material is currently in boxes, however, and I
don't know the exact reference.) The basic idea was to extend the lexer to
be able to return alternate tokens, and to modify yacc to ask the lexer for
the alternate token when it found an error. The modifications to yacc were
in fact done by running a sed script over the generated parser, and so
didn't require access to the sources.
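To make the idea concrete, here is a minimal sketch of the lexer side in C.
The names (classify, yygetalternate, the token codes) are illustrative, not
from the paper: the lexer remembers whether the last lexeme had a second
interpretation, and the parser's error recovery can ask for it.

```c
#include <string.h>

/* Token codes, in the range yacc would assign them. */
enum { IDENTIFIER = 258, KW_LOOP, KW_END };

static char last_text[64];   /* text of the most recent lexeme */
static int  has_alternate;   /* nonzero if it was a keyword, so an
                                IDENTIFIER reading is still available */

/* Called by the lexer on each word: return the keyword token first,
 * but remember that an alternate interpretation exists. */
static int classify(const char *text) {
    strncpy(last_text, text, sizeof last_text - 1);
    if (strcmp(text, "loop") == 0) { has_alternate = 1; return KW_LOOP; }
    if (strcmp(text, "end")  == 0) { has_alternate = 1; return KW_END; }
    has_alternate = 0;
    return IDENTIFIER;
}

/* Called from the parser's error handling: retry the same lexeme as a
 * plain identifier.  Returns -1 if there is no second interpretation. */
static int yygetalternate(void) {
    if (!has_alternate) return -1;
    has_alternate = 0;
    return IDENTIFIER;
}
```

On the parser side, the sed-patched error action would call yygetalternate()
and, if it gets a token back, retry the parse step instead of reporting an
error.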
This method doesn't always work. Basically, the strategy was to return the
keyword first; if this didn't parse, the lexer tried a user symbol. The
problem is that, at least in theory, and in some versions of yacc,
optimization of the default case causes the parser to reduce when it could
have shifted the alternate token, so no error is raised and the lexer is
never asked for the alternate. In my application, we were able to prove
that it was never possible to reduce when there were alternative tokens
available, but this is not trivial, and it is not hard to imagine languages
where it is not possible.
Since my paper, I have also used a different technique: I generate the
verbose output of the parser (-v for yacc), and use an awk script to
generate tables of legal tokens for the lexer. This works because the yacc
state variable (at least in MKS yacc, which is what we were using for this
project) is a global variable. If it isn't with bison, of course, you could
probably sed the generated output to make it one. In this case, if the
lexer finds several possible interpretations (as it always would with
keywords), it looks in the table to see what will shift, and returns that.
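A sketch of that table-driven variant, with a tiny hand-written table
standing in for what the awk script would generate from the -v output (the
names yystate, kw_ok and the state numbering are all illustrative
assumptions):

```c
#include <string.h>

enum { IDENTIFIER = 258, KW_LOOP };

/* The parser's current state, assumed to have been exposed as a
 * global (by the parser generator, or by sed-ing its output). */
int yystate = 0;

/* kw_ok[s] nonzero means the token KW_LOOP can be shifted in parser
 * state s.  In practice this table would be generated by a script
 * reading the grammar's verbose (-v) description file. */
static const char kw_ok[4] = { 1, 0, 1, 0 };

/* Return the keyword token only where the parser can shift it;
 * otherwise fall back to treating the lexeme as a user symbol. */
static int classify(const char *text) {
    if (strcmp(text, "loop") == 0 && kw_ok[yystate])
        return KW_LOOP;
    return IDENTIFIER;
}
```

The point of the design is that the disambiguation happens before the
token is ever returned, so the parser never sees an error and no retry
mechanism is needed.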
Personally, I wouldn't design such a feature into a new language. But you
don't always have a choice. A debugger must be able to accept any user
symbol that is legal in the source language; if it is to accept several
source languages, then in practice, either it does something like this, or
it requires some sort of meta-indicator to tell it when it is dealing with
a user symbol.
James Kanze Tel.: (+33) 88 14 49 00 email: email@example.com
GABI Software, Sarl., 8 rue des Francs-Bourgeois, F-67000 Strasbourg, France