|Can Coco/R do multiple tokenizations email@example.com (2005-08-13)|
|Re: Can Coco/R do multiple tokenizations firstname.lastname@example.org (George Neuner) (2005-08-16)|
|Re: Can Coco/R do multiple tokenizations DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-08-16)|
|Re: Can Coco/R do multiple tokenizations email@example.com (Gene Wirchenko) (2005-08-16)|
|Re: Can Coco/R do multiple tokenizations cfc@shell01.TheWorld.com (Chris F Clark) (2005-08-21)|
|Re: Can Coco/R do multiple tokenizations firstname.lastname@example.org (Darius Blasband) (2005-08-21)|
|RE: Can Coco/R do multiple tokenizations email@example.com (Quinn Tyler Jackson) (2005-08-24)|
|From:||Chris F Clark <cfc@shell01.TheWorld.com>|
|Date:||21 Aug 2005 00:20:01 -0400|
|Organization:||The World Public Access UNIX, Brookline, MA|
|Posted-Date:||21 Aug 2005 00:20:01 EDT|
Multiple tokenizations is "hard". I agree that Meta-S, I believe now
called GrammarForge, is your best bet for built-in support.
However, if you find a copy of SIGPLAN Notices, December 1999, you
will find I wrote a column on how to work around lexers and parsers
that don't support it. (There may be a copy on the Compiler Resources
web site (see my .sig) of the article--it's a LaTeX file.) None of
the workarounds are exceptionally pretty, but they aren't rocket
science either. (The relevant movie quip is: "This isn't rocket
science. This is brain surgery.")
Of course, you should well consider the advice that what you are doing
is probably going to be hard on your users also. It may seem friendly
to allow users to omit whitespace and to include operator characters
within the language.
However, allowing both in one language is going to make certain
statements change meanings when "unrelated" things in the program are
modified. Your example is the perfect case. If one starts with a
program with only a and b declared, the fragment "a!=b" means one
thing. If some maintainer then adds a declaration of a!, the meaning
of the fragment has changed. Who will find that error and how? You
can probably write a parser with Meta-S that detects all such cases,
but it will not be easy, and will it really be a benefit?
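A minimal sketch of that hazard, in Python rather than in any particular parser generator. The operator set ("!=" and "=") and the function name tokenizations are assumptions made for illustration; the original question's real grammar is not shown here. The function simply enumerates every way to split a whitespace-free fragment into declared identifiers and known operators:

```python
def tokenizations(text, identifiers, operators):
    """Return every split of text into declared identifiers and operators."""
    if not text:
        return [[]]  # exactly one way to tokenize nothing
    vocabulary = set(identifiers) | set(operators)
    results = []
    for token in vocabulary:
        if text.startswith(token):
            # Try this token first, then tokenize whatever is left over.
            for rest in tokenizations(text[len(token):], identifiers, operators):
                results.append([token] + rest)
    return sorted(results)


# Hypothetical operator set, chosen only to mirror the "a!=b" example.
OPERATORS = {"!=", "="}

# Only a and b are declared: the fragment has a single reading.
before = tokenizations("a!=b", {"a", "b"}, OPERATORS)

# A maintainer declares a!: the same fragment now has a second reading.
after = tokenizations("a!=b", {"a", "b", "a!"}, OPERATORS)

print(before)  # [['a', '!=', 'b']]
print(after)   # [['a', '!=', 'b'], ['a!', '=', 'b']]
```

Nothing in the fragment itself changed between the two calls; only the set of declared names did. A real tool would also have to consult the grammar, not just the vocabulary, to decide which splits are legal.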
In the end, you will probably find users adding in extra whitespace
just to avoid the ambiguity. If the users are going to do that, why
not make the language (system) do it for them? For example, perhaps
you could define whitespace-free and whitespace-full forms and a tool
which creates the whitespace-full form from the whitespace-free
version, flagging errors when the conversion is ambiguous. That
would allow the user to dash off whitespace-free versions when that is
convenient, but would have the whitespace-full form as a "reference"
version. (When you think about it, the tool should go both ways.
Tools that do that support "round-trip engineering" as they say in the
I have often thought something like that might make C++'s templates
easier to understand. I believe Eiffel had something like that,
perhaps dealing with opaque types, where one wants a full "reference"
version for some cases and an elided version for other uses. Another
variation on this theme is exemplified by the literate programming
work, where untanglers and weavers (if I have my nomenclature right)
are used to translate the text into a variety of forms.
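The whitespace-free/whitespace-full tool suggested above could look roughly like the following Python sketch. Everything named here (expand, compress, AmbiguousFragment, and the two-operator vocabulary) is hypothetical and chosen for illustration; it is not taken from Meta-S, Coco/R, or any existing tool. The expander accepts a fragment only when exactly one tokenization exists and flags an error otherwise, and the compressor goes the other way by dropping whitespace:

```python
class AmbiguousFragment(ValueError):
    """Raised when a whitespace-free fragment has more than one reading."""


def split_all(text, vocabulary):
    """Return every way to split text into tokens from vocabulary."""
    if not text:
        return [[]]
    results = []
    for token in vocabulary:
        if text.startswith(token):
            for rest in split_all(text[len(token):], vocabulary):
                results.append([token] + rest)
    return sorted(results)


def expand(text, identifiers, operators=("!=", "=")):
    """Whitespace-free -> whitespace-full, refusing ambiguous input."""
    candidates = split_all(text, set(identifiers) | set(operators))
    if not candidates:
        raise ValueError(f"no tokenization for {text!r}")
    if len(candidates) > 1:
        readings = [" ".join(tokens) for tokens in candidates]
        raise AmbiguousFragment(f"{text!r} could mean any of {readings}")
    return " ".join(candidates[0])


def compress(text):
    """Whitespace-full -> whitespace-free: drop all whitespace."""
    return "".join(text.split())


reference = expand("a!=b", {"a", "b"})   # "a != b"
assert compress(reference) == "a!=b"     # the round trip is lossless here

try:
    expand("a!=b", {"a", "b", "a!"})
except AmbiguousFragment as err:
    print("flagged:", err)
```

The whitespace-full string serves as the "reference" version. Once a! is declared, the tool refuses the short form instead of silently picking a reading, which is the point at which the maintainer finds out.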
Hope this helps,
Chris Clark Internet : firstname.lastname@example.org
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)