Re: Can Coco/R do multiple tokenizations

George Neuner <gneuner2@comcast.net>
16 Aug 2005 11:17:24 -0400

From comp.compilers

Related articles
Can Coco/R do multiple tokenizations vardhanvarma@gmail.com (2005-08-13)
Re: Can Coco/R do multiple tokenizations gneuner2@comcast.net (George Neuner) (2005-08-16)
Re: Can Coco/R do multiple tokenizations DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-08-16)
Re: Can Coco/R do multiple tokenizations gene@abhost.us (Gene Wirchenko) (2005-08-16)
Re: Can Coco/R do multiple tokenizations cfc@shell01.TheWorld.com (Chris F Clark) (2005-08-21)
Re: Can Coco/R do multiple tokenizations darius@raincode.com (Darius Blasband) (2005-08-21)


From:	George Neuner <gneuner2@comcast.net>
Newsgroups:	comp.compilers
Date:	16 Aug 2005 11:17:24 -0400
Organization:	Compilers Central
References:	05-08-053
Keywords:	lex
Posted-Date:	16 Aug 2005 11:17:24 EDT

On 13 Aug 2005 00:27:08 -0400, vardhanvarma@gmail.com wrote:

> Consider a language which allows ! and = in its identifiers.
> Of course, the usual C operators like !, = etc. are also allowed.
> Consider this string (note: no whitespace):
> 'a!=b'
:
>In case of ambiguity I'd ideally like to generate an error and abort.

Heuristics aside, I think that if you want to allow operators to be
embedded in identifier names and also use infix operators in the same
language, you are going to have to depend on correct delimiting and
trust that the user typed what she meant. Except in extremely obvious
cases, it's not a good idea for the compiler to be guessing at the
user's intent.

>Can Coco/R (or any other parser/lexer generator) do multiple
>tokenizations & parse-tree generations?

AFAIK, there are no existing lexer generator tools that allow alternate
tokenizations of the same input text. You are free to write one, of
course.

I don't know Coco/R, but what you want is possible by using deliberate
backtracking and multiple lexers. It's a slow and painstaking process
of trying a particular parse, saving the AST if the parse succeeds,
then backtracking, switching lexers and trying the same parse again.
If you end up with no ASTs, the parse failed, and if you end up with
multiple ASTs, the code was ambiguous.
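
To make the try/backtrack/switch loop concrete, here is a minimal
Python sketch. It is not Coco/R and not any real tool's API: the two
token definitions, the toy grammar (expr := IDENT | IDENT op IDENT) and
every function name are assumptions made up for illustration. Each
lexer restarts from the original source, each successful parse
contributes an AST, and the count of distinct ASTs decides the outcome.

```python
import re

# Two hypothetical token definitions. They differ only in whether
# '!' and '=' may appear inside an identifier.
LEXERS = {
    "ops_in_idents": re.compile(
        r"\s*(?:(?P<IDENT>[A-Za-z_][A-Za-z0-9_!=]*)|(?P<OP>!=|=|!))"),
    "plain_idents": re.compile(
        r"\s*(?:(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)|(?P<OP>!=|=|!))"),
}

def tokenize(pattern, text):
    """Return a list of (kind, text) tokens, or None on a lexical error."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            return None
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Toy grammar: expr := IDENT | IDENT ('!=' | '=') IDENT."""
    kinds = [kind for kind, _ in tokens]
    if kinds == ["IDENT"]:
        return ("ident", tokens[0][1])
    if kinds == ["IDENT", "OP", "IDENT"] and tokens[1][1] in ("!=", "="):
        return ("binop", tokens[1][1], tokens[0][1], tokens[2][1])
    return None  # parse failed

def parse_all(text):
    """Try the same parse once per lexer and keep each distinct AST."""
    asts = []
    for pattern in LEXERS.values():
        tokens = tokenize(pattern, text)  # "backtrack" to the start of the source
        ast = parse(tokens) if tokens is not None else None
        if ast is not None and ast not in asts:
            asts.append(ast)
    return asts

def classify(text):
    asts = parse_all(text)
    if not asts:
        return "error: no parse"
    if len(asts) > 1:
        return "error: ambiguous"
    return "ok"
```

With these assumed definitions, classify("a!=b") reports an ambiguity
(one identifier versus a comparison), while classify("a != b") is
accepted because both lexers agree once the operator is delimited.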

It is likely to be *very* slow as you will need to keep all the lexers
in sync. Each time you switch you will need to adjust the input
starting position because the last successful parse may have used
tokens from a different lexer. There are various ways you might try
to optimize this time waster but the positioning has to be based on
the original source to be correct.
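
One way to keep the positioning tied to the original source is to have
every token carry its character offsets, and to restart whichever lexer
you switch to from the end offset of the last accepted token rather
than from a token index. The following Python sketch assumes that
design; the Token and OffsetLexer names and both identifier patterns
are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Token(NamedTuple):
    kind: str
    text: str
    start: int  # offset into the original source
    end: int    # offset just past the token

class OffsetLexer:
    """Hypothetical lexer that can be asked for the token at any source offset."""

    def __init__(self, ident_pattern: str):
        self.pattern = re.compile(
            r"\s*(?:(?P<IDENT>" + ident_pattern + r")|(?P<OP>!=|=|!))")

    def token_at(self, source: str, offset: int) -> Optional[Token]:
        m = self.pattern.match(source, offset)
        if m is None:
            return None
        kind = m.lastgroup
        return Token(kind, m.group(kind), m.start(kind), m.end(kind))

# Assumed token definitions: identifiers with and without '!' and '='.
wide = OffsetLexer(r"[A-Za-z_][A-Za-z0-9_!=]*")
narrow = OffsetLexer(r"[A-Za-z_][A-Za-z0-9_]*")

source = "x = a!=b"

# Suppose the parse so far accepted 'x' and '=' from the narrow lexer.
first = narrow.token_at(source, 0)
second = narrow.token_at(source, first.end)

# Switching lexers: both resume from the same source offset, so they
# stay in sync even though their token streams differ from here on.
resume = second.end
from_wide = wide.token_at(source, resume)
from_narrow = narrow.token_at(source, resume)
```

Here from_wide covers the whole of "a!=b" while from_narrow covers only
"a", but both were started from the same offset into the source, which
is what lets a later switch pick up in the right place.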

Personally I don't think it's worth the effort. I would parse the
code exactly as written and let users suffer the consequences of not
using the space bar. YMMV

George
