|Ada95 to Ada2005 parser - currently using lex/yacc - problem with Unic email@example.com (2006-12-21)|
|Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with firstname.lastname@example.org (2006-12-22)|
|Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with email@example.com (Tommy Nordgren) (2007-03-08)|
|Re: Ada95 to Ada2005 parser - currently using lex/yacc - problem with firstname.lastname@example.org (Tom Copeland) (2007-03-16)|
|From:||Tommy Nordgren <email@example.com>|
|Date:||8 Mar 2007 19:54:50 -0500|
|Posted-Date:||08 Mar 2007 19:54:50 EST|
> Hi there,
> I have a tool that parses Ada95 code and am investigating the
> possibility of updating it to support Ada2005.
> The biggest problem I am having at the moment is working out how to
> cope with Unicode characters. ...
> [The character set issues happen in the lexer which lex generates. A
> yacc parser sees only tokens. The question of unicode lexers has come
> up frequently over the past decade. See for example
> http://compilers.iecc.com/comparch/article/98-01-046 -John]
I suggest that you rewrite your grammar using the ANTLR tool
(www.antlr.org). ANTLR is quite powerful, and a single specification
file can specify lexers, parsers, and tree parsers/transformers.
Parsers can be generated in Java, C++, C# and Python.
(This applies to version 2.7.6.)
I don't know whether the later 3.0 series includes code generators for
languages other than Java, since I'm currently using 2.7.6.
ANTLR is written in Java, by the way.
ANTLR supports Unicode, but one point to consider with ANY tool is
that you will need a module that converts the input text
files to canonical UTF-16.
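As a rough illustration of such a conversion module, here is a minimal Python sketch. It is not part of ANTLR. The function name to_canonical_utf16, the default source encoding, and the choice of NFC as the "canonical" form are all assumptions made for the example; your tool may need a different normalization form or byte order.

```python
import codecs
import unicodedata


def to_canonical_utf16(raw, source_encoding="utf-8"):
    """Hypothetical helper: re-encode source bytes as normalized UTF-16.

    Assumes the caller already knows the encoding of the input file.
    """
    # Decode the raw bytes into Unicode code points.
    text = raw.decode(source_encoding)
    # Drop a leading byte order mark so it never reaches the lexer.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Assumption: NFC is the wanted canonical form, so that a base letter
    # plus a combining accent becomes one precomposed character.
    text = unicodedata.normalize("NFC", text)
    # Emit little-endian UTF-16 with an explicit byte order mark.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Running every input file through one such step before lexing means the lexer only ever has to deal with a single encoding and a single spelling of each character.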
The one thing to beware of when switching from yacc/bison is that
ANTLR doesn't support left-recursive rules. EBNF notation, with embedded
code fragments, can be used instead.
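To make the left-recursion point concrete: a yacc rule such as  expr : expr '-' term | term ;  would typically be rewritten in EBNF as  expr : term ( ('+' | '-') term )* ;  The hand-written Python sketch below shows the kind of loop such an EBNF rule corresponds to. It is only an illustration, not ANTLR output; the names tokenize and parse_expr and the toy grammar (integers with + and -) are invented for the example.

```python
import re

# Toy lexer: integers and the two additive operators.
TOKEN = re.compile(r"\s*(\d+|[-+])")


def tokenize(src):
    """Split the input string into number and operator tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """Parse  expr : term ( ('+' | '-') term )* ;  into nested tuples."""
    pos = 0

    def term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("number expected")
        value = int(tokens[pos])
        pos += 1
        return value

    # The ( ... )* part becomes a loop. Folding each new operand into the
    # tree built so far keeps the left associativity that the
    # left-recursive yacc rule expressed.
    tree = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return tree
```

For the input "10 - 4 - 3" the loop builds the tree for (10 - 4) - 3, so nothing is lost in associativity by giving up the left-recursive form.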
If you are interested, ANTLR's primary architect, Terence Parr, is
currently writing a book about ANTLR 3.0 that will be published later