A question about lexer portability in C ? email@example.com (Frederic Guerin) (1997-09-23)
Re: A question about lexer portability in C ? firstname.lastname@example.org (1997-09-24)
Re: A question about lexer portability in C ? email@example.com (Henry Spencer) (1997-09-28)
From: Henry Spencer <firstname.lastname@example.org>
Date: 28 Sep 1997 23:17:03 -0400
Organization: SP Systems, Toronto
Frederic Guerin <email@example.com> wrote:
>The question is : Can I fix this table at compile time or do I need to
>build it at run time so as to make sure that the correct codes will be
>assigned to the correct characters ?
In general, you must build it at run time. Different users, even on a
single system, may be using different character sets, with different
ideas about what constitutes (say) an alphabetic character. Except in
unusually favorable environments, there's just no way to pre-build a
single copy of the code and have it always get things right.
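A minimal sketch of the run-time approach, assuming a single-byte character set and a lexer that only needs a few broad classes. The class names, class_table, build_class_table() and classify() are all hypothetical, made up for illustration; only the <ctype.h> predicates and UCHAR_MAX are standard C.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical class codes for a lexer; the names are illustrative only. */
enum char_class { CC_OTHER, CC_ALPHA, CC_DIGIT, CC_SPACE };

/* One entry per possible unsigned char value. */
static unsigned char class_table[UCHAR_MAX + 1];

/* Fill the table by asking the <ctype.h> predicates, which follow the
   LC_CTYPE locale in effect at the time of the call. */
void build_class_table(void)
{
    int c;

    for (c = 0; c <= UCHAR_MAX; c++) {
        if (isalpha(c))
            class_table[c] = CC_ALPHA;
        else if (isdigit(c))
            class_table[c] = CC_DIGIT;
        else if (isspace(c))
            class_table[c] = CC_SPACE;
        else
            class_table[c] = CC_OTHER;
    }
}

/* Look up the class of one input character (not EOF). */
enum char_class classify(int c)
{
    return (enum char_class)class_table[(unsigned char)c];
}
```

A program would typically call setlocale(LC_CTYPE, "") from <locale.h> once at startup, so that the user's environment is honored, and then call build_class_table() before lexing. Without the setlocale() call the table reflects the default "C" locale only.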
>...May I assume that all character sets used
>over the world are superset of the ANSI one ( with identical character
>code ) ?
Unfortunately, no. First, as our moderator mentioned, there is still
substantial use of totally non-ASCII character sets like EBCDIC. Second,
there is still substantial use of other ISO646-derived character sets
which resemble ASCII but are not supersets of it -- for example, some of
them have extra alphabetic characters where ASCII puts characters like "`"
and "[" and "|". Third, even when character sets are exact supersets of
ASCII, that doesn't mean you can just ignore the non-ASCII part, because
non-English users in particular will want to put non-ASCII alphabetics
into identifiers etc.
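One practical consequence for lexer code, sketched here as an illustration rather than a prescription: arithmetic on letter codes is not safe. C guarantees that '0' through '9' are contiguous, but makes no such promise for letters, and in EBCDIC there are gaps after 'i' and after 'r', so (c - 'a') gives wrong answers there. The function below (digit_value() is a hypothetical name) finds the value of a digit in bases up to 36 by searching a string of character literals instead, so the compiler supplies whatever codes the execution character set uses.

```c
#include <string.h>

/* Return the value (0..35) of c as a digit in bases up to 36,
   or -1 if c is not such a digit. Uses no numeric character codes
   and does not assume that letters are contiguous. */
int digit_value(int c)
{
    static const char digits[] =
        "0123456789abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;
    int i;

    if (c == '\0')
        return -1;      /* strchr would otherwise match the terminator */
    p = strchr(digits, c);
    if (p == NULL)
        return -1;
    i = (int)(p - digits);
    /* Entries 36..61 are the upper-case letters; fold them onto 10..35. */
    return i < 36 ? i : i - 26;
}
```

A caller scanning a hexadecimal constant would accept c only while digit_value(c) is between 0 and 15.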
| Henry Spencer