Re: UCS Identifiers and compilers

"Bartc" <>
Fri, 12 Dec 2008 14:39:26 GMT

          From comp.compilers

Related articles
Re: UCS Identifiers and compilers (Dmitry A. Kazakov) (2008-12-11)
Re: UCS Identifiers and compilers (James Harris) (2008-12-11)
Re: UCS Identifiers and compilers (Marco van de Voort) (2008-12-11)
Re: UCS Identifiers and compilers (Ira Baxter) (2008-12-11)
Re: UCS Identifiers and compilers (Ray Dillinger) (2008-12-11)
Re: UCS Identifiers and compilers (Chris F Clark) (2008-12-11)
Re: UCS Identifiers and compilers (Bartc) (2008-12-12)
Re: UCS Identifiers and compilers (Mike Austin) (2008-12-12)

From: "Bartc" <>
Newsgroups: comp.compilers
Date: Fri, 12 Dec 2008 14:39:26 GMT
Organization: Compilers Central
References: 08-12-061
Keywords: i18n
Posted-Date: 12 Dec 2008 10:34:18 EST

"William Clodius" <> wrote in message
> As a hobby I have started work on a language design, and one of the
> issues that has come to concern me is the impact on usefulness and
> implementation complexity of incorporating UCS/Unicode into the
> language, particularly in identifiers.

> 1. Do many of your users make use of letters outside the ASCII/Latin-1
> sets?

My (very few) users were based in Europe, and I felt it important that
they be able to use the special characters of their languages. So those
characters were marked as alphanumeric in a table covering the 256
8-bit codes.

This allowed identifiers to use the special characters, although
keywords were in English, with the possibility of using macros to
redefine them. But when I looked at their source code, I don't remember
seeing these features being used. Maybe they were used to the
restrictions of other languages, or maybe I should have told them about
the feature...

(Allowing the end-users to use special characters in their data, and
for filenames, and so on, was another matter with its own problems.)

This was a few years ago, and having to deal with only two 8-bit
character sets made things very easy. However, I don't think that
deciding whether a 16-bit or wider character is suitable for an
identifier is too challenging.
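For 16-bit codes, a flat 65536-entry table is still affordable, but a
short list of ranges also works. A minimal sketch, assuming a
range-scan approach; the ranges below are a small, deliberately
incomplete sample of BMP letter blocks chosen for illustration only:

```c
#include <stddef.h>

/* Is this 16-bit code acceptable in an identifier? Scan a small table
   of inclusive ranges. A real compiler would derive these ranges from
   the Unicode character database rather than hard-coding a sample. */
struct range { unsigned short lo, hi; };

static const struct range ident_ranges[] = {
    { 0x0041, 0x005A },  /* A-Z */
    { 0x0061, 0x007A },  /* a-z */
    { 0x00C0, 0x00FF },  /* Latin-1 accented letters (approximately) */
    { 0x0391, 0x03A9 },  /* Greek capitals */
    { 0x03B1, 0x03C9 },  /* Greek lowercase */
    { 0x0400, 0x04FF },  /* Cyrillic */
};

static int is_ident_char16(unsigned short c) {
    size_t i;
    for (i = 0; i < sizeof ident_ranges / sizeof ident_ranges[0]; i++)
        if (c >= ident_ranges[i].lo && c <= ident_ranges[i].hi)
            return 1;
    return 0;
}
```

Since the ranges are sorted, a binary search would make the lookup
O(log n) if the table grew large, but for a handful of ranges a linear
scan is fine.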

Where the identifiers need to be used outside the language (for
linking, for example), that's also a minor problem if the other system
is more restricted. But my language was self-contained.

> 2. What are the most useful development environments in terms of dealing
> with extended character sets?
> 3. Visually how well do alternative character sets mesh with a language
> with ASCII keywords and left to right, up and down display, typical of
> most programming languages?

If a source file is considered a stream of 16-bit character codes,
then the visual representation is irrelevant (or at least, someone
else's headache).

> 4. How does the incorporation of the larger character sets affect your
> lexical analysis? Is hash table efficiency affected? Do you have to deal
> with case/accent independence?

In my case, support consisted of an entry in a table. Very simple. But
it meant that accented versions of 'A', for example, were all
considered distinct. That is still better, however, than treating upper
and lower case differently, which usually seems to be the case.
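The same table idea extends to case-insensitive identifiers: a
256-entry fold table maps upper case to lower case, including the
Latin-1 accented pairs, while leaving accents themselves significant.
A hypothetical sketch (the fold ranges are my assumptions, not the
original implementation):

```c
/* Fold table: maps each 8-bit code to its lower-case form. ASCII A-Z
   and the Latin-1 capitals 0xC0-0xDE fold by adding 32, skipping the
   multiplication sign at 0xD7, which has no lower-case pair. Accents
   are preserved, so 'a' and a-acute remain distinct identifiers. */
static unsigned char fold[256];

static void init_fold_table(void) {
    int c;
    for (c = 0; c < 256; c++) fold[c] = (unsigned char)c;
    for (c = 'A'; c <= 'Z'; c++) fold[c] = (unsigned char)(c + 32);
    for (c = 0xC0; c <= 0xDE; c++)
        if (c != 0xD7) fold[c] = (unsigned char)(c + 32);
}

/* Compare two identifiers case-insensitively through the fold table. */
static int ident_eq(const char *a, const char *b) {
    while (*a && *b)
        if (fold[(unsigned char)*a++] != fold[(unsigned char)*b++])
            return 0;
    return *a == *b;  /* both strings must end together */
}
```

Folding at comparison time (or once, when the identifier is interned
into the symbol table) keeps the hash function and table layout
unchanged; only the classification and folding tables grow.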

