Re: Internal Representation of Strings

Marco van de Voort <marcov@stack.nl>
Fri, 27 Feb 2009 09:37:42 +0000 (UTC)

          From comp.compilers


From: Marco van de Voort <marcov@stack.nl>
Newsgroups: comp.compilers
Date: Fri, 27 Feb 2009 09:37:42 +0000 (UTC)
Organization: Stack Usenet News Service
References: 09-02-051 09-02-077 09-02-092 09-02-104 09-02-112 09-02-118 09-02-121
Keywords: storage, i18n
Posted-Date: 27 Feb 2009 07:34:05 EST

On 2009-02-24, Hans-Peter Diettrich <DrDiettrich1@aol.com> wrote:
>> Latin only? But afaik even for Cyrillic and the Semitic language group it
>> doesn't matter.
>
> The standard libraries were restricted to the ASCII character set in
> the past, with trouble in all languages with more or entirely
> different characters. The introduction of codepages then allowed to
> use at least 256 characters, and nowadays most people extend the frame
> only to the Unicode BMP, and ignore other codepages with regards to
> memory and runtime requirements.


IMHO this is not a fair comparison. Here we are talking about a choice that
the standard permits and that keeps working for others. Moreover, as long as
string types are clearly annotated, the representations are automatically
convertible (and detectable) to a high degree, so I don't see what kind of
problems this choice would cause. In effect we need to support both anyway.
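
To make the "annotated string types" point concrete, here is a minimal
sketch in Python. The TaggedString type and the concat helper are
hypothetical names invented for this illustration and do not belong to any
particular compiler or runtime; the only assumption is that every string
value carries the name of its encoding next to its bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus the name of the encoding they are stored in (hypothetical type)."""
    data: bytes
    encoding: str  # e.g. "latin-1", "utf-8", "utf-16-le"

    def convert(self, target: str) -> "TaggedString":
        """Re-encode into `target`.

        Raises UnicodeEncodeError if a character cannot be represented there.
        """
        if target == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(target), target)


def concat(a: TaggedString, b: TaggedString) -> TaggedString:
    """Concatenate, converting the right operand to the left operand's encoding."""
    return TaggedString(a.data + b.convert(a.encoding).data, a.encoding)
```

Because the annotation travels with the value, mixing a codepage string
with a UTF-16 string needs no guessing: the conversion is chosen from the
two tags, and it fails loudly when the target cannot hold a character.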


> Chinese "character" sets. Tell me a programming language with proper
> support for native language text in string literals, and Unicode
> source code representation.


D2009? I don't know whether it supports anything beyond the BMP, but I assume so.
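
For readers wondering what "beyond the BMP" costs a 16-bit string type,
the following Python sketch shows the general Unicode rule: code points up
to U+FFFF fit in one UTF-16 code unit, anything above needs a surrogate
pair. The helper names are made up for this example, and nothing here is a
claim about how D2009 behaves.

```python
def in_bmp(ch: str) -> bool:
    """True if the single code point `ch` lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units needed to store one code point in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


# U+4E2D (a CJK ideograph) is inside the BMP, U+1D11E (musical G clef) is not.
for ch in ("A", "\u4e2d", "\U0001D11E"):
    print(f"U+{ord(ch):04X} in_bmp={in_bmp(ch)} utf16_units={utf16_units(ch)}")
```

A runtime that indexes strings by 16-bit unit therefore handles the first
two characters as one element each and the third as two elements.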


