Re: Internal Representation of Strings

"cr88192" <cr88192@hotmail.com>
Tue, 3 Mar 2009 00:05:40 +1000

          From comp.compilers

From: "cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Tue, 3 Mar 2009 00:05:40 +1000
Organization: albasani.net
References: 09-02-051 09-02-068 09-02-078 09-02-120 09-02-125 09-02-134 09-03-001
Keywords: storage, design
Posted-Date: 02 Mar 2009 15:29:03 EST

"Tony" <tony@my.net> wrote in message news:09-03-001@comp.compilers...
> "Armel" <armelasselin@hotmail.com> wrote in message


<snip>
>
> My goal is to get away from all the APIs that use null-terminated strings,
> so I will be replacing all of that. Not needing that null terminator would
> be an indication of success of a string implementation that wished to
> depart from that paradigm.


what is the problem with NUL-terminated strings anyway?
my code uses them all the time with no real ill-effect...


so, maybe the big question that can be asked is:
really, why do you dislike NUL-terminated strings so much?...




at least it is not something silly, like a NULL-terminated array, where NULL
may actually be a useful value to store (I know about NULL-terminated arrays
because I tended to use them at one point, but generally stopped, as the
approach was annoying in many cases...).
(my current approach is more often to fetch the memory-object size from the
MM/GC and divide by the item/pointer size, although by convention the length
is '((size/item_size)-1)', as I still tend to reserve space for said
terminator...).
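
As a rough illustration of the size-query approach, here is a minimal sketch in C. The MM/GC in question is not shown in the post, so everything named mm_* below is a hypothetical stand-in: a small header in front of each block records the payload size, and the length is derived as (size / item_size) - 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an MM/GC that can report an object's size:
   a header in front of each block records the payload size in bytes. */
typedef union {
    size_t size;
    max_align_t align;   /* keeps the payload suitably aligned (C11) */
} mm_header;

/* Allocate zeroed storage for 'count' pointers plus one reserved
   terminator slot (all-bits-zero is NULL on mainstream platforms). */
void **mm_alloc_ptr_array(size_t count)
{
    size_t bytes = (count + 1) * sizeof(void *);
    mm_header *h = calloc(1, sizeof(mm_header) + bytes);
    if (h == NULL)
        return NULL;
    h->size = bytes;
    return (void **)(h + 1);
}

/* Payload size in bytes, read back from the header in constant time. */
size_t mm_object_size(const void *obj)
{
    assert(obj != NULL);
    return ((const mm_header *)obj - 1)->size;
}

/* Usable item count: (size / item_size) - 1, since the last slot is
   still reserved for the terminator. */
size_t mm_ptr_array_length(void *const *arr)
{
    return mm_object_size(arr) / sizeof(void *) - 1;
}

/* Release a block obtained from mm_alloc_ptr_array. */
void mm_free(void *obj)
{
    if (obj != NULL)
        free((mm_header *)obj - 1);
}
```

With this layout, an array allocated for 5 items occupies 6 pointer-sized slots, and mm_ptr_array_length reports 5 without touching the array contents.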


but, I guess it is always possible to use a different terminator for an
array (i.e. one that does not meaningfully occur in normal data), making a
generic magic-terminated array. for example, I could use the magic pointer
UNDEFINED, which I usually define as '(void *)(-1)' (although '(void *)(1)'
could be better). since I tend to use UNDEFINED for other things as well, it
is not impossible that it could be meaningfully present in an array, so one
could instead define an even more obscure terminator which has no other
use...
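
A magic-terminated array along these lines might look like the sketch below. UNDEFINED is defined as in the post; the function names are made up for illustration, and the code assumes UNDEFINED is never stored as ordinary data (converting -1 to a pointer is also implementation-defined in C, though it works on common platforms).

```c
#include <assert.h>
#include <stddef.h>

/* Sentinel as described in the post; assumed never to occur as data. */
#define UNDEFINED ((void *)(-1))

/* Count items up to (not including) the UNDEFINED terminator.
   This is an O(n) scan. NULL is an ordinary element here. */
size_t magic_array_length(void *const *arr)
{
    assert(arr != NULL);
    size_t n = 0;
    while (arr[n] != UNDEFINED)
        n++;
    return n;
}

/* Bounds-checked fetch: scans from the start and returns UNDEFINED
   if the terminator is reached at or before 'idx'. */
void *magic_array_get(void *const *arr, size_t idx)
{
    assert(arr != NULL);
    for (size_t i = 0; i <= idx; i++)
        if (arr[i] == UNDEFINED)
            return UNDEFINED;
    return arr[idx];
}
```

Unlike a NULL-terminated array, this one can hold NULL as a regular element; the price is that every length query or bounds check walks the array from the front.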


but, I guess the biggie question would be:
which approach would be faster when it comes to things like array bounds
checking?...


asking the MM has a cost, but it is mostly constant, whereas scanning
forwards costs O(n), where n is the length, making it slower for
non-tiny arrays...




so, I guess maybe the issue in question is whether one uses strings like
arrays?...
typically, I have not done so (rather, as noted, I tend to regard them as
pass-by-value immutable objects).


in some cases, string equality operations are done by comparing the pointers
(a nifty advantage of using hash tables for string merging, since we can
know in these cases that if they have the same text, they have the same
pointer...). (ok, technically this is for 'symbols' and 'keywords', but in
conventional terms these are strings...).
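
The pointer-equality trick can be sketched with a small interning table. This is only an illustration of the general idea, not the actual implementation behind the post: the names (intern, intern_table), the fixed bucket count, and the hash function are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256   /* arbitrary table size for this sketch */

typedef struct intern_entry {
    struct intern_entry *next;
    char text[];             /* NUL-terminated copy of the string */
} intern_entry;

static intern_entry *intern_table[INTERN_BUCKETS];

/* Simple multiplicative string hash (any reasonable hash would do). */
static unsigned intern_hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % INTERN_BUCKETS;
}

/* Return the canonical copy of 's'. Strings with equal text always
   yield the same pointer. Returns NULL only if allocation fails. */
const char *intern(const char *s)
{
    assert(s != NULL);
    unsigned h = intern_hash(s);

    /* Look for an existing entry with the same text. */
    for (intern_entry *e = intern_table[h]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;

    /* Not found: copy the text and add it to the bucket chain. */
    size_t len = strlen(s);
    intern_entry *e = malloc(sizeof *e + len + 1);
    if (e == NULL)
        return NULL;
    memcpy(e->text, s, len + 1);
    e->next = intern_table[h];
    intern_table[h] = e;
    return e->text;
}
```

Once two strings have both gone through intern(), comparing them is a single pointer comparison (a == b) rather than a strcmp; the O(n) work is paid once, at interning time.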

