Re: Internal Representation of Strings

Hans Aberg <haberg_20080406@math.su.se>
Sat, 14 Feb 2009 14:52:28 +0100

          From comp.compilers


From: Hans Aberg <haberg_20080406@math.su.se>
Newsgroups: comp.compilers
Date: Sat, 14 Feb 2009 14:52:28 +0100
Organization: Aioe.org NNTP Server
References: 09-02-051
Keywords: storage
Posted-Date: 14 Feb 2009 16:48:02 EST

Tony wrote:
> What are some good ways/concepts of internal string representation?
> Are/should string literals, fixed-length strings and dynamic-length strings
> handled differently? My first tendency is to avoid like the plague
> NUL-terminated strings (aka, C strings) and to opt for some kind of array
> with a length at the beginning followed by the characters that could be
> encapsulated at the library level with appropriate functions. But just a
> length seems like not enough information: the capacity (array length) also
> would be nice to have around. All thoughts, old and novel, welcome.


You might have a look at the C++ standard containers library; what you
describe is essentially std::string, which behaves much like a
std::vector of characters. That is, a C array char[] together with the
string length and the array size (capacity) and, in some
implementations, a reference count to avoid copying. The array size is
adjusted with some hysteresis (to avoid constant reallocation in border
cases): for example, when the array runs out of space, double it and
copy over, and when the length falls below one quarter of the capacity,
halve it and copy over. Everything that involves dynamic allocation
(malloc and free) is slow, so try using special techniques when
possible: a fixed (statically allocated) array, larger arrays that are
dynamically allocated more rarely, alloca(), or a GC.
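
The resizing rule with hysteresis might be sketched roughly as below.
This is only an illustration under my own assumptions: the class name
Buffer, the constant kMinCapacity and the member names are all made up
for the example, and no particular std::string implementation is
claimed to work this way. It keeps a length and a capacity, doubles
when full, and halves only when the length drops below one quarter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical smallest capacity; the buffer never shrinks below this.
constexpr std::size_t kMinCapacity = 8;

// Hypothetical illustration type: a char array with length and capacity.
class Buffer {
public:
    Buffer() : data_(new char[kMinCapacity]), size_(0), capacity_(kMinCapacity) {}

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    const char* data() const { return data_.get(); }  // not NUL-terminated

    void push_back(char c) {
        if (size_ == capacity_) reallocate(capacity_ * 2);  // full: double
        data_[size_++] = c;
    }

    void pop_back() {
        if (size_ == 0) return;
        --size_;
        // Shrink only below one quarter, so that a push_back right after
        // the shrink does not immediately force the array to grow again.
        if (capacity_ > kMinCapacity && size_ < capacity_ / 4)
            reallocate(capacity_ / 2);
    }

private:
    // Allocate a new array and copy the live characters over.
    void reallocate(std::size_t new_capacity) {
        assert(new_capacity >= size_);
        std::unique_ptr<char[]> fresh(new char[new_capacity]);
        std::memcpy(fresh.get(), data_.get(), size_);
        data_ = std::move(fresh);
        capacity_ = new_capacity;
    }

    std::unique_ptr<char[]> data_;
    std::size_t size_;      // characters in use
    std::size_t capacity_;  // allocated array length
};
```

The gap between the grow threshold (full) and the shrink threshold (one
quarter) is the hysteresis: a string whose length hovers around a
boundary does not reallocate on every push and pop.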


Use a standard library, if you can.
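
As a small illustration, std::string already exposes both the length
(size()) and the array size (capacity()), and reserve() lets you
allocate up front. The helper count_reallocations below is a made-up
name for this example; the exact growth policy is left to the
implementation, so the count it returns will differ between libraries.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts how often capacity() changes while
// appending n characters one at a time to an empty std::string.
std::size_t count_reallocations(std::size_t n) {
    std::string s;
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last) {  // the array was resized
            ++changes;
            last = s.capacity();
        }
    }
    return changes;
}

// Hypothetical helper: same appends, but with the space reserved first,
// so no resizing should be needed during the loop.
std::size_t count_reallocations_reserved(std::size_t n) {
    std::string s;
    s.reserve(n);
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last) {
            ++changes;
            last = s.capacity();
        }
    }
    return changes;
}
```

With a geometric growth policy the first count stays small compared
with n, and the second one should be zero.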


      Hans Aberg


