Re: Recognising pointers during garbage collection?

Joachim Durchholz <>
23 Jul 2003 10:37:46 -0400

          From comp.compilers

Related articles
Recognising pointers during garbage collection? (Maarten D. de Jong) (2003-07-21)
Re: Recognising pointers during garbage collection? (Sven Gohlke) (2003-07-23)
Re: Recognising pointers during garbage collection? (Joachim Durchholz) (2003-07-23)
Re: Recognising pointers during garbage collection? (2003-07-25)
Re: Recognising pointers during garbage collection? (Dobes Vandermeer) (2003-07-25)
Re: Recognising pointers during garbage collection? (Fergus Henderson) (2003-07-25)
Re: Recognising pointers during garbage collection? (Joachim Durchholz) (2003-07-25)
Re: Recognising pointers during garbage collection? (Marcin 'Qrczak' Kowalczyk) (2003-07-25)
Re: Recognising pointers during garbage collection? (Maarten D. de Jong) (2003-07-25)
[6 later articles]
| List of all articles for this month |

From: Joachim Durchholz <>
Newsgroups: comp.compilers
Date: 23 Jul 2003 10:37:46 -0400
Organization: Compilers Central
References: 03-07-149
Keywords: GC
Posted-Date: 23 Jul 2003 10:37:46 EDT

Maarten D. de Jong wrote:
> I realise there is probably no fool-proof way of detecting whether a
> bit- pattern is a pointer or not, and that consequently there is a
> lot of black magic and heuristics involved.


> Can someone explain some of the more common techniques in
> uncooperative languages (like C)

I once had a short exchange with Hans Boehm on exactly this topic, and
here's what I recall:
1. You can immediately exclude all bit patterns that designate addresses
outside the heap address range. These may be pointers to hardware and
whatnot, but you don't garbage collect them since they are not in the
heap. On typical current-day configurations, the heap is just a fraction
of the total address space, so this filters out the vast majority of
2. On most hardware architectures, pointers must (or should) be aligned.
I.e. you typically don't check every byte, you just check words starting
at word boundaries.
3. If the implementation doesn't have pointers that refer to the middle
of a memory block, you can simply look into the heap data structures to
see whether the bit pattern is a valid heap block address.

Of course, these are all heuristics. Some bit patterns will be
interpreted as pointers that aren't pointers, and consequently, the
referred-to block will remain allocated though it could be freed.
However, the chances for that happening are small enough that one can
usually stop worrying about that.

> or cooperative ones (which you design yourself)?

In a cooperative environment, the compiler and/or run-time system makes
sure that the garbage collector knows which slots of every heap block
are pointers. The actual encoding of this information can vary
considerably; among the techniques that I have seen are:
1. Tagged data. Reserve a bit in every word of information that directly
indicates whether it's a pointer or an immediate value. The bit may be
LSB or MSB. This is typical for dynamically-typed languages.
2. Descriptors. At the start of every block (or maybe just before its
start) the garbage collector will find a bitmap (or a pointer to a
bitmap) that describes which slots of the memory block are pointers.
This is particularly attractive for statically-typed object-oriented
languages, since every object must carry a type field anyway, and the
type data is the perfect place for the bitmap.
3. Separation of pointer and nonpointer data. I.e. the compiler reorders
all struct data so that the first half of every struct is just pointers,
the second half is just nonpointers. This approach is feasible but
creates problems if you ever wish to construct a pointer to a struct
embedded within another struct. For each block, the heap data structures
indicate where the division between pointer and non-pointer data is. The
advantage is that you don't need a bitmap, which can be important if the
CPU has slow bit extraction instructions (e.g. the early x86 processors).
4. Monotyped heap blocks. A heap block may contain either pointer or
nonpointer data. This is usually a misguided attempt to make a more
space-efficient version of (3), since this requires just a bit to
differentiate pointer from nonpointer data ((3) requires an offset into
the block), but this comes at the price of having two blocks (and all
associated heap data structures) instead of one. Typical net result: a
loss of one bit per heap block... but it can still be worth the effort
if you know that the vast majority of blocks is either pointer or
nonpointer. (In other words, this is useful only if you hand-craft a
heap management and garbage collection for a single application, which
would be a very unusual situation...)

I'm not sure whether the above list is exhaustive; if anybody has
interesting alternatives, please let the list know :-)


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.