Re: how to find gc roots in stack machine?

George Neuner <>
Wed, 03 Jul 2013 03:56:51 -0400

          From comp.compilers

Related articles
how to find gc roots in stack machine? (2013-07-01)
Re: how to find gc roots in stack machine? (Nils M Holm) (2013-07-01)
Re: how to find gc roots in stack machine? (2013-07-01)
Re: how to find gc roots in stack machine? (George Neuner) (2013-07-03)
Re: how to find gc roots in stack machine? (George Neuner) (2013-07-04)
| List of all articles for this month |

From: George Neuner <>
Newsgroups: comp.compilers
Date: Wed, 03 Jul 2013 03:56:51 -0400
Organization: A noiseless patient Spider
References: 13-07-002
Keywords: GC
Posted-Date: 03 Jul 2013 22:42:47 EDT

On Mon, 1 Jul 2013 05:19:01 -0700 (PDT), wrote:
>I am writing a simple compiler for a simple stack machine. Now I want
>to add a garbage collector. To do this I need a way to be able to find
>all roots in the stack at any given time (or at least at some special
>points in time). I can see that I would probably be able to generate
>some kind of "stack map" that would tell me which variables in the
>stack are roots. But I don't know how to deal with the other
>intermediate values that are generated on the stack during execution.
>I would appreciate any pointers/literature on how to:
>1. Find all the roots in the stack, including those that are pushed to the
>stack as intermediate results during execution.
>2. How to generate stack maps, common formats, etc...
>3. How to decide and where to check if gc is necessary.

Hi Nicolas,

I don't like to throw cold water, but a traditional stack machine is
possibly the worst case environment for trying to implement precise
GC, which I assume you really want because conservative GC doesn't
require pointer mapping the stack.

Although it's technically possible to track the locations of pointer
values as the stack is manipulated, it is impractical for all but very
small stacks where you can use a register sized bitmap to represent
the status of each element: pointer or not. While a 32 or 64 element
stack is overkill for expression evaluation, it is impossibly small
for general programming.

If you control the design of the target - e.g., a software VM - then
you are better off implementing a framed data stack and providing
frame relative load/store instructions. [See below for why.]

Framing does not prevent also (ab)using the stack for evaluating
expressions as is normal for a stack machine. If you are careful to
retain unmodified pointers to heap objects - i.e. do all your pointer
computations using copies - and to allocate heap objects "black" per
the GC tri-color abstraction, then temporary pointers can be ignored
by the collector.

Framing the data stack is simple: you just add another "register" to
the machine context. That is, your control stack becomes a stack of
structs (records) rather than simply a stack of return addresses. You
set the frame register as part of your function call preamble, after
saving the caller's context but before jumping to the callee's code.

Taking your questions in reverse order:

#3 - The place to check if a collection is necessary is in the
allocator. The simplest thing to do is wait until you can't satisfy
an allocation request and then collect - the traditional
"stop-the-world" method.

Non-moving collectors are easiest to implement ... copying compacting
or generational collectors, and concurrent or parallel collectors[*],
in general will be orders of magnitude more complicated and are not
good choices for a first effort.

[*] in GC literature "concurrent" generally does not mean "parallel".
Concurrent collectors are serial co-routine implementations which
interleave collection with program execution. Concurrent also
includes hardware assisted implementations which use VMM protection
handling to switch between program and collector. Parallel collectors
are multithreaded and may execute simultaneously with the program. Any
particular GC implementation may include elements of both.

#2 - Pointer mapping is quite easy given a framed stack: as a function
is compiled, its arguments and its locals are assigned to offsets
relative to a base address which will be available at runtime in the
cpu's "frame" register. Those offsets that refer to pointer variables
then can easily be collected to create a map.

Typical forms the map might take are bit fields, offsets encoded using
byte or word arrays, or compiler generated enumeration functions.
There isn't any standard method - you do what works best given your
target language(s) and machine. It's rather typical that porting the
language to a different CPU will require using different (but
equivalent) methods under the hood.

Since you also will need to construct pointer maps for heap objects,
it is best to use the same methods for both heap and stack and to
treat each stack frame as if it were an "object" to be scanned.
[This is why stack framing is desirable - in general you want object
memory layouts to be statically defined.]

Bit field or byte/word array maps are stored as static data associated
with the function's code. A static pointer to the map data, or to the
entry point of the enumeration function if you choose that route,
should be placed at a known offset from the function entry point so it
can be located using the function's address at runtime.

#1 - Scanning the data stack for roots is relatively simple: you walk
the control stack from top to bottom, with each record you use the
saved frame addresses to locate the function's data and the saved
return address to locate its call site. You examine the call site to
determine the function's address which you then use to find its
associated map or enumeration function.

For this to work it's important that you be able to parse function
call sites to determine callee addresses. If your calling convention
involves jumping to a [possibly] computed address on top of the stack
rather than coding addresses inline, you will need to provide another
way to find the map:

Whenever a function is called at runtime, you can load the address of
its frame map or enumerator function into yet another machine
"register" [again, saved as part of the context on the control stack].
This optimization lets you find the map or enumerator for each data
frame directly from its associated control stack record. It trades
some space for a reduction in scanning effort.

Then, using the map or enumeration function, you find the pointers in
the frame and add them to your GC scanning queue. You might want to
do simple checks for nulls and, if practical, for wild values that
can't possibly point to valid heap objects, so as to avoid putting
useless pointers into the scanning queue in the first place.

I know this isn't what you hoped to hear. Unfortunately, precise GC
just isn't compatible with certain machine platforms. You can, of
course, adapt your platform using some of the ideas presented here
[either tracking your stack or framing it], or you can implement
conservative GC instead of precise. I'm happy to help more if I can.

Good luck!

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.