Re: Fat references

Ray <>
Sun, 03 Jan 2010 09:36:40 -0800

          From comp.compilers

Related articles
[11 earlier articles]
Re: Fat references (BGB / cr88192) (2010-01-01)
Re: Fat references (2010-01-02)
Re: Fat references (glen herrmannsfeldt) (2010-01-02)
Re: Fat references (Robert A Duff) (2010-01-02)
Re: Fat references (Jon Harrop) (2010-01-03)
Re: Fat references (Hans-Peter Diettrich) (2010-01-03)
Re: Fat references (Ray) (2010-01-03)
Re: Fat references (2010-01-03)
Re: Fat references (glen herrmannsfeldt) (2010-01-03)
Re: Fat references (Jon Harrop) (2010-01-04)
Re: Fat references (Kaz Kylheku) (2010-01-04)
Re: Fat references (BGB / cr88192) (2010-01-03)
Re: Fat references (Robert A Duff) (2010-01-04)
[11 later articles]
| List of all articles for this month |

From: Ray <>
Newsgroups: comp.compilers
Date: Sun, 03 Jan 2010 09:36:40 -0800
Organization: Doubtful
References: 09-12-045
Keywords: storage, GC
Posted-Date: 03 Jan 2010 14:46:05 EST

Jon Harrop wrote:

> I've been working on a project called HLVM in my spare time:
> One goal was to have fast interop with C, so I didn't want to copy the
> traditional style of placing a header with GC metadata before every value
> in the heap because that would require C arrays to be copied just to add
> this header. I couldn't be bothered to allocate a separate header so,
> instead, I pulled the GC metadata into the reference. So my references are
> now "fat": a quadword of pointer to run-time type, array length or union
> type tag, pointer to mark state and pointer to the actual data itself.
> This actually works rather well except I sacrificed atomic read/write of
> references. Has it been done before?

A lot of modern VMs use fat references. I know that the JVM uses at
least 2x(sizeof( void* )) to represent a pointer. And yes, it's to
support the GC.

When I implemented a lisp runtime, I went through two iterations; in
the first, pointers were the size of pointers on the native
architecture (in this case 64 bits) and there was a 5-word (ie, the
size of 5 pointers) overhead for each heap-allocated object. But the
code was complex and, as it turned out, slow.

In the second iteration, I used fat pointers; Each pointer was the
size of four native-architecture pointers where one was the pointer to
the object on the heap, two were "forward" and "back" pointers that
joined all pointers into a single doubly linked list, and one was a
set of immediate binary flags (typetag, "soft" pointer, last pointer
in this object, etc). Different regions of the list corresponded to
different populations w/r/t Garbage Collection (traversed root,
untraversed root, reached and traversed live data, reached but not
traversed live data, not yet reached possibly-live data, and known
garbage not yet finalized/deallocated). The garbage collector knew
where the dividing points between these regions were, and would
process pointers found at those points in the list, often moving them
to one of the other boundary points. At "generation boundaries" when
the "untraversed" section was empty, it would just shift the boundary
points and start over. The GC code was very simple (<200 lines of C)
and needed to know nothing about the internal structure of the
user-defined types beyond a few bits set in the object allocators.

I expected performance with fat pointers to be horrible, but it
actually wasn't. It turned out to be faster than the first version,
(on an AMD64 dual-core processor with 128K/2M cache), probably due to
more inlined values (I was using a lot of doubles) and better locality
of reference.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.