Re: Best Ref-counting algorithms?

Christoffer Lernö <>
Fri, 17 Jul 2009 11:14:27 -0700 (PDT)

          From comp.compilers


From: Christoffer Lernö <>
Newsgroups: comp.compilers
Date: Fri, 17 Jul 2009 11:14:27 -0700 (PDT)
Organization: Compilers Central
References: 09-07-018 09-07-039 09-07-043 09-07-054
Keywords: GC
Posted-Date: 17 Jul 2009 14:47:39 EDT

On Jul 16, 6:35 pm, George Neuner <> wrote:
> On Wed, 15 Jul 2009 00:05:12 -0700 (PDT), Christoffer Lernö
> <> wrote:
>> On Jul 14, 8:27 pm, George Neuner <> wrote:
>>> On Sun, 12 Jul 2009 13:41:59 -0700 (PDT), Christoffer Lernö
>>>> If one would like to implement a fast GC using refcounting, should I
>>>> implement something like in those papers, or are there better
>>>> algorithms out there?
>>> I haven't read Bacon's paper, but you'll note that Lins uses a marking
>>> scan as a backup to identify and break data cycles for the reference
>>> counter. Although the marking scan need not be run unless memory is
>>> tight, it significantly complicates the overall implementation.
>> I implemented the simple non-concurrent version of the algorithm as
>> described in Bacon's paper and I did not find it all that complex;
>> the concurrent version is naturally more involved, though.
> Marking is certainly not difficult - not even when done concurrently,
> but tuning it for good performance is tricky. In particular, how you
> handle the queue - as a stack or as a FIFO - and how you handle out of
> space conditions on the queue are critical.

My implementation used a bounded array that would GC all roots
automatically when the array filled up. I noticed that in my
experimental microbenchmark I got the best results with around 100
entries in the array - increasing or decreasing this number would
increase the GC time. (My benchmark created groups of 4 objects with
cyclic dependencies and let the cycle collection run automatically.)
I suspect that the optimal number will differ quite a bit depending
on the type of allocation being done.
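
For reference, this is roughly what my implementation does, sketched
here in Python for brevity (the real thing is not Python, and all the
names below are mine). It is the synchronous trial-deletion algorithm
from Bacon's paper with the bounded root buffer described above;
freeing just sets a flag so the sketch stays self-contained:

```python
BLACK, GRAY, WHITE, PURPLE = range(4)
ROOT_LIMIT = 100          # ~100 entries gave the best results for me

roots = []                # bounded buffer of potential cycle roots

class Obj:
    def __init__(self):
        self.rc = 0            # reference count
        self.color = BLACK
        self.buffered = False  # already in the roots buffer?
        self.children = []
        self.freed = False     # stands in for actually releasing memory

def incref(s):
    s.rc += 1
    s.color = BLACK

def decref(s):
    s.rc -= 1
    if s.rc == 0:
        release(s)
    else:
        possible_root(s)       # s might now be part of a garbage cycle

def release(s):
    for t in s.children:
        decref(t)
    s.color = BLACK
    if not s.buffered:
        free(s)

def possible_root(s):
    if s.color != PURPLE:
        s.color = PURPLE
        if not s.buffered:
            s.buffered = True
            roots.append(s)
            if len(roots) >= ROOT_LIMIT:
                collect_cycles()   # buffer full: collect automatically

def free(s):
    s.freed = True
    s.children = []

def collect_cycles():
    # phase 1: trial-delete all references internal to the subgraphs
    for s in list(roots):
        if s.color == PURPLE:
            mark_gray(s)
        else:
            s.buffered = False
            if s.rc == 0 and not s.freed:
                free(s)
    roots[:] = [s for s in roots if s.buffered]
    # phase 2: anything still externally referenced gets restored
    for s in roots:
        scan(s)
    # phase 3: whatever remains white is a dead cycle - free it
    for s in roots:
        s.buffered = False
        collect_white(s)
    roots.clear()

def mark_gray(s):
    if s.color != GRAY:
        s.color = GRAY
        for t in s.children:
            t.rc -= 1
            mark_gray(t)

def scan(s):
    if s.color == GRAY:
        if s.rc > 0:
            scan_black(s)      # external refs remain: resurrect
        else:
            s.color = WHITE
            for t in s.children:
                scan(t)

def scan_black(s):
    s.color = BLACK
    for t in s.children:
        t.rc += 1
        if t.color != BLACK:
            scan_black(t)

def collect_white(s):
    if s.color == WHITE and not s.buffered:
        s.color = BLACK
        kids, s.children = s.children, []
        for t in kids:
            collect_white(t)
        free(s)

# Demo mirroring the benchmark above: a garbage cycle of 4 objects.
objs = [Obj() for _ in range(4)]
for i, o in enumerate(objs):
    nxt = objs[(i + 1) % 4]
    o.children.append(nxt)
    incref(nxt)
incref(objs[0])   # one external reference keeps the cycle alive
decref(objs[0])   # drop it: only the internal cycle refs remain
collect_cycles()  # trial deletion finds and frees all four
```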

>> ... what I find interesting is that the scanning in the case of
>> ref-counting is on a small subset of all objects. For more advanced
>> tracing collectors, this is also the case. There is a nice paper on
>> the duality between ref-counting and tracing collectors by Bacon,
>> Cheng and Rajan.
> You can only safely restrict scanning if you can segregate objects
> that are shared (or could be). Most programming languages don't allow
> for that distinction up front, so the compiler or runtime environment
> has to handle it behind the scenes.

You are talking about restricting scanning for a tracing GC?

>> Bacon also claims that their collector got really good results when it
>> comes to GC cost, quite competitive with tracing collectors.
> I have no doubt in Bacon's claims for the applications and conditions
> under which he tested.

Most of the results come from the implementation in the Jalapeño Java
VM. From what I can gather, quite a lot of different GCs were
implemented and benchmarked on that VM.

>> I guess I simply have to implement a few different collectors (tracing
>> and ref-counted) and see which type I like best.
> If you're after very high performance (a hobby of mine due to a stint
> in real time programming), you should look into Baker's treadmill
> collector. Treadmills are expensive space-wise, but they have fixed
> processing costs and worst case behavior that is completely
> predictable (a critical requirement for RT).

I saw that treadmills are supposedly covered in Lins's book mentioned
earlier (which I ordered); I'll do some googling to find out more too.
But if you have any direct links to papers, I'd be happy if you would
share them.
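
In the meantime, here is my (very loose) understanding of the idea,
sketched in Python. A real treadmill keeps every cell on one cyclic
doubly-linked list and moves cells between four segments with O(1)
pointer swaps; this toy version uses plain sets and deques instead,
and it omits the barrier a real incremental collector needs. The point
it tries to show is the fixed, bounded amount of work per allocation:

```python
from collections import deque

K = 2  # scan steps performed per allocation: the fixed, bounded cost

class Cell:
    def __init__(self):
        self.children = []

class Treadmill:
    def __init__(self, heap_size, roots):
        self.free = deque(Cell() for _ in range(heap_size))
        self.white = set()    # candidates: not yet reached this cycle
        self.gray = deque()   # reached, children not yet scanned
        self.black = set()    # reached and fully scanned
        self.roots = roots    # externally visible root cells

    def allocate(self):
        for _ in range(K):          # do a bounded amount of GC work
            self.step()
        if not self.free:
            raise MemoryError("heap exhausted")
        cell = self.free.popleft()
        cell.children.clear()       # recycled cell: reset its payload
        self.black.add(cell)        # new cells are allocated black
        return cell

    def step(self):
        if self.gray:
            # scan one cell: blacken it, gray its white children
            cell = self.gray.popleft()
            self.black.add(cell)
            for child in cell.children:
                if child in self.white:
                    self.white.remove(child)
                    self.gray.append(child)
        else:
            self.flip()

    def flip(self):
        # end of a cycle: anything still white was unreachable
        self.free.extend(self.white)
        self.white = self.black     # survivors become next candidates
        self.black = set()
        for r in self.roots:        # restart the trace from the roots
            if r in self.white:
                self.white.remove(r)
                self.gray.append(r)

# Live data survives while garbage is recycled, a little per allocation.
roots = []
tm = Treadmill(16, roots)
a = tm.allocate(); roots.append(a)
b = tm.allocate(); a.children.append(b)
churn = [tm.allocate() for _ in range(20)]   # short-lived garbage
```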

> Every GC design has weak spots and most have worst case performance
> that is extremely bad (treadmills are an exception to bad worst case
> performance, but their weak spot is fixed size allocation blocks).
> The trick is to anticipate the use - which is all but impossible if
> the GC is intended to support general programming - and design for
> either the best average case or the worst possible case. Most GC
> systems today are hybrids designed for the best average case.

My problem with tracing GCs is that once the heap grows beyond a few
hundred MB, most exhibit noticeable pauses. I want a GC that is
suitable for game programming, and I'd much rather have a steady,
predictable, lower throughput than one that occasionally pauses but
has better average performance.

For a game you'd want 30 or 60 FPS, meaning that the total time
between frames is ~33 ms or ~17 ms respectively.

It's far preferable that the GC steal 5 ms every frame than 100 ms
every 600 frames.
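
To make the trade-off concrete, the arithmetic works out like this
(a throwaway calculation, not measured from any particular GC):

```python
# Frame budgets at common game frame rates:
frame_60 = 1000 / 60      # ~16.7 ms per frame at 60 FPS
frame_30 = 1000 / 30      # ~33.3 ms per frame at 30 FPS

# Cost of the two GC styles over the same 600-frame window:
steady_total = 5 * 600    # 5 ms every frame  -> 3000 ms of GC work
pause_total = 100         # one 100 ms pause  ->  100 ms of GC work

# The steady collector does 30x more total work, but 5 ms fits easily
# inside either frame budget, while a single 100 ms pause swallows
# about 6 consecutive frames at 60 FPS:
frames_lost = 100 / frame_60
```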

(And that would be a rather well-behaved GC - I've experienced Java
applications freezing for tens of seconds on my new MacBook Pro when
collecting around 100 MB out of 250 MB.)

So anything with a constant cost is much preferable to something that
makes you pay for GC in large chunks of time.
