|GC question email@example.com (Paul Rubin) (2007-07-27)|
|Re: GC question firstname.lastname@example.org (Ulf Wiger) (2007-07-27)|
|Re: GC question email@example.com (Gene) (2007-07-28)|
|Re: GC question firstname.lastname@example.org (Ray Dillinger) (2007-07-28)|
|Re: GC question @iecc.com <email@example.com (Paul@iecc.com, Rubin) (2007-07-30)|
|Re: GC question firstname.lastname@example.org (2007-08-13)|
|From:||Ray Dillinger <email@example.com>|
|Date:||Sat, 28 Jul 2007 12:26:03 -0700|
|Posted-Date:||28 Jul 2007 15:29:38 EDT|
Paul Rubin wrote:
> Suppose you build a big list of cons cells, say a billion of them
> (you're on a large machine). This is in a runtime with traditional
> marking or copying gc, no generation scavenging or region inference or
> anything like that. The collector runs every N bytes of allocation
> for some fixed N. Yes I know that's a dumb way to write an allocator
> for a big-system implementation but it's fairly common for small ones.
> It seems to me that the running time to allocate N cells is O(N**2)
> because you run the collector O(N) times during the allocation, and
> each collection costs O(N) on average.
Yes, that's true.
> I never realized this before. Is it a well-known phenemonon?
It's very well-known; when you implement garbage collection you
want to be able to say at least in broad terms how much it slows
down the execution and how well it scales to larger and smaller
memory arenas, so people profile their GC's routinely and test
them against real problems. When you profile looking for costs,
it's very easy to notice.
> Is the
> main answer something like generation scavenging?
Man, I don't know how to answer this one. Yes, generation
scavenging is one strategy that a fair number of implementers
use to try to hold down GC costs in general. But it isn't the
specific and true answer to your question.
There are a few truths about Garbage Collection. One of them
is that if your system generates garbage at all, then on average,
some fraction of your memory arena will consist of "floating
garbage" - that is, data that cannot be used by your program,
but which has not (yet) been reaped.
Another truth, again assuming a case where your system actually
does GC at all, is that the Garbage collector will, on average,
occupy some fraction of your CPU power.
Now, here's the harsh truth that applies to all conventional
Garbage Collectors. As your live data grows in size, the
smallest achievable product of those two fractions will tend
to increase. The smaller the fraction of the memory space
that your system allows to be floating garbage, the larger
the fraction of CPU power your GC will use. The smaller the
fraction of CPU power you devote to it, the greater will be
the fraction of your memory arena devoted to holding floating
For All collectors B,
For All weighting factors A,
For All Memory arena sizes C,
(GAR%(B,C) X A)/ CPU%(B,C) = 1,
GAR%(B,C) X CPU%(B,C) = N,
(GAR%(B,C+1) X A)/ CPU%(B,C+1) = 1,
==> GAR%(B,C+1) X CPU%(B,C+1) > N.
That is, if you hold the ratio of CPU GC effort to proportion
of floating garbage constant, a larger memory arena means that
the product of the two must on average be greater.
By "conventional" garbage collectors, I mean all collectors
that work by tracing all live data - that includes
mark-and-sweep, stop-and-copy, generation-scavenging, and
since most other GC algorithms are some variation of these,
more often than not this is true of any given GC system.
The order of the increase varies from one collection algorithm
to another, but to avoid an increase entirely is very very
difficult and has only been achieved in the context of a very
small set of problems using very unconventional techniques.
The situation you describe (full trace for every N bytes
allocated) will result, on average, in "constant" floating
garbage of N/2. This N was probably carefully selected to
be near the optimum or smallest achievable product of CPU
and memory fractions, for a normal-sized memory arena.
But as the size of the memory arena grows, N/2 is a
smaller and smaller fraction of the arena size. So not
only are you moving to a larger arena, wherin the minimum
achievable cost is greater, but you are also moving away
from the arena size in which N/2 was a reasonable choice.
The product of N/2 and the CPU usage must increase -
and that works out in this particular case to mean the CPU
usage increases according to the square of the memory size.
The good news is that you can mostly fix this, because this
is nowhere near the "smallest achievable" product of CPU
fraction and memory fraction that that collector can achieve.
You can "address" this in a few different ways. First,
you can regulate how often you call your GC. Minimal use
of CPU (but maximal floating garbage) results from the old-
skool lisp-machine approach of calling the GC when you run
out of memory and not before. A "floating garbage" fraction
that's a relatively constant fraction of your memory arena
results from calling your collector once per N bytes
allocated where N is a constant fraction (say, half) of
your memory arena, but your CPU usage will increase
moderately with the size of live data (You are now seeing
a polynomial increase. Trust me when I say that "moderate"
increases are much less painful).
You can also fix the GC cost relative to the amount of
allocation done. Use an incremental version of your GC
that regulates how much work it does according to how
much allocation has happened since last time it was called
(for example, traversing, marking, and potentially freeing
no more than M objects for every one allocated).
You can experiment to try to find a GC regulation function
that stays fairly close the minimum-achievable-product over
a large variety of arena sizes.
Also, you can use a better collector. Other things being
equal, a generational collector is almost always better than
a mark-and-sweep or copying collector in that it scales
better to larger memory arenas and live data sets.
Finally, look at the "other things" I so cavalierly mentioned
last paragraph as being equal - how your allocator handles
out-of-memory errors, etc, can have a huge impact on your
GC's overall performance. This is what Ulf Wiger was
talking about in his response to you. Right now, you are
paying the CPU cost to keep garbage down to less-than-N-bytes.
This is a tiny fraction of your enlarged memory arena, so
you probably aren't seeing many slowdowns due to handling
out-of-memory conditions. If you allow floating garbage to
vary with memory size, which it should, then your memory
footprint overall will grow larger. You'll hit the end of your
memory arena (and whatever penalties your runtime incurs for
it) more often. This is true with your current collector and
the simple change of making N a constant fraction of the memory
arena size. And though a more sophisticated generational
collector will be better than that, it's still going to be
true, even of a generational collector.
For what it's worth, if you increase the memory arena by a
constant amount whenever you "run out", at a cost proportional
to the new size of your arena, then you will incur CPU costs
proportional to the square of your final arena size if you
resize many times. This is a variation of the same problem
you're having now with invoking your tracing collector at a
constant number of allocated bytes. It is much cheaper in
the long run to double the size of the memory arena each time
you run out of space - then you get costs linear with the size
of your final memory arena.
Return to the
Search the comp.compilers archives again.