Re: Stack based-Register Based

"Joachim Durchholz" <joachim_d@gmx.de>
25 Feb 2001 11:08:26 -0500

          From comp.compilers


From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 25 Feb 2001 11:08:26 -0500
Organization: Compilers Central
References: 01-01-124 01-01-136 01-02-010 01-02-033
Keywords: GC, architecture, comment
Posted-Date: 25 Feb 2001 11:08:26 EST

Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> Another claim is that the memory machine architecture makes
> reference counting more feasible, and therefore requires less
> memory. I am not familiar enough with reference counting to comment
> on this.


Reference counting (RC) vs. mark-and-sweep garbage collection (MAS) is a
difficult (and controversial) issue.


1. Algorithmic limits
MAS will always succeed.
RC cannot collect data that has circular links. It will also get into
trouble if a reference counter overflows (which means that you either
have to use a "sticky" counter that stays at its maximum value once it's
reached, or you have to make the counter large enough to hold the
maximum number of memory blocks in the system, meaning that the
reference counter should have as many bits as a pointer).
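As an aside, the "sticky" counter is easy to sketch. This is a minimal
illustration with an invented 16-bit width, not taken from any
particular system: once the count saturates it is frozen forever, so
the block can no longer be reclaimed by RC alone.

    #include <limits.h>

    /* Hypothetical 16-bit sticky reference count: it saturates at its
       maximum and is never changed again, so a block that ever reaches
       USHRT_MAX references becomes uncollectable by RC on its own. */
    void sticky_incref(unsigned short *count) {
        if (*count != USHRT_MAX)
            (*count)++;
    }

    void sticky_decref(unsigned short *count) {
        if (*count != USHRT_MAX)   /* a saturated count stays put */
            (*count)--;
        /* the caller frees the block when *count reaches zero */
    }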
These problems leave RC systems with a choice of the least of three
evils: (a) implement an MAS that cleans up what RC leaves behind,
increasing system development time and complexity, (b) disallow cycles
(I know of no system that enforces this, IOW it comes down to manual
testing, though static types can help), (c) just live with the memory
leaks (many scripting languages do this; Python recently moved from
this policy to a cycle-collecting pass on top of its RC).
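To make the cycle problem concrete, here is a minimal sketch of naive
RC in C (the Node type and helper names are invented for
illustration). Once the two nodes reference each other and the
external references are dropped, neither count can ever reach zero, so
plain RC leaks both blocks:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical node with an embedded reference count (naive RC). */
    typedef struct Node {
        int refcount;
        struct Node *next;
    } Node;

    static void node_release(Node *n);

    static Node *node_new(void) {
        Node *n = malloc(sizeof *n);
        n->refcount = 1;                     /* the creating reference */
        n->next = NULL;
        return n;
    }

    static void node_set_next(Node *n, Node *target) {
        if (target) target->refcount++;      /* count the new reference */
        if (n->next) node_release(n->next);  /* drop the old one */
        n->next = target;
    }

    static void node_release(Node *n) {
        if (--n->refcount == 0) {
            if (n->next) node_release(n->next);
            free(n);
        }
    }

    int main(void) {
        Node *a = node_new();
        Node *b = node_new();
        node_set_next(a, b);   /* a -> b */
        node_set_next(b, a);   /* b -> a: a cycle */
        node_release(a);       /* drop the external references ...       */
        node_release(b);       /* ... but both counts stay at 1: a leak  */
        printf("cycle leaked: counts never reach zero\n");
        return 0;
    }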


2. Space overhead
MAS requires one bit per memory block. Some implementations use two
bits, but that's about it. These bits can usually be squeezed somewhere
into the already-existing data structures of the used/free block
management.
RC requires a count field in every memory block. One-bit reference
"count" schemes do exist, but I don't know enough about them to comment.


3. Timeliness of resource deallocation
RC will free resources as soon as they aren't needed anymore.
MAS will free resources later, occasionally much later.


4. Distribution of time overhead (i.e. real-time usefulness)
MAS can be run in parallel with the application, taking away 5%-10% of
CPU time. (Early implementations of MAS did a "full collect" whenever
heap space was used up, creating occasional long standstills. This is
still the best algorithm in terms of total CPU time, but it's
unacceptable even in marshmallow-soft real-time applications like user
interfaces, so all modern MAS algorithms are "incremental" and can run
in parallel with the main application.) In a real-time environment, MAS
can be made a low-priority task; some analysis is required to make sure
that MAS will always have enough time to clean up the garbage (find an
upper bound on the amount and pointer chain depth of live data at the
end of a computation cycle, and measure how long it would take MAS to
trace it - most MAS algorithms take time proportional to live data).
RC distributes its overhead relatively evenly over a computation, with
one exception: if the last pointer to a large tree of blocks goes away,
the system may spend a long time deallocating. In a real-time
environment, this can be avoided by putting the blocks on a "pending
deallocations" list, but this makes memory consumption and deallocation
timeliness just as good or bad as that of MAS.
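The "pending deallocations" trick can be sketched as follows (the block
layout and names are invented for illustration): instead of recursively
freeing a whole tree when its last reference goes away, dead blocks are
pushed onto a list, and only a bounded number of them is freed per
cycle, which bounds the pause.

    #include <stdlib.h>

    typedef struct Block {
        size_t refcount;
        struct Block *child;     /* one outgoing reference, for simplicity */
        struct Block *next_dead; /* link in the pending-deallocations list */
    } Block;

    static Block *pending = NULL;    /* blocks whose count reached zero */

    /* Dropping a reference no longer frees recursively; it just queues. */
    static void block_release(Block *b) {
        if (b && --b->refcount == 0) {
            b->next_dead = pending;
            pending = b;
        }
    }

    /* Called once per real-time cycle: free at most 'budget' blocks, so
       the worst-case pause is bounded.  Children whose counts drop to
       zero are queued in turn, not freed immediately. */
    static void reclaim_some(size_t budget) {
        while (budget-- && pending) {
            Block *b = pending;
            pending = b->next_dead;
            block_release(b->child);   /* may queue the child */
            free(b);
        }
    }

    int main(void) {
        /* Build a short chain a -> b -> c, drop it, and reclaim it
           incrementally, two blocks per "cycle". */
        Block *c = calloc(1, sizeof *c); c->refcount = 1;
        Block *b = calloc(1, sizeof *b); b->refcount = 1; b->child = c;
        Block *a = calloc(1, sizeof *a); a->refcount = 1; a->child = b;
        block_release(a);
        while (pending) reclaim_some(2);
        return 0;
    }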


5. Amortized time overhead
RC must update reference counts whenever a pointer is changed. This
means an overhead of two memory writes per pointer memory write! RC also
requires time proportional to the number of blocks deallocated.
MAS requires additional reads during its mark phase (every live block is
touched once). Most MAS algorithms take time proportional to the number
of live memory blocks ("generational" MAS algorithms trade some
timeliness to avoid having to trace all live data; they trace just the
working data set).
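For the MAS side, a minimal sketch of the mark phase (the object layout
is invented for illustration): every live block is visited exactly
once, which is where the "time proportional to live data" comes from;
blocks left unmarked afterwards are garbage and are reclaimed by the
sweep.

    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical heap object: a mark bit plus up to two outgoing
       references.  Real collectors find the references via type
       information or conservative scanning. */
    typedef struct Obj {
        int marked;
        struct Obj *ref[2];
    } Obj;

    /* Mark phase: cost proportional to the live data, independent of
       how much garbage the heap contains. */
    static void mark(Obj *o) {
        if (o == NULL || o->marked)
            return;
        o->marked = 1;                 /* the "one bit per block" of point 2 */
        for (size_t i = 0; i < 2; i++)
            mark(o->ref[i]);
    }

    int main(void) {
        Obj root = {0, {NULL, NULL}}, child = {0, {NULL, NULL}};
        root.ref[0] = &child;
        mark(&root);                   /* sweep (not shown) would then free
                                          every block left unmarked */
        printf("marked: root=%d child=%d\n", root.marked, child.marked);
        return 0;
    }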


6. Locality of reference
RC accesses the reference counts in potentially far-away blocks. IOW an
RC system displays a "fringe" of written-to blocks around the set of
modified blocks; if it's lucky, this fringe is already within the set
of read blocks; otherwise RC incurs an enormous cache overhead.
MAS will touch every block in the address space, sooner or later.
Generational algorithms reduce the cache load considerably (they don't
scan data that hasn't been changed for a long time), but occasionally a
full collect will rescan all the data.




It's a complex issue, and many effects are difficult to judge from
theory, so it would be useful to have good benchmarks. The problem is
that the presence and nature of garbage collection have a heavy
influence on software design, so it's impossible to run an indisputable
test.
I know of one benchmark. It took several programs that had neither MAS
nor RC, and retrofitted an MAS algorithm (the famous Boehm-Demers-Weiser
collector). The problem is that the results aren't really comparable:
the programs were written to manually collect all garbage, so they were
forced into some bookkeeping and design trade-offs that would not have
been necessary if they had been written for MAS. With that caveat in
mind, it's not surprising that MAS added a CPU overhead of 5% to 100%
(IIRC). (Also note that the BDW collector has had several releases
since that test, so rerunning the benchmark today might give better
results.)


Regards,
Joachim
[This would be a fine topic for gclist, another spinoff of comp.compilers.
Send "subscribe" to gclist-request@lists.iecc.com. -John]

