Re: Garbage Collection

boehm@parc.xerox.com (Hans Boehm)
Fri, 14 Aug 1992 20:37:33 GMT

          From comp.compilers

Related articles
[18 earlier articles]
Re: Garbage collection usenet@leapheap.co.uk (2004-09-13)
Garbage Collection and Interactive Languages eifrig@beanworld.cs.jhu.edu (1992-08-04)
Re: Garbage Collection eifrig@blaze.cs.jhu.edu (1992-08-09)
Re: Garbage Collection boehm@parc.xerox.com (1992-08-11)
Re: Garbage Collection eifrig@beanworld.cs.jhu.edu (1992-08-12)
Re: Garbage Collection David.Chase@Eng.Sun.COM (1992-08-13)
Re: Garbage Collection boehm@parc.xerox.com (1992-08-14)
Multi-Threading and Garbage Collection eifrig@beanworld.cs.jhu.edu (1992-08-15)
Garbage collection Olin.Shivers@cs.cmu.edu (1992-11-24)
| List of all articles for this month |

Newsgroups: comp.compilers
From: boehm@parc.xerox.com (Hans Boehm)
Organization: Xerox PARC
Date: Fri, 14 Aug 1992 20:37:33 GMT
References: 92-08-015 92-08-056
Keywords: storage, GC

eifrig@beanworld.cs.jhu.edu (Jonathan Eifrig) writes:
[Much text omitted]
> Sure, threads are important. Sure, multiple threads in a single
>process should share the same heap space. All I'm saying is that there is
>no need to use some external "alarm" interrupt to slavishly switch
>contexts immediately upon receipt of such a signal. We can, as I
>explained earlier, switch contexts _only_ at the beginning of basic
>blocks. What's a few more instructions between context switches?


Clearly no problem, assuming you don't need to add frequently executed
code to accomplish this. I don't know how to do this without
single-stepping from the signal handler, which may or may not be easy,
depending on hardware and OS.


>If we decide, in an attempt to be _really_conservative_,
>to assume everything is a pointer, then we pay a double penalty: since
>we're wasting time moving junk around, garbage collection takes longer,
>and since we're not reclaiming storage that is actually free, we collect
>more often.


Note that we can't literally move things around, since relocating integers
to point to tospace is a bad idea.


> It seems clear that this naive approach won't be satisfactory ...


As David points out, it is usually satisfactory, assuming some care in the
implementation of the allocator/collector. In fact, many existing systems
to precisely that, at least for stack references. These include various
versions of Cedar at PARC, Modula-2+ and Modula-3 implementations at DEC
SRC, AKCL, Joel Bartlett's Scheme-to-C implementation, Sather, etc.


There's even a somewhat convincing argument that this should work. Assume
(here's the flakey part) that all values in the stack are either permanent
or change fairly quickly. Those that are valid pointers don't concern us
(*). Those that are short-lived are also OK, since they at worst postpone
collection of some objects a little bit. Empirically, this is a minor
effect, and any generational collector will have the same problem,
probably to a larger extent. The permanent non-pointers are also easy to
deal with; we look for them before we start (and at every later gc), and
refuse to allocate at those adresses.


(There are actually two problems with this, both minor in practice.
Long-lived nonpointers may be created while the program is running.
More importantly, valid, but dead, pointers may not be cleared in a
timely fashion. This is somewhat orthogonal to conservative collection.
In my experience, on an architecture like SPARC, this dominates,
since many sections of the stack are not written frequently.
Hence statement (*) isn't really correct. The effect can be
reduced by explicitly clearing dead stack frames from the allocator,
on occasion.)


> I just don't see how you're going to allow garbage collection to
>occur _anywhere_ and handle any sort of tagged heap. Are you assuming
>that procedure calls and heap allocations are atomic? What happens if I
>have to garbage collect while I'm in the middle of allocating a record,
>and haven't written the length of the record to the heap yet?


There are two ways of dealing with this. With high allocation rates, you
probably need to give each thread its own chunk of memory that it can
allocate from without synchronization. With lower allocation rates,
allocation can acquire a lock to prevent conflicts. The PCR allocator
acquires and releases a monitor lock (perhaps with some minor shortcut)
for each allocation. This isn't free, but the only cost is during
allocation. With Cedar-like (or Modula-3 like or C-like) allocation
frequencies this probably makes sense. It's almost never a bottleneck.
With SML-of-NJ like allocation rates, it's a bad idea.


Hans-J. Boehm
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.