Re: _My_ ideas for current research!

hoelzle@Neon.Stanford.EDU (Urs Hoelzle)
26 Nov 91 01:00:12 GMT

          From comp.compilers

Related articles
Re: Current work in compiler/language design. sverker@sics.se (1991-11-19)
_My_ ideas for current research! eifrig@blaze.cs.jhu.edu (1991-11-22)
Re: _My_ ideas for current research! carlton@scws8.harvard.edu (1991-11-23)
Re: _My_ ideas for current research! hoelzle@Neon.Stanford.EDU (1991-11-26)
| List of all articles for this month |

Newsgroups: comp.compilers
From: hoelzle@Neon.Stanford.EDU (Urs Hoelzle)
Keywords: Lisp, Scheme, performance
Organization: CS Department, Stanford University, California, USA
References: 91-11-072 91-11-108
Date: 26 Nov 91 01:00:12 GMT

carlton@scws8.harvard.edu (david carlton) writes:


>In article 91-11-099, eifrig@blaze.cs.jhu.edu (Jonathan Eifrig) writes:


>Andrew Appel has proven that heap allocation is faster than stack
>allocation under fairly reasonable circumstances, assuming that you have
>enough memory. (The basic idea is that if you allocate things on the
>stack then you have to deallocate dead objects explicitly, whereas if you
>allocate things in the heap and use stop-and-copy garbage collection, you
>only touch live objects, not the dead ones; so if your typical dead
>object/live object ratio is high enough, you are better off allocating
>everything in the heap.) But his and David MacQueen's SML/NJ compiler is
>the only one that I know of that does this.


Whoa! I'm a big fan of (compacting generational) GC, but Appel's paper
doesn't really prove *anything* except that for one particular machine /
program combination, heap allocation was faster. The paper is nice but
has two fatal flaws (IMHO) which prevent generalizations:


1) "if you allocate things on the stack then you have to deallocate
      dead objects explicitly". This is simply not true in general. Any
      reasonable compiler will discard the stack-allocated data together
      with the rest of the stack frame, i.e. using 0 cycles.


2) Architectural effects are completely ignored in the analysis. The
      machine used for the timings was a (slow) VAX, so cache effects
      played no role. In today's RISC systems which have a HUGE speed
      difference between CPU and main memory (dozens of cycles for a
      cache miss), you can't just assume that a store is a store is a
      store.


      With stack allocation, the top of the stack is likely to be in the
      cache, and thus writes and reads to the data are likely to be hits.
      With a very large heap (such as those used by Appel), every initial
      store to a new object will cause a cache miss since the heap is
      much bigger than the processor cache.


In summary, one can make a good argument a generational copying GC can be
made reasonably efficient by increasing the size of the creation space.
However, heap allocation is not free, and allocating procedure activations
on the heap is almost always a VERY bad idea if you want your system to
run reasonably fast (within a small factor of optimized C).


Some back-of-the-envelope calculations to support this: assume a Sun SS-1,
write miss = 26 cy/line, 1 line = 4 words. 500K calls/s (one call every
50 instructions) and avg. activation record size of 8 words --> 5e5 * (8 /
4) = 1e6 misses / s = 26 M cycles/s lost to cache misses. Since a SS-1
runs at 25MHz, you lose more cycles per second than you have available,
i.e. your program runs at most half as fast as the optimized C program.
[The 500K calls/s figure is taken from several C benchmarks, i.e. I
believe it is not unreasonably high...also, if I remember correctly, many
studies show calls to be around 2% of the dynamically-executed
instructions.]


Of course, if your system runs 10 times slower tha C anyway, you're doing
only 50K calls/s and the relative overhead drops to 10% which is fine
considering the simplification that heap-allocation probably brings. But
don't argue that heap allocation makes it go faster...


-Urs


----------------------------------------------------------------------------
Urs Hoelzle hoelzle@cs.stanford.EDU
Computer Systems Laboratory, CIS 57, Stanford University, Stanford, CA 94305
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.