Re: Caller allocates space for callee-save registers

pardo@cs.washington.edu (David Keppel)
Wed, 27 May 1992 23:49:02 GMT

          From comp.compilers

Newsgroups: comp.compilers
Keywords: registers, optimize, summary
Organization: Computer Science & Engineering, U. of Washington, Seattle
References: 92-05-123

In 92-05-123 I asked ``why does the caller allocate
save space for parameter registers?'' Here is the promised summary of
responses, thanks to:


ressler@cs.cornell.edu (Gene Ressler)
dehnert@baalbek.asd.sgi.com (Jim Dehnert)
cliffc@cs.rice.edu (Cliff Click)
davgot@davgot.Auto-trol.COM (David Gottner)
Dirk Grunwald <grunwald@foobar.cs.Colorado.EDU>
"Samuel S. Paik" <d65y@crux1.cit.cornell.edu>
bliss@csrd.uiuc.edu (Brian Bliss)
pbk@arkesden.Eng.Sun.COM (Peter B. Kessler)
Richard Tuck


The principal reasons people gave are:


  - Simplicity: there's no doubt where to save the argument
      registers
  - So the callee will have something to do while stack
      adjusts are in progress.
  - So leaf functions don't have to allocate stack space
  - For varargs


* A summary of the various explanations


** Simplicity


The given model is straightforward. The overhead per call frame is small.
It is clear where to put the formals, without any complications.


** Parallelism


Many functions will save at least some registers; having a preallocated
save area means that e.g., a superscalar machine can find useful work to
do while the stack pointer is adjusted.


** Leaf Functions


Leaf functions are typically small. On the SPARC, they are also typically
register-starved, since the in and local registers (16 registers total)
are callee-save, only one global register is generally available (%g1),
%o7 has the function's return pc, %o6 has the stack pointer, and %o0-%o5
may be in use for passing parameters. Even for a function with only one
parameter, there are only 6 free registers, so a few stack slots may be
useful. Allocating those stack slots takes only two instructions, but
that may nonetheless be a significant part of the cost of a leaf function.
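
The register count above is simple arithmetic. A minimal sketch, assuming
that %g1 is the only free global and that each parameter occupies exactly
one of %o0-%o5; the function name is made up for illustration:

```c
#include <assert.h>

/* Scratch registers left to a SPARC leaf function that does not
 * execute `save'.  Assumes, as in the text, that %g1 is the only
 * free global and that each parameter uses one of %o0-%o5.
 * Hypothetical helper, for illustration only. */
int sparc_leaf_free_regs(int nparams)
{
    const int out_arg_regs = 6;   /* %o0-%o5 */
    const int free_globals = 1;   /* %g1 */

    assert(nparams >= 0);
    if (nparams > out_arg_regs)
        nparams = out_arg_regs;   /* the rest are passed in memory */
    return free_globals + (out_arg_regs - nparams);
}
```

With one parameter this gives the 6 free registers mentioned above; with
six parameters only %g1 is left.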


** Varargs


The semantics of varargs make it hard to do varargs w/o having all
parameters stored in contiguous memory. A typical varargs function starts
by storing the register arguments into the reserved stack area so that
e.g.,


    printf (char const *fmt, ...)


can call


    va_printf (char const *fmt, void *arglist)


where `arglist' is a pointer to a contiguous block of memory that holds
each parameter after `fmt'.


The preallocated register save area is a way to ensure that the parameters
can be stored contiguously without any copying of the parameters already
in memory.
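
At the source level, the printf/va_printf split looks roughly like the
following. This is a sketch in standard C using <stdarg.h>; the names
log_format and log_vformat are invented for the example, and the standard
va_list type stands in for the `void *arglist' above. How va_start finds
the register arguments is up to the compiler and the calling convention.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* The "v" form: receives the whole argument list as one object,
 * like va_printf in the text.  (Hypothetical name.) */
int log_vformat(char *buf, size_t size, const char *fmt, va_list ap)
{
    assert(buf != NULL && fmt != NULL);
    return vsnprintf(buf, size, fmt, ap);
}

/* The "..." form: packages its variable arguments and forwards them. */
int log_format(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);   /* from here on, the arguments after `fmt'
                            must be reachable one after another */
    n = log_vformat(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

A call such as log_format(buf, sizeof buf, "%d-%s", 42, "x") leaves
"42-x" in buf.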


I hear second-hand that varargs was the initial motivation for the save
area on the SPARC.


* My comments on each


<I will take the tone of arguing against the policy of preallocating the
space, even though I'm not really against it. Arguing is just my way of
scrutinizing.>


** Simplicity


The overhead of 32 or 48 bytes per frame is small if the call nesting is
small (which it generally is for C code) and if the machine has a large
cache. For heavily recursive code, the overhead may be large. For a
machine with a small cache, fragmentation may be a problem.
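
To put numbers on this, the total is just the nesting depth times the
per-frame save area. A small sketch; the function name is made up, and
the depths used below are illustrative inputs, not measurements:

```c
#include <assert.h>
#include <limits.h>

/* Total stack bytes spent on preallocated argument save areas when
 * `depth' frames are live at once.  Both parameters are inputs
 * chosen by the reader; nothing here is measured. */
unsigned long save_area_bytes(unsigned long depth,
                              unsigned long bytes_per_frame)
{
    /* refuse inputs whose product would overflow */
    assert(bytes_per_frame == 0 || depth <= ULONG_MAX / bytes_per_frame);
    return depth * bytes_per_frame;
}
```

At 48 bytes per frame, 16 nested calls cost 768 bytes, while a recursion
10000 frames deep costs 480000 bytes.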


The argument registers are caller-save, so in the usual case (no varargs)
the code generator already has a policy for saving and restoring
registers. Using the preallocated space in the usual case complicates the
code for the usual case since it now has to manage two register save
areas. (It could be argued that argument registers are in a different
class because they come preloaded; I think it depends on your code
generator model which is more complicated.)


In the unusual (varargs) case, there are cheap ways to make contiguous
space for the parameters and otherwise the code will be the same whether
or not you preallocate the space.


** Parallelism


There will generally be some free registers (esp. on the MIPS) that can be
used for computations while the stack is adjusted. Further, if the stack
adjust is large enough that it takes several instructions (a stack frame
larger than ~4K on the SPARC, ~32K on the MIPS), odds are that the
function will take a long time to execute. Thus, a tiny bit of lost
parallelism at the boundaries will probably be unnoticed. Counterexamples
(sparse uninitialized use of large arrays) are possible but statistically
irrelevant in today's code. Finally, if it's for parallelism, the save
area size is probably independent of the number of argument registers.
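
The ~4K and ~32K thresholds presumably come from the size of the signed
immediate field in the instruction that adjusts the stack pointer: 13 bits
on the SPARC and 16 bits on the MIPS. A sketch of that check, with
function names invented for the example:

```c
#include <assert.h>
#include <stdbool.h>

/* True if `value' fits in a signed two's-complement field of
 * `bits' bits. */
bool fits_signed_imm(long value, int bits)
{
    assert(bits >= 2 && bits <= 31);
    long lo = -(1L << (bits - 1));
    long hi = (1L << (bits - 1)) - 1;
    return value >= lo && value <= hi;
}

/* The prologue subtracts the frame size, so the immediate that has
 * to fit is -framesize. */
bool sparc_one_insn_adjust(long framesize)
{
    return fits_signed_imm(-framesize, 13);   /* 13-bit immediate */
}

bool mips_one_insn_adjust(long framesize)
{
    return fits_signed_imm(-framesize, 16);   /* 16-bit immediate */
}
```

Under these assumptions a 4096-byte frame still needs only one adjust
instruction on the SPARC and a 32768-byte frame only one on the MIPS;
anything larger needs a multi-instruction sequence.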


** Leaf Functions


On the MIPS, there are a substantial number of free (caller-save)
registers, so routines that need to allocate a save area for a few more
arguments will probably be allocating more than just 4 slots. Even if
they aren't, the function must be nontrivial if it's big enough to need to
save registers. The cost of allocating the space should be small.


On the SPARC, the general argument I've heard advanced is ``if you haven't
got enough free registers, do a `save' to allocate 16 available registers
quickly.'' Preallocating 4 slots for leaf functions goes against this
philosophy. It also requires complex and machine-dependent cost analysis
to determine when the registers should be saved to stack slots and when
`save' and `restore' should be used.


** Varargs


It isn't clear to me what you're supposed to do for either machine if
parameters are passed in floating-point registers.


Suppose that the calling conventions didn't preallocate the argument
register save areas. How would you do varargs on each machine? The
simple answer is that the callee, instead of the caller, ensures a
contiguous parameter passing/save area. The callee only needs to perform
the allocation if it's a varargs function.


  - MIPS


In the existing convention, there are 4 blank words on the top of the
stack. Another calling convention could emulate the current one by simply
subtracting an extra 4 words when it does its stack adjust. The extra 4
words could be added blindly if the compiler can't figure out whether a
function is varargs or not. If the compiler can tell (the more likely
case, IMHO) then the 4 words are only added for varargs functions.


In either case, the space allocation is free unless it's a varargs leaf
routine that has plenty of registers, in which case the cost is an
additional 2 instructions.
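
The callee-side allocation can be modeled in C with an array standing in
for the stack. This is only a model under stated assumptions: the stack
grows toward lower addresses, the stack pointer on entry points at the
first memory-passed argument word, and the rest of the callee's frame is
left out. The names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NREGARGS = 4 };   /* first four argument words arrive in registers */

/* Model of a varargs callee's prologue under the alternative
 * convention.  On entry `sp' points at the first memory-passed
 * argument word.  The callee makes room just below it and stores
 * the register arguments there, so that all arguments form one
 * contiguous block.  Returns the new stack pointer, which is also
 * the start of that block. */
uint32_t *spill_reg_args(uint32_t *sp, const uint32_t regs[NREGARGS])
{
    int i;

    assert(sp != NULL && regs != NULL);
    sp -= NREGARGS;              /* the "extra 4 words" */
    for (i = 0; i < NREGARGS; i++)
        sp[i] = regs[i];         /* one store per argument register */
    return sp;
}
```

After the call, words 0-3 of the returned block hold the register
arguments and word 4 onward holds whatever the caller passed in memory,
which is the layout a varargs walk expects.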


  - SPARC


The stack top has 16 reserved words for a trap/interrupt handler to save
register windows. Normal function prologues and epilogues are:


    _function:
        save %sp, framesize -> %sp
        ...
        ret
        restore


The `save' maps the current `out' registers (%o0..%o7) to `in' registers
(%i0..%i7) and frees the 16 local and out registers. %o6 is the stack
pointer before the remap; %i6 is the same value (the old stack pointer)
after the remap, and points to the place for the handler to save the in
and local registers.


Once the `save' instruction has been executed, it's too late to allocate
the varargs words, since any time after the `save' (even before the next
instruction executes) the system may asynchronously save the old window's
contents to the space pointed to by %i6. Thus, a new varargs function
would use


    _function:
        sub %sp, 6 words -> %sp
        save %sp, framesize -> %sp
        ...
        restore
        retl
        add %sp, 6 words -> %sp


Functions that return structures have a pointer to the return value area.
In the current convention, that's passed as an additional argument on the
stack. An alternative calling convention could pass that in a register.
Alternatively, the prologue code could move the pointer, at a cost of one
load and one store.
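
For readers who haven't seen the hidden pointer, the two C functions below
show the idea at the source level. This is a sketch of the general
technique, not the output of any particular compiler; the type and
function names are invented, and some conventions return small structures
in registers instead.

```c
#include <assert.h>
#include <stddef.h>

struct pair { long first, second; };

/* What the programmer writes: return a structure by value. */
struct pair make_pair(long a, long b)
{
    struct pair p = { a, b };
    return p;
}

/* Roughly what many calling conventions turn it into: the caller
 * supplies a pointer to the return value area and the callee fills
 * it in.  Whether that pointer travels in a stack slot or in a
 * register is the convention's choice. */
void make_pair_via_pointer(struct pair *ret, long a, long b)
{
    assert(ret != NULL);
    ret->first = a;
    ret->second = b;
}
```

Both forms leave the same two values in the caller's structure; only the
way the destination address reaches the callee differs.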


* Other Comments


Several people asked about my interest. The short answer is: I'm just
curious why the space is there. Now that I have an idea, I'm interested in
what the alternatives are.


I have it ``on good authority'' that varargs was the motivation for the
argument register save area on the SPARC. I can't speak for the MIPS and
I don't know what other RISC processors do.


Peter Kessler points out that on the SPARC, a modest nesting level of
function invocations wastes only 0.5-1K of space. On the MIPS, which
allocates fewer argument slots, the wasted space is even less. For
machines with large caches, the wasted space is probably irrelevant. For
machines with small caches (including small first-level caches),
fragmentation may be a problem. Peter also wonders aloud if anybody has
done recent studies on stack usage. The studies used in designing the
SPARC showed very modest call nesting. I've heard it said that some LISP
compilers avoid register windows because the overhead of using them is too
large. The same compilers might also wind up wasting a lot of stack space
using the standard convention.


It could be argued that on the SPARC the varargs overhead is enough to
warrant the caller-allocated argument register save area. That doesn't
explain why they did it on the MIPS, since avoiding preallocation
basically never costs you anything.


All IMHO, of course.


;-D on ( Anti-allocation ) Pardo
--

