The quest for the optimal calling convention; a model for specifying multiple calling conventions (longish)

Antti Sykari <>
12 Jan 2003 18:02:35 -0500

          From comp.compilers

Related articles
The quest for the optimal calling convention; a model for specifying m (Antti Sykari) (2003-01-12)
Re: The quest for the optimal calling convention; a model for specif (2003-01-17)
Re: The quest for the optimal calling convention; a model for specifyi (Antti Sykari) (2003-01-25)
Re: The quest for the optimal calling convention; a model for specifyi (John R. Strohm) (2003-01-26)
Re: The quest for the optimal calling convention; a model for specifyi (Dmitry A.Kazakov) (2003-01-27)
Re: The quest for the optimal calling convention; a model for specifyi (Rodney M. Bates) (2003-01-27)
Re: The quest for the optimal calling convention; a model for specif (John R. Strohm) (2003-01-29)
[3 later articles]
| List of all articles for this month |

From: Antti Sykari <>
Newsgroups: comp.compilers
Date: 12 Jan 2003 18:02:35 -0500
Organization: Compilers Central
Keywords: design, optimize
Posted-Date: 12 Jan 2003 18:02:35 EST


I'm working on a hobby language design project. For now, the main
objective is to create a C/C++-like language as efficient as possible.
The spirit of the language is to allow the programmer to get as near
to "bare metal" as possible. This includes, of course, pointers, but
also want access to the stack in a controlled and hopefully portable
manner. I believe that the language needs a firm and well-designed
low-level foundation before a real type-safe, object-oriented,
bells-and-whistles language can be constructed on it.

Currently I'm working on calling conventions. First of all, I plan to
make all arguments read-only by default, so that the decision of
passing variables by reference or by value is transparent to the user
and we will be saved from C++-like "const String&" -style cluttering
of the argument lists. Now, there are plenty of ways to deliver the
arguments to the callee, and plenty of ways to return result value(s)
to the caller.

Consider a simple function:

int f(int x, int y)
        return x + y;

It is possible to pass the arguments
- on the stack
- in predefined registers
- by reference, passing the address of a variable either in the stack or
    in a predefined register

And, similarly, the result can be returned
- on the stack
- in a predefined register
- by reference (the caller stores the address of the return value in the
    stack or a register, and f() stores the return value into the desired

I came up with a scheme for modeling calling conventions so that a
function can have several implementations with differing calling
conventions. So the compiler can decide which calling convention to
use, or even generate a version of the function on demand if a
customized calling convention is desired.

The scheme is described below, and while we're at it, I'd be glad to
receive any feedback on it.

But nevertheless, the main problem is this:

It is possible, given the body of the function and optionally some
contexts in which the function is called, select a close-to-optimal
calling convention for a function and use it when calling a *known*
function. But we cannot know anything about an *unknown* function
which is called through a function pointer or a virtual function call
(in an object-oriented context). So, how to determine the optimal
calling convention for a function, given only its signature?

Two observations:

First observation is that of course there are no optimal calling
convention for every situation in which the function is used, and for
every possible function that is going to be called. But we should be
able to determine a heuristic which would give the best possible
performance averaged over all possible functions.

The second observation is that the answer depends on the machine
architecture. For example, on Sparc, using the register stack would
be a strong candidate. But let's suppose we're looking at x86
architecture, which demands our attention because of its popularity.
Are there any reasonable heuristics to produce a usable calling
convention based on the function signature, using the parameter
passing methods described above?

I'd imagine something like "Reserve a suitable amount of registers for
variables, pass big-enough variables by reference and the rest on the
stack". And the rest of the work is just fine-tuning this heuristic.

Anyway, any pointers to previous work on the subject and helpful
opinions would be greatly appreciated.

-- Antti

*** A scheme for modeling calling conventions in a C-like language ***

For example, on x86 architecture, the standard C calling conventions
dictate that the arguments be passed on the stack, and the caller cleans
the stack, because variable-length argument lists have to be supported.
Often it would be more efficient to pass them in registers. Microsoft
has __fastcall calling convention, which passes as many arguments as
possible in registers, as well as __stdcall which leaves cleaning up the
stack as the callee's responsibility. But one still cannot decide in
*which* registers the arguments are passed, or decide on a by-argument
basis how it should be passed.

On the other hand, in C and in a greater degree in C++, returning a
value of a class type might result in a potentially expensive copy
operation, even a call of a copy constructor, so we would like to
construct the return value directly where it belongs. Furthermore,
returning the result into given memory address can help tail recursion
optimization, as argued in paper by Bigot and Debray [1].

This leads into the conclusion that the current schemes are suboptimal
because programmer cannot specify a custom calling convention even if he
knew what would be the optimal convention for a given function. So I
came up with the following scheme.

Assume for modeling purposes that general-purpose registers of size
sizeof(int) live at an array,

        int regs[numRegs];

and that we can access the stack frame via pointer

        int *fp;

so that fp[0] is first argument in the C calling convention, fp[1] the
next and so on. (Regardless of the real value of the frame pointer
register, which might contain some offset - if one is used at all.)

Now a function f with return argument of type Ret and arguments of types
Arg1, ..., ArgN can be, in all generality, modeled as:

_f(Ret *result, Arg1 *arg1, ..., ArgN *argN);

For example, the function int f(int, int) defined above would be seen
by the compiler as:

/* Template for f
_f(int *result, int *x, int *y)
        *result = *x + *y;

Calling _f, therefore, is achieved as follows, supposing that f is
called like <expr1> = f(<expr2>, <expr3>);

/* Calling sequence template for f */
        evaluate &<expr2>; put it into x;
        evaluate &<expr3>; put it into y;
        evaluate &<expr1>; put it into result;
        call _f

This version does not look particularly efficient, but is does not
reflect the actual calling sequence, but is rather a template for it.

In reality, the compiler (or the programmer) makes available one or
several alternatives for _f, where all of its arguments are fixed to one
of the following:

- &fp[i]: pass (or return) the paramater in stack frame position 'i'
- &regs[i]: pass (or return) the parameter in register 'i'
- fp[i]: pass the parameter's address in stack frame position 'i'
- regs[i]: pass the parameter's address in register 'i'

Note that modeling registers as an array allows also for using several
consecutive registers to contain a bigger object than just an integer.

In addition, several return values are trivially implemented using this

The list of the different versions of f() will be part of f's interface,
so that the compiler can pick the best f() for each situation.

An example follows; we fix parameters with a hypothetical '@' operator.

/* Implementation of f, version 1.
  * Provide a version with argument passing rules:
  * Pass x, y in the stack,
  * Return result in register 0 */

_f(int *result @ &regs[0], int *x @ &fp[0], int *y @ &fp[1]) {
        *result = *x + *y;
        /* transformed into:
              regs[0] = fp[0] + fp[1]; */

/* Implementation of f, version 2.
  * Provide a version with argument passing rules:
  * Pass x, y in registers 0 and 1
  * Return result in memory address pointed to by register 0 */

_f(int *result @ regs[0], int *x @ &regs[0], int *y @ &regs[1]) {
        *result = *x + *y;
        /* transformed into:
              *(int *)regs[0] = regs[0] + regs[1]; */

/* Implementation of f, version 3.
  * Provide a version with argument passing rules:
  * Pass and return everything by reference; put the addresses of the
  * arguments in the stack and the address of the return value in
  * register 0 */

_f(int *result @ regs[0], int *x @ fp[0], int *y @ fp[1]) {
        *result = *x + *y;
        /* transformed into:
              *(int *)regs[0] = *fp[0] + *fp[1]; */

Now, the compiler should be able to pick the most efficient version of f
by some kind of simple heuristic, for example:

if we want to place the result into memory:
        if registers 0 and 1 are free
                use version 2
                use version 3
        use version 1

The calling sequence would be similar derivative from the calling
sequence template as the versions of _f are from the template for _f.

Further note on this scheme is that it currently doesn't include the
notion of caller-saved/callee-saved registers. This could maybe be
implemented by allocating registers, within the function prototype, for
dummy variables. At least if the default register policy for a function
would be "don't destroy your passed-in arguments", then incoming
register dummies would denote callee-save, and outgoing register dummies
would denote caller-save registers. I wonder what to do if the in and
out registers overlap...


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.