Re: SPARC code generation references

torek@elf.ee.lbl.gov (Chris Torek)
Sun, 26 May 91 21:45:29 GMT

          From comp.compilers

Related articles
SPARC code generation references ressler@cs.cornell.edu (1991-05-24)
Re: SPARC code generation references torek@elf.ee.lbl.gov (1991-05-26)
Re: SPARC code generation references preston@ariel.rice.edu (1991-05-28)
Re: SPARC code generation references pardo@june.cs.washington.edu (1991-05-28)
Re: SPARC code generation references array!colin (1991-05-27)
| List of all articles for this month |

Newsgroups: comp.compilers
From: torek@elf.ee.lbl.gov (Chris Torek)
Keywords: SPARC, optimize
Organization: Lawrence Berkeley Laboratory, Berkeley
References: 91-05-100
Date: Sun, 26 May 91 21:45:29 GMT

[Apologies for all the quoting, but I found it difficult to edit this
down any further.]


In article 91-05-100 ressler@cs.cornell.edu (Gene Ressler)
quotes pardo@cs.washington.edu (David Keppel):
>Non-leaf procedures can be optimized if non-leaf behavior is uncommon. That
>is,
>
> int foo(int *a, int *n) {
> if (*n == 0) reload(a, BUFSIZE);
> --*n;
> return (a[*n]);
> }
>
>is compiled by all compilers I know of as:
>
> save
> test
> branch
> call
> norefill:
> sub
> index
> ret
> restore
>
>or some such. Faster on average is:
>
> test
> branch
> save // moved
> call
> restore // moved
> norefill:
> sub
> ret
> index
>
>but harder to determine when it is profitable.


Oddly enough, a day or two before this appeared I was talking to John Gilmore
and Sean Fagan about this very issue. The answer is that this is profitable
more often than might appear at first, and the technique can be (and I claim
should be) applied to every function. The general idea is this:


Move save and restore instructions `inward', renumbering
registers, until they `bump into' call instructions or until
the registers do not fit in the free %o set. (Then move the
restore outward to fit in a delay slot, if necessary.) If
there are no subcalls, or a variable is not live across a
subcall, %g temporaries may be used as well to reduce the
register pressure.


This technique might be called `leaf crushing' (I wanted to call it `leaf
burning' but that was a bit too much of a stretch :-) ).


The trick is that the input parameters to each function show up in `%o0'
through `%o5', which `change over' to `%i0' through `%i5' when a SAVE
instruction is done. The change is done inside the CPU by renumbering the
registers (changing the register window pointer) and can be done the same way
in the compiler.


You might think that you would often run into register allocation problems,
but the `not live across subcall' condition helps out quite a bit.


Let us use the original function as an example. The obvious naive expansion
of


int foo(int *a, int *n) {
if (*n == 0)
reload(a, BUFSIZE);
--*n;
return (a[*n]);
}


is


_foo:
save ! get a new window
ld [%i1], %o0 ! *n
tst %o0 ! 0?
bnz L1 ! no, skip call
nop ! nothing to do in delay slot
mov %i0, %o0 ! first parameter for reload()
call _reload ! reload(a, BUFSIZ)
mov 1024, %o1 ! second parameter for reload()
L1:
ld [%i1], %o0 ! *n
dec %o0 ! - 1
st %o0, [%i1] ! fix *n
! ld [%i1], %o0 ! (only if terribly stupid)
sll %o0, 2, %o0 ! * 4
add %i0, %o0, %o0 ! a + *n
ld [%o0], %o0 ! return value
ret ! jump back to caller
restore ! but first go back to previous window


Some fairly simple alias analysis should reduce this to:


_foo:
save
ld [%i1], %o0 ! *n
tst %o0 ! 0?
bnz L1 ! no, skip call
nop
mov %i0, %o0 ! as before
call _reload
mov 1024, %o1
ld [%i1], %o0 ! *n may have changed
L1:
dec %o0 ! compute *n - 1
st %o0, [%i1] ! update *n
sll %o0, 2, %o0 ! as before
add %i0, %o0, %o0
ld [%o0],
ret
restore


but this is not all that much better. (The obvious delay-slot fill, copying
the `dec %o0' up and using an annulling branch, and moving L1 down one line,
can be done later.)


Now let us try `leaf-crushing':


int foo(int *a, int *n) {
if (*n == 0)
---------------------------------------
reload(a, BUFSIZE);
[[note: *n is unknown]]
---------------------------------------
--*n;
return (a[*n]);
}


Here I have marked the last `no subroutine calls' line and the first `end of
all subroutine calls' lines. These mark the barriers for `easy' moving of
save/restore. I have also noted where the value of *n becomes unknown and
must be reloaded (typically, in C at least, compilers assume that anything
pointed-to is unknown after any subroutine call, so this is not an unusual
case). If we can estimate register needs, we can see that, before and after
the call, all we need are the two parameters `a' and `n', and a temporary to
hold *n; this easily fits in the leaf set. So now we generate this:


_foo:
ld [%o1], %o2 ! *n
tst %o2 ! 0?
bnz L1 ! no, skip call
nop
save ! not a leaf
---> mov %i0, %o0 ! first parameter for reload()
call _reload ! etc.
mov 1024, %o1
---> ld [%i1], %i2 ! *n may have changed
restore
L1:
dec %o2 ! compute *n - 1
st %o2, [%o1] ! update *n
sll %o2, 2, %o2 ! as before
add %o0, %o2, %o2
ld [%o2], %o0
retl ! leaf return
nop


The lines marked with arrows ---> show the register renumbering effect.
Within the `non-leaf' part of the function, everything that was called `%o'
is called `%i' instead. Other temporaries that do not die across the `save'
or `restore' boundaries may be kept in %g registers as well; these are not
renumbered, but do die across the calls themselves (unlike the %o/%i
temporaries---this makes %o/%i temporaries `more valuable').


If we do the usual delay-slot filling on the last example, we get just what
is desired:


_foo:
ld [%o1], %o2
tst %o2 ! if *n != 0 ...
bnz,a L1 ! compute *n - 1 and skip call
dec %o2
save
mov %i0, %o0
call _reload
mov 1024, %o1
ld [%i1], %i2 ! update register holding *n
restore
dec %o2 ! compute *n - 1
L1:
st %o2, [%o1] ! update *n
sll %o2, 2, %o2
add %o0, %o2, %o2 ! compute &a[n]
retl ! leaf return
ld [%o2], %o0 ! value = a[n]


As long as (6 - n_parameters [%o registers] + 7 [%g registers]) is enough
registers, `save' and `restore' instructions can be moved inward at no cost
whatsoever---it may not help, but it will not hurt. David Keppel is,
however, correct in pointing out that once these registers run out, or if the
`save' and `restore' instructions are to be duplicated (see below), the
profitability of moving the save and restore inward is murkier.


The typical case that does not benefit here, but might from a more
sophisticated technique, is code of the form:


hard(parms) {
if (something)
a();
compute;
if (somethingelse)
b();
}


where the `something' and `somethingelse' are rare. It would be possible for
a programmer with a leaf-crushing compiler to rewrite this as:


uglier(parms) {
if (!something && !somethingelse) {
compute;
return;
}
if (something)
a();
compute;
if (somethingelse)
b();
}


and in fact this sort of duplication is useful if the `something's do
not change across an inner loop. Common-code-path-analyzing compilers
(such as MIPS', using pixie feedback) might do this automatically
anyway; if this is done before leaf-crushing, functions like `hard'
above will be properly optimized.
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA Domain: torek@ee.lbl.gov
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.