Re: Stack and calling convention questions

rkrayhawk@aol.com (RKRayhawk)
7 Aug 1999 01:52:31 -0400

          From comp.compilers

Related articles
Stack and calling convention questions olsonr@panix.com (Thelonious Georgia) (1999-08-01)
Re: Stack and calling convention questions rkrayhawk@aol.com (1999-08-07)
Re: Stack and calling convention questions mikov@usa.net (Tzvetan Mikov) (1999-08-12)

From: rkrayhawk@aol.com (RKRayhawk)
Newsgroups: comp.compilers
Date: 7 Aug 1999 01:52:31 -0400
Organization: AOL http://www.aol.com
References: 99-08-011
Keywords: architecture, practice

Perhaps these general comments will help you in your continued studies
of the assembler instructions generated by your compiler under
optimization.


Fast calls generally exploit some built-in capability of the specific
processor.


An easy example of this is the need, upon return from a subroutine, to
adjust the stack pointer by an amount equal to the size of the
parameters pushed on the stack back when the call was set up. A simple
ADD can do that (the stack grows downward, so adding to the stack
pointer shrinks the stack). Setting up the stack parms before the call
and cleaning up after is normally the responsibility of the caller.
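
As a hedged sketch (hypothetical function names, 32-bit x86 and the
ordinary C caller-cleanup convention assumed), the caller pushes the
arguments, makes the call, and then restores the stack pointer itself:

    /* Caller-cleanup sketch: hypothetical names, 32-bit x86 assumed. */
    extern int sum(int a, int b);   /* ordinary C (cdecl) convention */

    int demo(void)
    {
        return sum(2, 3);
        /* Typical code at the call site:
         *     push 3            ; arguments pushed right to left
         *     push 2
         *     call sum
         *     add  esp, 8       ; caller removes 2 * 4 bytes of parameters
         */
    }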


On Intel chips you can shift the responsibility for the stack cleanup
to the subroutine with the 'RET n' return instruction. This is usually
quicker, since it is one instruction rather than two. It also places
the quantity by which to adjust the stack pointer in exactly one place
(which can be more reliable).
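
A sketch of the callee-cleanup variant (hypothetical names; __stdcall
is the Microsoft-style keyword for this convention on 32-bit x86):

    /* Callee-cleanup sketch: hypothetical names; 32-bit
     * Microsoft-style compiler assumed (__stdcall). */
    int __stdcall sum2(int a, int b)
    {
        return a + b;
        /* Typical epilogue emitted for sum2:
         *     ret 8             ; return and pop 2 * 4 bytes of parameters
         * The call site still pushes the arguments but emits no ADD:
         *     push 3
         *     push 2
         *     call sum2
         */
    }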


Fast calls can sometimes mean enregistering the return value, rather
than placing it in a temporary storage location.
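
For instance (hypothetical name, 32-bit x86 assumed), a simple scalar
result normally comes back in a register rather than in a temporary
memory slot:

    /* Enregistered scalar return: hypothetical name, 32-bit x86 assumed. */
    int get_count(void)
    {
        return 42;
        /* Typically compiles to something like:
         *     mov eax, 42       ; result lives in EAX, no store to memory
         *     ret
         */
    }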


The XOR instruction streamed in after the return from a subroutine can
have several purposes. This kind of thing is very architecture
specific. First, house cleaning: the Carry Flag and Overflow Flag are
cleared. That could be useful for downstream code. (Even though it
costs an instruction now, it could save an instruction later.)


More importantly, the instruction involves the EAX register. It is a
nearly universal convention that EAX holds the result of a simple
scalar return in C subroutine interfaces. (Not always. Larger entities
sometimes land in the EDX:EAX pair.) An instruction such as TEST
EAX,EAX would be testing that result; XOR EAX,EAX instead zeroes EAX
and leaves the flags in a known state.


Either instruction sets PF, SF and ZF. With TEST, the meaning of the
contents of EAX is in effect transferred to the status register, where
it can subsequently be used for branches; with XOR EAX,EAX the flags
simply reflect the zero result.
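
As a hedged illustration (hypothetical names, 32-bit x86 assumed),
here is a small function whose source does test a call's result,
together with one typical shape of the generated code; note where a
TEST appears and where an XOR appears:

    /* Testing a call's result versus zeroing EAX: hypothetical names,
     * 32-bit x86 assumed. */
    extern int foo(int y);

    int uses_result(int y)
    {
        if (foo(y) != 0)      /* the compiler must test the returned EAX */
            return 1;
        return 0;
        /* One typical shape of the generated code (argument setup omitted):
         *     call foo
         *     test eax, eax     ; flags now reflect the result in EAX
         *     jne  nonzero      ; branch on those flags
         *     xor  eax, eax     ; materialize the constant 0 for 'return 0'
         *     ret
         * nonzero:
         *     mov  eax, 1
         *     ret
         */
    }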


The code you posted does not reveal any source-level test of the
result of the subroutine call, so the XOR is most likely being emitted
by the compiler either for the house-cleaning purposes above or to
place a known zero in EAX for later use. ((Or else your XOR EAX,EAX
was extracted from a more complex compilation.))


When do you want fast calls? The number one rule is that you want
caller and callee to agree. It is not possible to emphasize that too
much.


In tight corners fast calls can reduce code size. For example, ADDs to
the stack pointer scattered after many call sites of a single
subroutine can all be eliminated by the one cleanup built into that
subroutine's 'RET n' instruction.


Returning simple scalar results in registers is the norm. It is hard
to turn that on or off, but some fast-call compile options can return
pairs of values in registers (this may not apply to your specific
compiler; it is a concept for consideration). If a value never needs
to go to storage as a temporary, that can save STORE/LOAD time and
possibly memory space. ((It is worth emphasizing that the storage
space savings is only 'possible'. The dearth of registers on the Intel
chips generally means that anything kept in a register chases
something else out, necessitating space for the other item.))
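
One common case where a pair of registers comes into play (hypothetical
name, 32-bit x86 assumed) is a 64-bit result, which by convention comes
back in the EDX:EAX pair:

    /* 64-bit result returned in a register pair: hypothetical name,
     * 32-bit x86 assumed. */
    long long widen(int x)
    {
        return (long long)x * 1000;
        /* Conventionally the low 32 bits of the result come back in EAX
         * and the high 32 bits in EDX, with no temporary stored to memory. */
    }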


Incidentally, it is also possible for fast calls to enregister simple
input parameters. That is, to not place them on the stack at all, but
pass them into the subroutine in registers.
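
A sketch of that idea (hypothetical names; __fastcall is the
Microsoft-style keyword for this on 32-bit x86, where the first two
suitable arguments travel in ECX and EDX):

    /* Register parameter passing: hypothetical names; 32-bit
     * Microsoft-style compiler assumed (__fastcall). */
    int __fastcall scaled(int a, int b)
    {
        return a * 2 + b;
        /* Typical call site for scaled(2, 3):
         *     mov ecx, 2        ; first argument in ECX, not on the stack
         *     mov edx, 3        ; second argument in EDX
         *     call scaled
         * No pushes, and so no stack cleanup for these two arguments.
         */
    }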


In your studies, do not be offended if you see the code doing
seemingly mindless things by habit. For example, some compilers will
do certain clean-up after a call which has nothing to do with the
call, the subroutine's visible attributes, or any specific local
intentions for the returned value. Instead, a compiler may emit code
that copes with its own possible reliance upon a clean status register
(or the like). Its ongoing assumptions may be disturbed by unknown
events in any called subroutine, so it may simply clean up certain
things every time it returns from one.


This kind of thing can become visible in an optimized code
sequence. Certain bundles of code may want to set a status bit and
then test it, and it is not unusual to emit a clearing instruction
first. But if the compiler writer is confident enough, the pre-clear
can be 'known or assumed' to be unnecessary, BASED UPON THE CODE
GENERATED SO FAR IN THE CURRENT ROUTINE. So, to be safe, emit the
pre-clear; to be efficient in time and code space, omit it under a
user-specified compile option. A call to a subroutine can disturb
those ongoing assumptions, so a house-cleaning instruction can
sometimes be read as 'reset my assumptions'.


You will see two patterns, then, in generated code. Unoptimized code
will bulletproof every bundle of code that needs a pre-clear to be
safe; upon return from a subroutine no clear is needed, since every
downstream bundle carries its own protection. In optimized code, the
compiler can track whether it disturbs its ongoing assumptions and
emit reset-clear logic only after each disturbance, leaving most
bundles without redundant pre-clears; but then the calls have to be
followed with reset-clear instructions (a seeming contradiction to
optimization, but read them as 'reset my assumptions').


You can generalize this notion beyond the status register of Intel
devices. Some compilers store status information in global areas (not
on a stack). These items sometimes need to be reset or updated upon
return from a subroutine, because that routine may have monkeyed with
the status information stored globally. For this reason, anything that
engages the run-time typing facilities of a compiler can cause copious
reset logic upon return from a call. And even simple debug options can
induce reset code to be emitted.


Studying the assembler code is a great way to learn about the machine
and the compiler. But you will see occasional distractions, especially
when debug, run-time typing, and/or exception handling are
engaged. Also note that various optimizations will not occur under
some compiler option combinations. You sometimes have to dig into the
documentation to discover these issues (you may not get warning
messages about self-defeating option combinations).


Keep in mind that compilers really are machines, and they are
mechanical and repetitive. Optimization is a general effect. Don't be
disappointed if some of the code does not seem perfect.


One other minor comment about the XOR instruction. You are
intelligently studying simple examples. But sometimes this can be a
little misleading. For example, the XOR after a call to foo() in the
sequence,


    x = foo(y);
    return 0;


could be a preparation for the 'return 0'. In other words, since
nothing happens after the function call, it is already time to get
ready to leave main(). It is exactly optimization that can make this
look confusing, because certain instructions (relating to the
subsequent return) may be lifted up before the task of assigning the
result to x has completed (to exploit parallelism on the Pentium, for
example).
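
As a hedged sketch, here is that fragment again with hypothetical
declarations wrapped around it, and one plausible optimized sequence
(32-bit x86 assumed):

    /* XOR as preparation for 'return 0': hypothetical declarations
     * added around the fragment above, 32-bit x86 and an optimizing
     * compiler assumed. */
    extern int foo(int y);
    int x;

    int main(void)
    {
        int y = 5;
        x = foo(y);
        return 0;
        /* Plausible optimized sequence:
         *     push 5
         *     call foo
         *     add  esp, 4       ; caller cleanup
         *     mov  [x], eax     ; assign the result to x
         *     xor  eax, eax     ; EAX = 0, ready for 'return 0'
         *     ret
         * The XOR belongs to the 'return 0', not to any test of foo's
         * result.
         */
    }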


Intel sites have a great deal of information about code
optimization. Email me if you need help finding Intel technical sites.


Hope that is not too general.


Best Wishes,
Bob Rayhawk
RKRayhawk@aol.com

