Re: Compiling to the Intel instruction set

Andi Kleen <>
30 Jun 2004 23:01:28 -0400

          From comp.compilers


From: Andi Kleen <>
Newsgroups: comp.compilers
Date: 30 Jun 2004 23:01:28 -0400
Organization: unorganized
References: 04-06-016 04-06-089
Keywords: assembler, code
Posted-Date: 30 Jun 2004 23:01:27 EDT

Scott Moore <> writes:
> I don't agree that 386 is all that non-orthogonal. The main thing that
> requires a register is the ecx as count register, and string moves,
> compares, etc, that use esi/edi and ecx for the rep instruction.

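Funnily enough, many RISCs are less orthogonal in one respect: they
require fixed registers for multiplication and division results (MIPS
uses HI/LO, for example). The 386 dropped this restriction for
multiplication with the two-operand imul, though div and idiv still
use edx:eax.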

[and their OSes are often far more segmented than modern x86 OSes too,
but that's a different chapter]

> Pretty much all such uses are deprecated now, and the software
> optimization manuals state that sequences of standard instructions are
> better.

It depends; rep movsl is still the best memcpy in many situations on
several modern x86 CPUs, because the microcode can apply special
optimizations to it. The other string operations are also quite
small (which allows inlining) and not necessarily slower than a
hand-coded function.
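
A minimal sketch of an inlinable rep movsl copy, assuming GCC-style
inline assembly. The helper name copy_rep_movs is hypothetical, and the
plain memcpy fallback for other targets is only there so the sketch
builds everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with "rep movsl" plus a
   "rep movsb" tail on x86, or with memcpy on anything else. */
static void *copy_rep_movs(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    void *d = dst;
    const void *s = src;
    size_t count = n / 4;

    /* "+D", "+S" and "+c" pin the operands to edi, esi and ecx
       (rdi, rsi and rcx in 64-bit mode), as the instruction demands.
       The ABI guarantees the direction flag is clear here. */
    __asm__ volatile("rep movsl"
                     : "+D"(d), "+S"(s), "+c"(count)
                     :
                     : "memory");

    /* Copy the remaining 0..3 bytes. */
    count = n % 4;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(count)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

The whole x86 body is two instructions plus register setup, which is
why a compiler can afford to expand it inline.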

> The biggest source of "non-orthogonal" register uses I encounter are
> function calls. Virtually all register calling conventions require

And function returns. And a real compiler usually wants to have
some kind of inline assembly facility, which also normally requires
fixed input/output registers.
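
To illustrate what fixed input/output registers look like from the
compiler's side, here is a small sketch using GCC-style extended asm
and the one-operand mull, which takes one factor in eax and returns
the product in edx:eax. The helper name mul32x32 is hypothetical, and
the portable fallback is only there for non-x86 builds.

```c
#include <stdint.h>

/* Hypothetical helper: widening 32x32 -> 64 bit multiply. */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;

    /* "=a" and "=d" force the outputs into eax and edx; the "0"
       constraint puts the first factor into the same register as
       output 0 (eax). Only b may go in any register or memory. */
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)
            : "0"(a), "rm"(b)
            : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)a * b;
#endif
}
```

The register allocator has to free up eax and edx around the
statement, which is the same kind of constraint it already handles
for calls and returns.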

> Now there is a lot of complaining about lack of register space on the
> 386. Things are getting better with the x86-64, but there is a lot you
> can do in the meantime. A huge number of numeric references require no
> more than byte registers (think char type in C). The standard method
> in most 386 compilers is to carry all operands as full dwords, which
> is not necessary. A lot of traffic can be kept in byte registers,
> which effectively expands the register space to a maximum of 10 from
> the usual 6 (al, ah, bl, bh, cl, ch, dl, dh, esi, edi). Also, in many

This is usually a bad idea for performance, because the dependency
tracking for instruction reordering in modern x86s treats
eax/ax/al/ah as the same register. You will just get a lot of stalls
because instructions that are actually independent pick up false
dependencies.

> cases, you can forgo using the ebp in preference to direct stack
> offsets, and thus free up the ebp register, making 3 dword registers
> and 8 byte registers, and various combinations thereof.

The disadvantage is that the code gets bigger on average (an ebp
reference is one byte smaller than a reference to esp). It's a win
when there are only a few references to stack variables in the
function, but when there are more, the space disadvantage offsets the
saving from not doing the frame pointer setup. Of course in some large
functions having one more register also pays off, and it's usually
faster. But overall it's a trade-off between code size and performance.
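
The break-even point can be worked out with a little arithmetic. The
sketch below assumes 32-bit code, simple loads with 8-bit
displacements, and a frame pointer setup of push/mov/pop; the function
names are made up for illustration.

```c
#include <assert.h>

/* Assumed encodings (32-bit mode, disp8 addressing):
     push %ebp            1 byte   (55)
     mov  %esp,%ebp       2 bytes  (89 e5)
     pop  %ebp            1 byte   (5d)
     mov  8(%ebp),%eax    3 bytes  (8b 45 08)
     mov  8(%esp),%eax    4 bytes  (8b 44 24 08), esp needs a SIB byte */
enum {
    FRAME_SETUP_BYTES = 4,  /* push + mov + pop */
    EBP_REF_BYTES     = 3,
    ESP_REF_BYTES     = 4
};

/* Code bytes spent on stack accesses when ebp is the frame pointer. */
static int size_with_frame_pointer(int refs)
{
    assert(refs >= 0);
    return FRAME_SETUP_BYTES + refs * EBP_REF_BYTES;
}

/* Code bytes spent on stack accesses when everything is esp-relative. */
static int size_without_frame_pointer(int refs)
{
    assert(refs >= 0);
    return refs * ESP_REF_BYTES;
}
```

Under these assumptions the two variants tie at four stack
references; with fewer, dropping the frame pointer is smaller, and
with more, it is bigger.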

> The RISC trick of assigning fixed blocks of registers to the calling
> convention is too expensive for the 386. Ie., in parameters, out
> parameters, function return and scratch. The calling convention, in my

It's widely used by x86 compilers these days (e.g. for static C
functions or even globally with a special switch) and typically
generates faster and smaller code.

Tailoring the calling convention to each function is even better of
course, but just the default register ABI is normally an improvement.
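
With GCC on 32-bit x86, for example, register argument passing can be
requested per function with the regparm attribute or globally with
-mregparm=3. A small sketch follows; the REGCALL macro and add3 are
hypothetical names, and on other targets the macro expands to nothing
so the code still builds.

```c
/* regparm(3) passes the first three integer arguments in eax, edx
   and ecx instead of on the stack. It only applies to 32-bit x86. */
#if defined(__GNUC__) && defined(__i386__)
#define REGCALL __attribute__((regparm(3)))
#else
#define REGCALL
#endif

REGCALL int add3(int a, int b, int c);

REGCALL int add3(int a, int b, int c)
{
    return a + b + c;
}
```

Callers and callee must agree on the convention, so the attribute
belongs on the prototype that every caller sees.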

The biggest problem in writing an x86 compiler is probably not all
that, but writing a code generator that generates good floating point
code for the x87 FPU stack. One way to avoid this is to just target
modern (SSE2-capable) x86s and only generate scalar SSE2.
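
As a rough illustration of what scalar SSE2 code looks like, here is
a sketch using the intrinsics from emmintrin.h. The function name
scaled_sum is made up, and the plain C fallback is only there for
targets without SSE2. With GCC, -msse2 -mfpmath=sse asks the compiler
to generate this kind of code for ordinary floating point expressions
on 32-bit x86.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: compute (a + b) * k in flat xmm registers,
   with no x87 register stack to manage. */
static double scaled_sum(double a, double b, double k)
{
#if defined(__SSE2__)
    __m128d va = _mm_set_sd(a);          /* low lane = a, high = 0 */
    __m128d vb = _mm_set_sd(b);
    __m128d vk = _mm_set_sd(k);
    __m128d sum = _mm_add_sd(va, vb);    /* addsd */
    __m128d res = _mm_mul_sd(sum, vk);   /* mulsd */
    return _mm_cvtsd_f64(res);
#else
    return (a + b) * k;
#endif
}
```

Each value lives in a directly addressable register, so an ordinary
register allocator can handle it without any stack-shuffling logic.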

