Re: Nitty-gritty aspects of register allocation

"Alexei A. Frounze" <>
Thu, 10 Sep 2020 18:01:30 -0700 (PDT)

          From comp.compilers

Related articles
Nitty-gritty aspects of register allocation (Elijah Stone) (2020-09-10)
Re: Nitty-gritty aspects of register allocation (Alexei A. Frounze) (2020-09-10)
Re: Nitty-gritty aspects of register allocation (Bo Persson) (2020-09-11)
Re: Nitty-gritty aspects of register allocation (2020-09-11)
| List of all articles for this month |

From: "Alexei A. Frounze" <>
Newsgroups: comp.compilers
Date: Thu, 10 Sep 2020 18:01:30 -0700 (PDT)
Organization: Compilers Central
References: 20-09-028
Injection-Info:; posting-host=""; logging-data="75824"; mail-complaints-to=""
Keywords: optimize, registers
Posted-Date: 10 Sep 2020 21:29:36 EDT
In-Reply-To: 20-09-028

On Thursday, September 10, 2020 at 5:14:12 PM UTC-7, Elijah Stone wrote:
> There is a lot of research and a lot of resources on the high-level
> aspects of register allocation--colouring interference graphs, lifetimes,
> CFGs, etc.
> But there are a lot of low-level architecture-specific things that these
> models don't account for on their own. Most notably, not all registers
> can be used for the same things. Just on amd64:
> - Multiplication and division use the rdx:rax register pair.

There are 2- and 3-operand variants of the imul instruction,
which are not restricted to using *DX:*AX.
You can use imul instead of mul if you don't need to get the
product that is twice as wide as any of the multiplicands.

> - Bit shifts always use cl.

Not always. When you need to shift by a known constant
amount, you don't need to place the amount in CL.

> - When calling functions, their arguments have to go in specific registers.

True for a few of the first arguments. The rest end up on the stack.

> In all of these cases, you can spill whatever happens to already be in
> those registers, but it would be nicer if you could arrange for the
> appropriate values to already be in the right places.

Certainly. I've given some thought to this problem in the
context of the i80386 and earlier, where there are more
restrictions, and here's what I think...

- If you can transform a multiplication or a division into
a shift by a constant (or shift + add) or the LEA
instruction, do so.

- Use 2- or 3-operand imul instead of mul whenever possible,
so you don't need to involve the fixed *DX:*AX.

- Widening multiplication should be rare. It should be OK
to spill *DX:*AX.

- Divisions are costly and should be rare. It should be OK
to spill *DX:*AX.

- If you need a shift by an unknown amount, spill *CX if
needed. Or see if you can reorder what you're doing, so
you can avoid spilling *CX.

The rest of the regular instructions (ADD, ADDC, SUB, SBB,
SETcc, Jcc, JMP, CALL, RET) are not tied to specific registers
and you can choose any registers for them.

> But how? A naive
> solution to the first two problems just makes rax/rcx/rdx the
> lowest-priority registers except when doing a multiply/divide/shift; and
> spills only if necessary (rare). But that's not a general solution
> (imagine if every register has a few pieces of unique functionality), and
> function calls reserve too many registers for that strategy to be
> practical in that case.

I don't have a simple, optimal and general solution.
But unless it's somehow critical, I would not try to
solve this problem optimally.

> Bonus microconsiderations (these seem much easier to model, but still not
> trivial):
> - Some registers need a special prefix to be used (REX prefix). These
> registers are generally different from the special-purpose registers
> (for e.g. multiplication). Is it better to put a non-multiplied value
> in a REX-prefixed register, or keep it in an unprefixed register and
> spill it later when you need to multiply?

There may be a very small decoding penalty for REX prefixes
in instructions that use R8-R15. There will be a normal
code-size penalty with regards to using longer instructions.
I would not bother about this too much, just prefer "R0"-"R7"
to R8-R15 when possible.

> - The second-lowest 8 bits of some registers can be addressed
> separately. When does it make sense to use them?

AFAIR, on modern CPUs there are penalties in using subregisters.
See e.g. Agner Fog's optimization manuals for details.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.