Re: Green Compiler ?

glen herrmannsfeldt <>
Wed, 2 Jan 2013 19:52:00 +0000 (UTC)

          From comp.compilers


Newsgroups: comp.compilers
Organization: NNTP Server
References: 12-12-010 12-12-012 12-12-022 12-12-028 12-12-034 12-12-037 13-01-002 13-01-005
Keywords: performance, architecture
Posted-Date: 03 Jan 2013 15:15:45 EST

Hans-Peter Diettrich <> wrote:

(snip, someone wrote)
>> Smaller transistors have more leakage.

> ACK (tunneling effect).


> The actual use of registers depends on the control flow *actually*
> taken. When a subroutine is optimized for using all available
> registers, it has to save and restore the registers on entry/exit.
> When it actually does nothing, due to given conditions, the time and
> energy used for pushing/popping the registers is only wasted.

Some conventions use caller saves, some callee saves. In the latter
case, the subroutine knows which registers it will change, and only
needs to save those. It is usual to do that save on entry and restore
just before return, but with a little extra logic the save might be
delayed until the code that actually changes those registers, so an
early exit skips it.
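
A minimal sketch of the difference, assuming a made-up cost model of one memory operation per saved or restored register. Compilers call the delayed version shrink-wrapping. The register names r4-r6 and the helpers `Frame`, `subroutine` and `cost` are hypothetical, not any real ABI:

```python
# Toy model: callee-saved registers saved eagerly on entry vs. lazily
# ("shrink-wrapped") only on the path that actually clobbers them.
# Register names and the cost model are hypothetical.

CALLEE_SAVED = ("r4", "r5", "r6")  # hypothetical callee-saved registers


class Frame:
    """Tracks which callee-saved registers have been stored to the stack."""

    def __init__(self):
        self.saved = []
        self.mem_ops = 0  # stores + loads spent on saving/restoring

    def save(self, regs):
        for reg in regs:
            if reg not in self.saved:
                self.saved.append(reg)
                self.mem_ops += 1  # one store per register

    def restore(self):
        self.mem_ops += len(self.saved)  # one load per saved register
        self.saved.clear()


def subroutine(x, frame, shrink_wrap):
    if not shrink_wrap:
        frame.save(CALLEE_SAVED)  # conventional prologue: save on entry
    if x == 0:
        result = 0  # early exit: no callee-saved register is touched
    else:
        if shrink_wrap:
            frame.save(CALLEE_SAVED)  # save delayed until actually needed
        result = x * 2  # stands in for work that uses r4-r6
    frame.restore()  # epilogue restores only what was saved
    return result


def cost(x, shrink_wrap):
    """Return (result, memory operations spent on register save/restore)."""
    frame = Frame()
    result = subroutine(x, frame, shrink_wrap)
    return result, frame.mem_ops
```

With the early exit taken, `cost(0, False)` spends 6 memory operations for nothing, while `cost(0, True)` spends none; on the long path both cost the same.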

> For that reason some (Texas Instruments?) processors implemented a
> register stack, decades ago, with its stack pointer adjusted according
> to the number of registers used in a subroutine. This stack could be
> moved into the CPU nowadays, eliminating the need for saving registers
> in external memory.

If I remember right, on the TMS9900 the registers were in memory, so
that pointer (the workspace pointer) just changed where in memory
they were.
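
A toy model of that idea, as a sketch only: the 16 registers are 16 consecutive words of memory starting at the workspace pointer, so a call just moves the pointer. It follows the TMS9900 convention as I recall it (old WP and PC kept in R13/R14 of the new workspace) but leaves out the status register and uses word rather than byte addresses; the class and method names are made up for illustration:

```python
class WorkspaceCPU:
    """Toy TMS9900-style machine: registers R0-R15 live in main memory
    at WP + n. Memory is word-addressed here for simplicity (the real
    chip uses byte addresses, so Rn sits at WP + 2*n). The status
    register, saved in R15 on the real chip, is omitted."""

    def __init__(self, mem_words=256):
        self.mem = [0] * mem_words
        self.wp = 0  # workspace pointer
        self.pc = 0

    def reg(self, n):
        return self.mem[self.wp + n]

    def set_reg(self, n, value):
        self.mem[self.wp + n] = value & 0xFFFF  # 16-bit registers

    def blwp(self, new_wp, new_pc):
        """Context switch by moving the pointer; no register file is
        copied, only the return linkage goes into R13/R14 of the new
        workspace."""
        old_wp, old_pc = self.wp, self.pc
        self.wp, self.pc = new_wp, new_pc
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)

    def rtwp(self):
        """Return: reload WP and PC from R13/R14 of the current workspace."""
        self.wp, self.pc = self.reg(13), self.reg(14)
```

The caller's registers are never saved anywhere; they simply stay where they were in memory while the callee works in its own block.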

> But this optimization reaches a hard limit on
> deeply nested calls, with every subroutine using a high number of
> registers. In external memory the register-stack size is adjustable to
> program needs, just like ordinary stack size is, but a CPU resident
> register stack has a fixed depth. Alternatively, the register stack
> could still be kept in RAM, with a dedicated cache equivalent to the
> L1/L2 caches.

The original idea behind the 8087 register stack was that overflow
and underflow would be handled by an interrupt: spill to memory on
overflow, restore on underflow. The problem was that the logic wasn't
tested before the chip was built, and by then it was too late. Also,
the stack has only 8 entries.

> Then the caches would automatically push/pop register contents
> depending on their actual *use*, not by fixed push/pop *instruction
> sequences*. OTOH we already have nested caches, so that the effect of
> an additional register cache is questionable. (see Wikipedia "CPU
> cache")

A larger register stack, with working automatic spill/restore,
and a fast enough interrupt handler, might work.
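
A minimal sketch of such a stack, assuming the overflow/underflow trap and its handler can be modeled as plain method calls. The depth of 8 matches the 8087; the class name and the spill/fill counters are hypothetical:

```python
from collections import deque


class SpillingRegisterStack:
    """Fixed-depth on-chip register stack that spills its bottom entry
    to memory on overflow and refills from memory on underflow, so
    software sees an unbounded stack. The trap handler is modeled as
    inline code; spills and fills are counted as a cost measure."""

    def __init__(self, depth=8):
        self.depth = depth
        self.regs = deque()  # on-chip entries; right end is top of stack
        self.memory = []     # spill area in RAM; last item = latest spill
        self.spills = 0
        self.fills = 0

    def push(self, value):
        if len(self.regs) == self.depth:
            # Overflow: move the oldest (bottom) register out to memory.
            self.memory.append(self.regs.popleft())
            self.spills += 1
        self.regs.append(value)

    def pop(self):
        if not self.regs:
            if not self.memory:
                raise IndexError("stack underflow")
            # Underflow: bring back the most recently spilled entry.
            self.regs.appendleft(self.memory.pop())
            self.fills += 1
        return self.regs.pop()
```

Pushing 12 values onto the 8-entry stack costs 4 spills, and popping them all back costs 4 fills; code that stays within 8 entries never touches memory.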

> The x86 architecture uses another approach (register renaming), with a
> high number of shadow registers (compared to only 16 addressable
> registers). I'm not sure, though, how a compiler should generate code
> for best use of that model...

I first learned about register renaming from the floating point unit
of the IBM 360/91. It was a popular example machine for many generations
of books on pipelined processors. With only four floating point
registers, renaming was pretty important for out-of-order execution
on S/360, especially since the 360/91 was designed to be able to run
code not specifically optimized for it.
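
A rough sketch of why renaming matters with so few names, assuming a simple map table with an unbounded supply of physical registers and no reclamation. That is a simplification: the 360/91 itself did this with Tomasulo's reservation-station tags rather than a physical register file. The F0/F2/F4/F6 names are the four S/360 floating point registers; the `rename` function is hypothetical:

```python
def rename(instrs, arch_regs=("F0", "F2", "F4", "F6")):
    """Rename (dest, src1, src2) instructions onto physical registers.

    Every write gets a fresh physical register, so instructions that
    merely reuse an architectural name (WAW and WAR hazards) no longer
    conflict; only true data dependences (RAW) remain.
    """
    # Initially each architectural register maps to its own physical one.
    mapping = {r: f"P{i}" for i, r in enumerate(arch_regs)}
    next_phys = len(arch_regs)
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources read the current mapping, preserving RAW dependences.
        s1, s2 = mapping[src1], mapping[src2]
        # The destination gets a brand-new physical register.
        mapping[dest] = f"P{next_phys}"
        next_phys += 1
        renamed.append((mapping[dest], s1, s2))
    return renamed


prog = [
    ("F0", "F2", "F4"),  # F0 = F2 op F4
    ("F2", "F0", "F6"),  # reads F0 (RAW), overwrites F2 (WAR with #1)
    ("F0", "F4", "F6"),  # overwrites F0 again (WAW with #1)
]
```

`rename(prog)` gives `[("P4", "P1", "P2"), ("P5", "P4", "P3"), ("P6", "P2", "P3")]`: the third instruction now shares nothing with the first two and can issue out of order.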

RISC processors tend to expect compilers to order instructions
appropriately, and also usually have plenty of registers.

> [There were stacks with the top few registers kept in fast memory in
> the Burroughs machines in the 1960s. It was easy to generate code for
> them, but since register coloring was invented in the 1970s, modern
> code scheduling for normal registers is much more effective. -John]
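
For comparison, a minimal sketch of coloring-based allocation in the Chaitin/Briggs simplify-and-select style, assuming the interference graph has already been built and is symmetric. It only reports which virtual registers would be spilled; it does not insert spill code or rebuild the graph as a real allocator would. The function name is hypothetical:

```python
def color_registers(interference, k):
    """Simplify/select graph coloring for register allocation.

    interference: dict mapping each virtual register to the set of
    virtual registers live at the same time (edges must be symmetric).
    k: number of machine registers.
    Returns (assignment, spilled): register numbers 0..k-1 per virtual
    register, and the list of virtual registers that got no color.
    """
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []
    while graph:
        # Simplify: remove a node with fewer than k neighbours if one
        # exists; otherwise optimistically push the highest-degree node
        # as a potential spill.
        node = next((v for v in graph if len(graph[v]) < k), None)
        if node is None:
            node = max(graph, key=lambda v: len(graph[v]))
        stack.append(node)
        for n in graph.pop(node):
            graph[n].discard(node)
    # Select: pop nodes and give each the lowest color its
    # already-colored neighbours do not use.
    assignment, spilled = {}, []
    while stack:
        node = stack.pop()
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [c for c in range(k) if c not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)
    return assignment, spilled
```

Three mutually interfering values fit in three registers; with only two, one of them has to be spilled.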

Hmm. Not that I completely understand it, but it seems to me that
ordinary general registers are usually live over shorter distances.
Larger ones, such as vector registers, take longer to save and restore,
and might also need to hold values for a longer time. It could help
a lot not to have to save and restore them on every interrupt, for
example.
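
A back-of-the-envelope sketch of that point, comparing eager save on every interrupt with a lazy scheme that saves the vector state only when a handler actually uses it. The 32 x 64-byte register file and the one-in-twenty usage rate are hypothetical numbers, and the class and function names are made up:

```python
class InterruptCostModel:
    """Counts bytes of vector-register state moved across interrupts.

    Eager: every interrupt saves and restores the full vector state.
    Lazy: the state is saved only when a handler actually uses a vector
    register (one common trick is to disable the vector unit on entry
    and do the save in the resulting fault). Sizes are hypothetical.
    """

    VECTOR_STATE_BYTES = 32 * 64  # hypothetical: 32 registers x 64 bytes

    def __init__(self, lazy):
        self.lazy = lazy
        self.bytes_moved = 0

    def interrupt(self, handler_uses_vectors):
        if not self.lazy or handler_uses_vectors:
            # One save on entry plus one restore on exit.
            self.bytes_moved += 2 * self.VECTOR_STATE_BYTES


def simulate(lazy, n_interrupts=100, vector_every=20):
    """Run n_interrupts; every vector_every-th handler touches vectors."""
    model = InterruptCostModel(lazy)
    for i in range(n_interrupts):
        model.interrupt(handler_uses_vectors=(i % vector_every == 0))
    return model.bytes_moved
```

With these made-up numbers, `simulate(False)` moves 409600 bytes and `simulate(True)` only 20480, since just 5 of the 100 handlers touch the vector registers.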

-- glen
