Re: Green Compiler ?

Hans-Peter Diettrich <>
Wed, 02 Jan 2013 07:29:16 +0100

          From comp.compilers

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Wed, 02 Jan 2013 07:29:16 +0100
Organization: Compilers Central
References: 12-12-010 12-12-012 12-12-022 12-12-028 12-12-034 12-12-037 13-01-002
Keywords: architecture, performance, comment
Posted-Date: 02 Jan 2013 13:20:05 EST

George Neuner wrote:
> On Sun, 30 Dec 2012 08:14:26 +0100, Hans-Peter Diettrich
> <> wrote:
>> Peter Dassow wrote:
>>> On 28.12.2012 08:35, Hans-Peter Diettrich wrote:
>>>> Please note that most CMOS processor power consumption results from
>>>> switching (stray) capacitances, and only a small percentage from
>>>> leakage currents. E.g. a register or gate consumes such power whenever
>>>> a bit is changed, and almost nothing once it has reached a stable state.
> Smaller transistors have more leakage.

ACK (tunneling effect).

>>> So using registers extensively instead of "conventional" memory (e.g.
>>> DDR-RAM, memory outside a CPU) will save energy (given equal
>>> functionality)?
>> I don't see a relationship here, except that external memory is slow
>> and [in x86] a couple of caches and address translations are involved
>> in reading from RAM.
> But associative cache and external memory accesses both are power
> intensive.

Right, but since every reference to external memory costs *time* in
the first place, *every* compiler already optimizes for best register
usage. There is nothing that can be done *additionally* in a "green"
compiler. C already has a "register" keyword, a hint asking the
compiler to hold a local variable in a register.
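
To illustrate the hint, here is a minimal sketch. The function name
sum_array is made up for this example, and a compiler is free to ignore
the keyword and allocate registers as it sees fit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum an array with `register` hints.
   The keyword is only a request; the compiler may ignore it. */
long sum_array(const int *values, size_t count)
{
    register long total = 0;   /* hint: keep the accumulator in a register */
    register size_t i;         /* hint: keep the loop index in a register */

    assert(values != NULL || count == 0);

    for (i = 0; i < count; i++)
        total += values[i];

    /* Taking the address of `total` or `i` is not allowed in C,
       because a register-qualified object need not have one. */
    return total;
}
```

The only guaranteed effect is the restriction on taking the address;
whether the variable really ends up in a register is up to the compiler.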

Which registers are really used depends on the control flow *actually*
taken. When a subroutine is optimized to use all available registers,
it has to save and restore those registers on entry/exit. When it then
does nothing, due to the given conditions, the time and energy spent
on pushing/popping the registers is simply wasted.
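
One way to limit that waste at the source level is to keep the cheap
test in a small wrapper and the register-hungry work in a separate
routine. The following is only a sketch: the names mix_block and
checksum_if_needed are invented, and whether any save/restore code is
really avoided depends on the compiler, the ABI and inlining decisions
(some compilers do a similar transformation themselves).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worker: several values are live across the loop, so a
   compiler may use callee-saved registers and save/restore them here. */
static uint32_t mix_block(const uint8_t *data, size_t len)
{
    uint32_t a = 0, b = 0, c = 0, d = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a += data[i];
        b += a;
        c += b;
        d += c;
    }
    return a + b + c + d;
}

/* Wrapper: the "nothing to do" path returns before the worker runs,
   so no work (and ideally no register saving) happens in that case. */
uint32_t checksum_if_needed(const uint8_t *data, size_t len)
{
    if (data == NULL || len == 0)
        return 0;                  /* fast path */
    return mix_block(data, len);   /* slow path */
}
```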

For that reason some (Texas Instruments?) processors implemented a
register stack, decades ago, with its stack pointer adjusted according
to the number of registers used in a subroutine. This stack could be
moved into the CPU nowadays, eliminating the need for saving registers
in external memory. But this optimization reaches a hard limit on
deeply nested calls, with every subroutine using a high number of
registers. In external memory the register-stack size is adjustable to
program needs, just like ordinary stack size is, but a CPU resident
register stack has a fixed depth. Alternatively, the register stack
could still be kept in RAM, with a dedicated cache equivalent to the
L1/L2 caches. Such a cache would automatically push/pop register contents
depending on their actual *use*, not by fixed push/pop *instruction
sequences*. OTOH we already have nested caches, so that the effect of
an additional register cache is questionable. (see Wikipedia "CPU
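
The depth limit can be made concrete with a toy model. This is not a
description of any real processor: ONCHIP_REGS, MAX_DEPTH and all
function names are assumptions chosen for the sketch. Frames allocate
registers on a CPU-resident stack; when the on-chip capacity is
exceeded, the oldest registers are spilled to backing RAM and refilled
when the caller becomes active again.

```c
#include <assert.h>

#define ONCHIP_REGS 32   /* assumed on-chip capacity */
#define MAX_DEPTH   64   /* assumed maximum call depth of the model */

struct reg_stack {
    unsigned frame_size[MAX_DEPTH]; /* registers used by each active frame */
    unsigned depth;                 /* number of active frames */
    unsigned top;                   /* registers logically allocated */
    unsigned resident_base;         /* lowest register still on chip */
    unsigned long spilled;          /* words written to backing RAM */
    unsigned long refilled;         /* words read back from backing RAM */
};

/* Subroutine entry: allocate nregs registers, spilling the oldest
   on-chip registers if the fixed capacity would be exceeded. */
void rs_enter(struct reg_stack *rs, unsigned nregs)
{
    assert(nregs <= ONCHIP_REGS && rs->depth < MAX_DEPTH);
    rs->frame_size[rs->depth++] = nregs;
    rs->top += nregs;
    if (rs->top - rs->resident_base > ONCHIP_REGS) {
        unsigned excess = rs->top - rs->resident_base - ONCHIP_REGS;
        rs->resident_base += excess;
        rs->spilled += excess;
    }
}

/* Subroutine exit: release the frame and make sure the caller's
   frame is fully on chip again, refilling from RAM if necessary. */
void rs_leave(struct reg_stack *rs)
{
    unsigned caller_base;

    assert(rs->depth > 0);
    rs->top -= rs->frame_size[--rs->depth];
    caller_base = rs->depth ? rs->top - rs->frame_size[rs->depth - 1]
                            : rs->top;
    if (rs->resident_base > caller_base) {
        rs->refilled += rs->resident_base - caller_base;
        rs->resident_base = caller_base;
    }
}
```

With shallow nesting the counters stay at zero; with deep nesting and
large frames every call and return moves registers to and from RAM,
which is the hard limit mentioned above.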

The x86 architecture uses another approach (register renaming), with a
high number of shadow registers (compared to only 16 addressable
registers). I'm not sure, though, how a compiler should generate code
for best use of that model...
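
For readers unfamiliar with renaming, the basic idea can be sketched as
a mapping table. This is a strongly simplified model, not how any
particular x86 implementation works: PHYS_REGS, the structure layout
and the function names are assumptions, and physical registers are
never freed here (real hardware reclaims them when instructions retire).

```c
#include <assert.h>

#define ARCH_REGS 16   /* architecturally addressable registers */
#define PHYS_REGS 64   /* assumed number of physical (shadow) registers */

struct renamer {
    unsigned map[ARCH_REGS]; /* architectural -> physical register */
    unsigned next_free;      /* next unused physical register */
};

struct renamed {
    unsigned dst, src1, src2; /* physical register numbers */
};

void rn_init(struct renamer *rn)
{
    unsigned i;

    for (i = 0; i < ARCH_REGS; i++)
        rn->map[i] = i;
    rn->next_free = ARCH_REGS;
}

/* Rename one instruction "dst = src1 op src2". Sources are looked up
   first; the destination then gets a fresh physical register, so a
   later write to the same architectural register does not have to
   wait for earlier readers or writers of the old value. */
struct renamed rn_rename(struct renamer *rn,
                         unsigned dst, unsigned src1, unsigned src2)
{
    struct renamed out;

    assert(dst < ARCH_REGS && src1 < ARCH_REGS && src2 < ARCH_REGS);
    assert(rn->next_free < PHYS_REGS);

    out.src1 = rn->map[src1];
    out.src2 = rn->map[src2];
    out.dst = rn->next_free++;
    rn->map[dst] = out.dst;
    return out;
}
```

Two successive writes to the same architectural register end up in
different physical registers, so the 16 names visible to the compiler
say little about how many values the hardware keeps in flight.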

>> But registers are a very scarce resource, so that frequent loading from
>> memory is hardly avoidable.
> My opinions are colored by experience with DSPs, but I have long
> thought that it would be helpful to have a few K-words of non-cache
> scratchpad memory very close (1..2 cycles) to the CPU.

ACK, but see the considerations above on the use of such memory, with
its *size* limited by the architecture, and its *usage* depending on
subroutine needs and control flow (subroutine nesting, branches taken...).

[There were stacks with the top few registers kept in fast memory in
the Burroughs machines in the 1960s. It was easy to generate code for
them, but since register coloring was invented in the 1970s, modern
code scheduling for normal registers is much more effective. -John]
