Re: Stack based-Register Based


          From comp.compilers


From: "Joachim Durchholz" <>
Newsgroups: comp.compilers
Date: 15 Feb 2001 00:39:07 -0500
Organization: Compilers Central
References: 01-01-124 01-01-137 01-01-162
Keywords: architecture, interpreter
Posted-Date: 15 Feb 2001 00:39:07 EST

Andrew Berg <> wrote:
> What kind of expression would make 2^32 registers be "too few"?
> (That's an expression I would hate to meet in a dark alley.)

It's not that unlikely - code generators tend to hit such "who in his
right mind would exceed this limit" limitations with ease. Huge switch
statements with thousands of cases seem to be the more prominent
offenders, though, at least judging by how often that topic recurs in
this newsgroup.

> The use of a register
> file in memory should not be appreciably slower than the use of a
> stack in memory.

Actually it should be *faster*, else there would be no point in having
a register file in the first place. For example, the addressing modes
for accessing the register file could be shorter than those for
general top-of-stack-relative addressing, so the code would be shorter
and require less memory bandwidth.

> > What is, in your opinion, the difference between values in registers
> > and values in a stack?
> The biggest difference IMO is that values in registers make it clear
> which values are being used for what (at least to a compiler or
> hardware), whereas values in a stack are just all jumbled together.

For the compiler, there should be no real difference. It knows exactly
what value is at which offset; if the expression stack is emptied
between basic blocks, this is dead simple. Things get more
complicated if optimization mandates that stack entries should survive
basic block boundaries. (Actually it's the same for register
allocation decisions, so there is no real difference again.)
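To make the "no real difference" point concrete, here is a
hypothetical sketch (opcode names are invented, not any real VM's):
inside a single basic block, every stack slot's depth is statically
known, so stack code can be renamed into register form mechanically.

```python
# Hypothetical sketch: rename stack slots to virtual registers within
# one basic block. The stack depth at each instruction is known at
# compile time, so slot N simply becomes register rN.

def stack_to_registers(ops):
    """Translate (opcode, arg) stack ops into three-address register code."""
    depth = 0          # current stack depth = index of next free register
    out = []
    for op, *arg in ops:
        if op == "push":
            out.append(("load", f"r{depth}", arg[0]))
            depth += 1
        elif op in ("add", "mul"):     # binary ops: pop two, push one
            depth -= 1
            out.append((op, f"r{depth-1}", f"r{depth-1}", f"r{depth}"))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return out

# (a + b) * c as stack code:
code = [("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)]
for insn in stack_to_registers(code):
    print(insn)
```

The point is that the "jumbled" stack positions carry exactly as much
information as register numbers would, as long as the depth is known
statically.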

> I wonder if any study has been done of a compiler which analizes the
> called routine's register usage and only saves those registers which
> both are overwritten by the called routine, and contain important
> values. This might be hard to get right since to do it right you'd
> have to do a transitive closure of called routines, and they would
> likely not all reside in the same compilation unit.

There's a simpler solution: have the called routine save and restore
the registers it overwrites. (This was standard on Atari STs, years
ago. I'm pretty sure it's standard even today, though I don't see
enough calling conventions to tell whether any single one is
"standard".) This solution has the added bonus that it works even in
the presence of a dispatch table, or in other situations where it's
undecidable which function will be called.
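As a toy illustration of that discipline (all names invented): the
save set travels with the callee rather than the call site, so the
caller never needs to know which registers get clobbered.

```python
# Toy sketch of callee-save: a routine declares which registers it
# clobbers, and a wrapper saves/restores exactly those around the call.

regs = {"r0": 1, "r1": 2, "r2": 3}           # the machine's register file

def with_callee_save(clobbers, body):
    """Run `body`, preserving the registers it declares it overwrites."""
    saved = {r: regs[r] for r in clobbers}   # prologue: save clobbered regs
    try:
        body()
    finally:
        regs.update(saved)                   # epilogue: restore them

def routine():
    regs["r1"] = 99                          # scratches r1 only

# Works even through a dispatch table: the caller need not know the
# callee's register usage, because the save set belongs to the callee.
dispatch = {"f": (["r1"], routine)}
clobbers, body = dispatch["f"]
with_callee_save(clobbers, body)
# regs["r1"] is back to 2 afterwards
```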

> > There exist(ed) processors, which implement their registers in a
> > stack, and moved the base pointer with every scope change. Many
> > registers, but all in memory!

I once saw the description of such a machine. That machine was also
reported to be dead slow, because memory access was *much* slower than
register access would have been. (It was some TI processor, decades
ago.)

IIRC the Sparc architecture solved this problem by simply caching the
topmost 128 (or so) registers on-chip. Other architectures (Pentium
and probably every other modern processor) generally cache the last
data accessed, be it stack or heap or whatever. With such
architectures, it's important to have addressing modes that result in
short instruction sequences for common addresses (with the "the stack
is the registers" approach, this means short stack-relative
instructions for stuff that's near top-of-stack).

> IOW: I claim that if our VM has 256 registers, our ISA has 32
> registers and our expression has 300 concurrent temporary values, the
> high level language can schedule the block down to 256 registers by
> adding stores and loads as appropriate. These stores and loads should
> not hinder the JIT or load-time compiler from also being able to
> generate an optimized register scheduling. In fact, the stores and
> loads added in should be ones that would have been added in anyhow if
> the compiler were optimizing directly from the 300 values to the 32
> values in a single step.
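A minimal sketch of that store/load insertion (opcode and register
names are invented): virtual registers beyond the physical limit get
memory slots, and uses of them are rewritten through scratch
registers, which is exactly the kind of spill code an optimizer would
have emitted anyhow.

```python
# Hypothetical sketch: lower three-address code written against many
# virtual registers (v0, v1, ...) onto k physical registers, spilling
# the overflow to stack slots and inserting explicit loads/stores.

def spill_slot(v, k):
    """Byte offset of v's spill slot, or None if v fits in a register."""
    n = int(v[1:])
    return None if n < k else (n - k) * 8

def lower(insns, k, scratch=("s0", "s1")):
    out = []
    for op, dst, *srcs in insns:
        ops = []
        for i, s in enumerate(srcs):
            off = spill_slot(s, k)
            if off is None:
                ops.append(f"r{int(s[1:])}")
            else:                                    # spilled source:
                out.append(("load", scratch[i], f"[sp+{off}]"))
                ops.append(scratch[i])
        off = spill_slot(dst, k)
        d = f"r{int(dst[1:])}" if off is None else scratch[0]
        out.append((op, d, *ops))
        if off is not None:                          # spilled destination:
            out.append(("store", f"[sp+{off}]", d))
    return out

# With 4 physical registers, v4 overflows into the first spill slot:
for insn in lower([("add", "v4", "v0", "v1")], 4):
    print(insn)
```

A later JIT or load-time compiler is free to re-allocate, but the
loads and stores it would have needed are already explicit in the
code.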

I've been told that register optimization is slow, which would make it
inappropriate for JIT compilation. A stack VM plus a "we compile the
thing down to machine code once, when it hits the disk" approach
should be workable though. In such a setting, it might be even more
advantageous to use a format that doesn't have state at all
(e.g. expression trees). Tree VMs are more common for functional
languages, but they have been used with interesting results on
imperative ones as well (e.g. slim binaries).
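For instance, a stateless tree form can be recovered from
straight-line stack code mechanically (a hypothetical sketch; opcode
names invented):

```python
# Sketch: rebuild an expression tree from stack code. Binary operators
# pop their two operands and push the combined subtree, so the final
# stack entry is the whole expression with no VM state left over.

def to_tree(ops):
    stack = []
    for op, *arg in ops:
        if op == "push":
            stack.append(arg[0])
        else:                              # binary operator
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((op, lhs, rhs))
    assert len(stack) == 1, "stack code must leave exactly one value"
    return stack[0]

# (a + b) * c:
tree = to_tree([("push", "a"), ("push", "b"), ("add",), ("push", "c"), ("mul",)])
print(tree)
# → ('mul', ('add', 'a', 'b'), 'c')
```

An install-time compiler can then generate code directly from the
tree, without having to reverse-engineer either a stack or a register
discipline.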

