Re: Register Allocation and Aliasing (really: zillions of transistors)

mash@mips.COM (John Mashey)
Sat, 14 Jul 90 22:39:48 GMT

          From comp.compilers


Newsgroups: comp.arch,comp.compilers
From: mash@mips.COM (John Mashey)
Followup-To: comp.arch
Keywords: optimize
Organization: MIPS Computer Systems, Inc.
References: <>
Date: Sat, 14 Jul 90 22:39:48 GMT

In article <> (Ron Guilmette) writes:

>> Hare brained idea: allocate quantities that *might* be aliased to
>>registers anyway. Provide a register to contain the true memory
>>address of the aliased quantity, which causes a trap when the address
>>is accessed (or automagically forwards to/from the register). Not
>>only are aliasing problems avoided, but you've got a set of data
>>address breakpoint registers as well! (ie. this technique could be
>>experimentally evaluated on machines that have data address
>>breakpoint registers.)

Some of this sounds interesting, and some may be useful in the future,
for various applications. However, one must be careful, especially
in a world of LIW, super-scalar, super-pipelined, and super-scalar-super-
pipelined multiple-issue machines (i.e., all RISCs that expect to be
competitive in the next few years), that you don't stick something
in a critical path that blows your cycle time by 50%....
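To make the quoted proposal concrete, here is a toy software model of it (all names here are mine, not from the original posting): potentially-aliased values live in registers, and a set of watch-address registers holds each value's true memory address, forwarding any access at that address to/from the register. The check on every load and store is exactly the hardware that lands in the critical path.

```python
class WatchRegisterFile:
    """Toy model of the quoted scheme: values that *might* be aliased are
    kept in registers anyway; a watch-address register holds the true
    memory address and forwards accesses to/from the register."""

    def __init__(self):
        self.memory = {}   # address -> value (backing store)
        self.watch = {}    # address -> register name (the watch registers)
        self.regs = {}     # register name -> value

    def allocate(self, addr, reg):
        # Promote the memory cell at 'addr' into register 'reg'.
        self.regs[reg] = self.memory.get(addr, 0)
        self.watch[addr] = reg

    def load(self, addr):
        # Every load is compared against the watch registers -- in
        # hardware, this associative check sits on the load path.
        if addr in self.watch:
            return self.regs[self.watch[addr]]   # forwarded from register
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        if addr in self.watch:
            self.regs[self.watch[addr]] = value  # aliased store hits the register
        else:
            self.memory[addr] = value


cpu = WatchRegisterFile()
cpu.memory[0x100] = 7
cpu.allocate(0x100, "r5")    # keep the maybe-aliased value in r5
cpu.store(0x100, 42)         # an aliasing store through a pointer
print(cpu.load(0x100))       # -> 42, served from r5, not from memory
print(cpu.regs["r5"])        # -> 42, register and "memory" stay coherent
```

In software the check is one dictionary lookup; in hardware it is an associative compare against every watch register on every memory access, which is precisely the sort of fan-in that can blow the cycle time.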

Maybe this is a good time to expound a little on a related widespread fantasy
that might be called:
When You Have A Zillion Transistors On A Chip, All Of Your
Problems Go Away.
Most of the following is over-simplified discussion of EE stuff from a
software guy's viewpoint; maybe some real VLSI types will correct goofs and
expound more on this topic:

It is clear that more space on a die helps a lot, and lets
you do things like:
bigger on-chip caches
a wonderful thing: regular, dense, and built from
transistors, rather than wires
this includes: I-caches, D-caches, TLBs, branch-target
buffers, pre-decoded instruction buffers, etc.
monster-fast FP units and other arithmetic units
for some kinds of units (like multipliers),
more space ==> faster, reduces latency of operation,
always a good thing.
more copies of functional units, or more pipelining
increases the repeat rate for an operation, which may help
some kinds of things.
wider busses, increasing intra-chip bandwidth.
On the other hand, there are some nasty facts of life (for CMOS, anyway):
1) FAN-OUT is not free. Put another way, the more loads on a bus,
the slower it is. Bigger transistors help, up to a point, but
what usually happens is that you must cascade the gates to keep
the total delay minimized.
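The fan-out point can be put in numbers with a back-of-the-envelope cascade model (my simplified arithmetic, not from the original post): driving a load F times bigger than one gate can comfortably drive through N equal-ratio buffer stages costs roughly N * F**(1/N) unit gate delays, which is minimized near N = ln(F) stages, i.e. a fan-out of about e per stage.

```python
import math

def cascade_delay(total_fanout, stages):
    """Delay, in unit gate delays, of driving 'total_fanout' through
    'stages' equal-ratio buffer stages; each stage drives F**(1/N)."""
    per_stage = total_fanout ** (1.0 / stages)
    return stages * per_stage

F = 64.0  # the bus presents 64x the load the driving gate wants to see
for n in range(1, 8):
    print(n, round(cascade_delay(F, n), 2))

# One huge gate (n=1) costs 64 delays; the optimum is near
# n = ln(64) ~ 4.16, i.e. 4 stages of fan-out ~2.8 each.
best = min(range(1, 8), key=lambda n: cascade_delay(F, n))
print("best stage count:", best)
```

So bigger transistors only help up to a point: past it, you cascade, and each extra stage is a gate delay you cannot make disappear.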
2) FAN-IN is not free either.
3) WIRES DON'T SHRINK as fast as transistors (because the resistance
increases as they get narrower). Hence, as you do shrinks to
increase the speed, and get more on a chip, this means the wires
can gobble up more of the space.
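A crude first-order scaling model shows why the wires misbehave (again my numbers, for illustration only): shrink every dimension by a factor s and gate delay improves by about 1/s, but a wire's resistance goes as length/(width*thickness), so a locally-shrunk wire has R ~ s and C ~ 1/s, and its RC delay does not improve at all; a "global" wire that must still cross the whole die keeps its length, so its RC delay actually grows as s**2.

```python
def relative_delays(s):
    """First-order scaling model for a process shrink by factor s (> 1).
    Returns (gate, local_wire, global_wire) delays relative to s = 1.
    Gate delay ~ 1/s.  Local wire: R ~ s, C ~ 1/s, so RC ~ 1 (flat).
    Global wire (length fixed, cross-section shrunk): R ~ s**2, C ~ 1,
    so RC ~ s**2 (worse)."""
    gate = 1.0 / s
    local_wire = s * (1.0 / s)       # no improvement
    global_wire = (s ** 2) * 1.0     # gets worse
    return gate, local_wire, global_wire

for s in (1, 2, 4):
    g, lw, gw = relative_delays(s)
    print(f"shrink {s}x: gate {g:.2f}, local wire {lw:.2f}, global wire {gw:.0f}")
```

The gates keep getting faster while the wires stand still or get slower, which is why "run monster busses all over the place" stops being free.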
Put another way:
1) The more things listening to you, the slower you are.
2) The more things you listen to, the slower you are.
3) Don't think you can run monster busses all over the place for free.
All of this says that people STILL have to think very hard about
delays in the critical paths in a CPU. The faster you go, the more things
you're likely to be doing in parallel, but if you're not careful,
these factors can bite you badly, especially in a single-chip
design that can only dissipate so much heat.
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
