Cray-2 Fast Memory firstname.lastname@example.org (1993-05-13)
Re: Cray-2 Fast Memory email@example.com (1993-05-14)
Re: Cray-2 Fast Memory firstname.lastname@example.org (1993-05-26)
Re: Cray-2 Fast Memory email@example.com (1993-05-26)
Re: Cray-2 Fast Memory firstname.lastname@example.org (1993-05-27)
Re: Cray-2 Fast Memory email@example.com (1993-05-31)
From: firstname.lastname@example.org (James Davies)
Keywords: registers, optimize, Cray
Organization: Cray Computer Corporation
Date: Wed, 26 May 1993 20:48:30 GMT
Patrick Delano <email@example.com> writes:
> Apparently the Cray-2 had a fast memory that unlike cache memory was
> explicitly managed by the compiler.
This is also true of the Cray-3; each has 16K words of local memory per
processor. This memory is a bit less flexible than common memory, in
that vector loads and stores must be stride-1.
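To make the stride-1 restriction concrete, here is a minimal sketch in Python (a model only, not Cray code). It assumes that any strided access has to happen on the common-memory side, so a strided vector is packed into a contiguous local buffer before use. The names `stage_strided` and `LOCAL_WORDS` are hypothetical; only the 16K-word figure comes from the post.

```python
LOCAL_WORDS = 16 * 1024  # per-processor local memory size quoted in the post


def stage_strided(common, start, stride, count):
    """Pack `count` words, read at `stride` from common memory,
    into a contiguous (stride-1) buffer standing in for local memory."""
    if count > LOCAL_WORDS:
        raise ValueError("vector does not fit in local memory")
    # Strided read on the common-memory side, stride-1 write on the local side.
    return [common[start + i * stride] for i in range(count)]


common = list(range(20))
local = stage_strided(common, start=1, stride=4, count=5)
```

Once the data is packed this way, every later access to `local` is stride-1, which is the only pattern the post says local-memory vector loads and stores allow.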
firstname.lastname@example.org (David desJardins) writes:
>Basically, no software techniques were used. The compiler does very
>little to take advantage of the local memory. As far as I am aware, the
>only ways in which it is used are the following:
> o Temporary storage for register spillage.
> o As a means of extracting scalar values from vector registers
> (which can be done directly on the X-MP and Y-MP).
> o When the programmer, by directive, explicitly indicates that a
> variable is to be placed in local rather than common memory.
All true, but it's also used for subroutine linkage information, such as
return addresses and stack pointers (the stack itself is in common
memory). Each routine is allocated a static area for this purpose, which
must be saved and restored for potentially recursive calls.
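The save/restore requirement can be sketched as follows, in Python as a model rather than actual Cray calling-sequence code. It assumes the callee saves the previous contents of its static area on entry and restores them on exit; the post does not say which side does the saving. The names `linkage`, `common_stack`, and `call_fact` are hypothetical.

```python
common_stack = []          # stands in for the stack in common memory
linkage = {"fact": None}   # stands in for the routine's static local-memory area


def call_fact(n, return_address):
    """Recursive factorial with a single static linkage slot."""
    common_stack.append(linkage["fact"])   # save the previous activation's linkage
    linkage["fact"] = return_address       # install this activation's linkage
    result = 1 if n <= 1 else n * call_fact(n - 1, f"fact@{n}")
    # The inner call overwrote the slot; its restore must have put ours back.
    assert linkage["fact"] == return_address
    linkage["fact"] = common_stack.pop()   # restore on exit
    return result


value = call_fact(4, "main")
```

Without the push and pop, the inner activation would leave its own return address in the static slot, and the outer activation would return to the wrong place.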
Even with this limited usage, local memory tends to be in short supply,
since only 16K words are available per processor. The linker attempts to
overlay local-memory blocks when possible, but there is still a need for a
compiler option that minimizes local-memory usage, e.g. by spilling
registers to common memory.
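One way a linker could overlay such blocks is sketched below in Python. This is an assumption for illustration, not a description of the Cray linker: it uses the classic call-graph scheme in which a routine's block starts above the blocks of every routine that can be active when it is called, so routines that are never active together may share addresses. The function `overlay_offsets` and the example call graph are hypothetical, and recursion is simply rejected.

```python
def overlay_offsets(sizes, calls, root):
    """Assign each routine a local-memory offset such that no routine
    overlaps any routine on a call chain leading to it."""
    offsets = {}

    def place(name, base, active):
        if name in active:
            raise ValueError(f"recursive call to {name}: cannot overlay statically")
        # Keep the highest base seen over all call paths reaching `name`.
        offsets[name] = max(offsets.get(name, 0), base)
        for callee in calls.get(name, ()):
            place(callee, offsets[name] + sizes[name], active | {name})

    place(root, 0, frozenset())
    return offsets


sizes = {"main": 100, "a": 200, "b": 300, "c": 50}   # block sizes in words
calls = {"main": ["a", "b"], "a": ["c"], "b": ["c"]}
offsets = overlay_offsets(sizes, calls, "main")
total = max(offsets[r] + sizes[r] for r in sizes)
```

In this example `a` and `b` are never active at the same time, so both start at offset 100, and the program needs 450 words of local memory instead of the 650 that separate blocks would take.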
>I believe that the primary reason that more sophisticated techniques
>were not used by the compiler is that less than 40 Cray-2 machines were
>manufactured and sold, compared to hundreds of X-MP and Y-MP type
Partly, but the compilers for the X's and Y's don't do any of the loop
transformations necessary to do memory management either. Basically they
do inner-loop vectorization and leave the fancy multi-loop optimizations
to a preprocessor. The incentive to use local memory (or to license some
third-party product like KAP to use it) would be greater if more of it
were available, but it's hard to justify when you're already squeezed for