|EQNTOTT Vectors of 16 bit Numbers [Was: Re: Yikes!!! New 200Mhz Intel email@example.com (1995-11-09)|
|Re: EQNTOTT Vectors of 16 bit Numbers [Was: Re: Yikes!!! New 200Mhz In firstname.lastname@example.org (1995-11-14)|
|Re: EQNTOTT Vectors of 16 bit Numbers email@example.com (1995-11-17)|
|Re: EQNTOTT Vectors of 16 bit Numbers firstname.lastname@example.org (1995-11-19)|
|Re: EQNTOTT Vectors of 16 bit Numbers email@example.com (1995-11-20)|
|Re: EQNTOTT Vectors of 16 bit Numbers firstname.lastname@example.org (1995-11-20)|
|Re: EQNTOTT Vectors of 16 bit Numbers email@example.com (1995-11-21)|
|From:||firstname.lastname@example.org (Robert Bernecky)|
|Keywords:||benchmarks, optimize, APL|
|Organization:||University of Toronto, Computer Engineering|
|References:||95-11-079 95-11-132 95-11-164|
|Date:||Tue, 21 Nov 1995 15:25:55 GMT|
email@example.com (Henry Baker) writes:
> firstname.lastname@example.org (David Keppel) wrote:
>> >["Intel's special SPEC optimization."]
>Actually, IBM's APL implementation was extremely well done, because
>it optimized the code that got executed the most often. The thing
Well, actually, no APL implementation did a very good job of "optimizing
the code that got executed the most often", namely storage management,
syntax analysis, conformability and type checking, etc. We tried,
but there is still a lot to be said for compiled code.
What almost EVERY APL implementation did and did well was to
optimize array operations to a degree far beyond the ability of
most programmers. This is why the next paragraph holds.
>that embarrassed the Fortran'ers of the world was the fact that a
>hand-optimized loop that dominates a computation can beat the pants
>off an optimizing compiler every day of the week. For
>not-terribly-large arrays, the APL interpreter itself takes very
>little of the overall speed. The only time APL bogs down is when you
>don't take advantage of the built-in array operations.
>There was a problem with memory usage on non-compiled APL implementations,
>but that is a different story entirely. Also, the APL approach wouldn't
>work so well on modern machines, because modern machines are much more
>sensitive to memory usage & locality.
I am not familiar with ANY successful COMMERCIAL compiled APL,
[and only familiar with a few non-commercial compiled APL systems,
including my own]. However, APL systems tend to perform
excellently on cache machines, getting hit rates consistently above
the "standard mix" for a given machine, purely because of array
operations that tend to be implemented as stride-1 operations.
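
To make the stride-1 point concrete, here is a toy sketch in Python. It is not taken from any APL system: the matrix shape, the cache line size of 4 elements, and the helper names `offsets` and `lines_touched` are all assumptions made for illustration. It only counts how many distinct cache lines a walk touches; it measures nothing.

```python
def offsets(start, stride, count):
    """Element offsets visited by a walk with a fixed stride."""
    return [start + i * stride for i in range(count)]

def lines_touched(offs, line_elems):
    """Number of distinct cache lines covered by the given offsets."""
    return len({o // line_elems for o in offs})

ROWS, COLS = 3, 4   # hypothetical row-major matrix
LINE_ELEMS = 4      # assumed cache line size, in elements

# A whole-array primitive walks the ravel of the array with stride 1.
ravel_walk = offsets(0, 1, ROWS * COLS)

# Walking down a single column of the same matrix uses stride COLS.
column_walk = offsets(0, COLS, ROWS)

# Stride 1: 12 accesses share 3 lines, so every fetched line is fully used.
# Stride COLS: 3 accesses need 3 lines, so each fetch serves one element.
print(len(ravel_walk), lines_touched(ravel_walk, LINE_ELEMS))
print(len(column_walk), lines_touched(column_walk, LINE_ELEMS))
```

In this model the stride-1 walk uses every element of each line it fetches, which is the behavior behind the hit rates described above.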
The place where APL interpreters have traditionally fallen down is
in their naivete -- they fail [for good reasons] to perform
loop fusion, CSE, etc. The ONLY APL interpreter I know of that
does this at all is IBM's APL2 for the 3090 Vector Facility,
where the VF does a good job of hosting array operations that would
otherwise leave you register-starved.
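
For readers unfamiliar with loop fusion, a minimal sketch in Python follows. It does not show how APL2 or any other interpreter does it; the functions `add`, `mul`, `unfused` and `fused` are hypothetical names chosen for illustration. The unfused version evaluates `(a + b) * c` the way a naive interpreter would, one array primitive at a time with a temporary array in between. The fused version makes a single pass with no temporary.

```python
def add(xs, ys):
    """Array primitive: elementwise sum, materializing a new array."""
    return [x + y for x, y in zip(xs, ys)]

def mul(xs, ys):
    """Array primitive: elementwise product, materializing a new array."""
    return [x * y for x, y in zip(xs, ys)]

def unfused(a, b, c):
    """Naive evaluation: one full pass and one temporary per primitive."""
    t = add(a, b)       # first pass over the data, writes temporary t
    return mul(t, c)    # second pass, reads t back

def fused(a, b, c):
    """Fused evaluation: one pass, each intermediate stays a scalar."""
    return [(x + y) * z for x, y, z in zip(a, b, c)]

a = [1, 2, 3]
b = [10, 20, 30]
c = [2, 2, 2]
print(unfused(a, b, c))  # [22, 44, 66]
print(fused(a, b, c))    # [22, 44, 66]
```

Both give the same result. The fused form saves one traversal and one temporary array, at the cost of having to analyze the whole expression before executing any of it.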