UNCOL, or: dealing with loss of information when compiling.


From: Toon Moene <toon@moene.indiv.nluug.nl>
Newsgroups: comp.compilers
Date: 21 Jun 1996 17:04:10 -0400
Organization: Compilers Central
References: 96-05-061 96-05-163 96-06-054
Keywords: C, UNCOL, performance

I wrote:


> Good Fortran compilers (on systems where the vendor has complete
> control, like Cray) indeed do this: calling BLAS routines when
> pattern recognition determines that they can be substituted.
> Fortran 90 is not a prerequisite for that.


And Dave Lloyd (Dave@occl-cam.demon.co.uk) replied:


> My point is that Fortran 90 allows these to be determined
> without complicated pattern recognition.


Yes, but ...


> It is easy for an F77 programmer to write a set of loops that are
> hard to transform into equivalent vector algebra without
> considerable analysis


OTOH, if I just skim through the code _I_ normally write as a
Fortran (mostly 77) programmer, little of it translates easily into
the array notation supported by the Fortran 90 standard. This might
be partly because, in Numerical Weather Prediction, finite
difference discretisations are "the one true way", and they come
with all these nasty nearest-neighbour dependencies.


> (see Wolfe's book for the lengths that you sometimes have to go to
> for things that are 'obviously equivalent').


I must, to my embarrassment, confess that all of my knowledge about
compilers stems from this one textbook that carries the nickname "Dragon
Book". I bought it 14 years ago, and have never read anything else on the
subject. Do you have a reference for Wolfe's book? (I recall that it has
some repugnant title like "Supercompilers for Supercomputers", which to me
has an uncomfortably high "Carl Sagan touch" to it.)


> Further many pattern recognisers
> have a relatively small set of patterns that can be
> matched leading to missed vectorisation when the loops
> don't quite conform (induction variables updated several
> times within a loop get some optimisers). Whereas F90
> constrains the programmer to write vector expressions
> which are directly implementable via BLAS calls - or (as
> with our compiler) which can have code generated directly
> for them with full knowledge of the semantics (giving
> better heuristics for unrolling, register allocation,
> loop nesting, temporary copies required, cache prefetches,
> etc).


This is typical of a compiler writer's point of view (sorry, this
is not meant to be as harsh as it sounds). Translating (physics)
knowledge into programs takes several steps:


1. From experimentation to mathematical model. Generalisation.
2. From mathematical model to computational model. Discretisation.
3. From computational model to source code. Coding.
4. From source code to executable. Compilation.


In fact, I am involved in steps 1-3, although mostly 2 and 3. What
you lose in steps 2 and 3 is _at least_ as important as the
information that can be lost in step 4 when you have to deal with a
poor intermediate language (be it a human-readable one, like C, or
an internal one, like gcc's RTL representation).


The solution to this problem is not to force programmers to use
Fortran 90 constructs, but to enable translation of the mathematical
model, right down through step 4. This is most apparent when you
want to generate an alternative model from the mathematical model:
because there is no way to go from the mathematical model to source
code automatically, you have to resort to manipulating the source
code itself to get that generation done _automatically_.


[ This is actually being done to generate so-called "adjoint
models" in various branches of physics to do sensitivity studies -
see URL below and references therein for further details ]
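
To give the flavour of such source-code manipulation, here is a
minimal sketch (the names, and the QB/HB convention for the adjoint
variables, are mine): each assignment of the forward model is
differentiated, and the resulting adjoint assignments are emitted
in reverse order.

      subroutine model(h, q, c, dt, i)
C     forward model fragment
      integer i
      real h(*), q, c, dt
      q = c * h(i)
      h(i) = h(i) - dt * q
      end

      subroutine adjmod(hb, qb, c, dt, i)
C     adjoint of the fragment above; statements in reverse order
      integer i
      real hb(*), qb, c, dt
C     adjoint of "h(i) = h(i) - dt * q"
      qb = qb - dt * hb(i)
C     adjoint of "q = c * h(i)"
      hb(i) = hb(i) + c * qb
      qb = 0.0
      end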


> But if you transformed F90 to C you must either call the
> BLAS via the C (and there are only a finite number of
> them) or you must generate the loops in C and trust to
> the C back-end to deduce what was explicit. Of course in
> generating the C loops, some of the original semantics
> can be used (such as determining the loop nesting) but
> not all unless your C back-end has some hooks (via pragmas
> or attributes) for the front-end.


If you translate Fortran to C, copy-in/copy-out semantics could at
least be used in the generated C for (scalar) dummy arguments; that
already catches a lot of the possible alias-related optimisations
for non-vector processors.
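
For instance, here is a sketch (my names, not actual f2c output) of
what a translator could emit for a SAXPY-like routine. Copying the
scalar dummies into C locals tells the C compiler that stores
through Y cannot change them:

/* Fortran original:
 *       SUBROUTINE SAXPY1(N, A, X, Y)
 *       INTEGER N
 *       REAL A, X(N), Y(N)
 *       DO 10 I = 1, N
 *          Y(I) = Y(I) + A * X(I)
 *    10 CONTINUE
 *       END
 */
void saxpy1_(int *n, float *a, float *x, float *y)
{
    int i;
    int nn = *n;    /* copy-in of the scalar dummies N and A:  */
    float aa = *a;  /* no reload of *n and *a on every trip,   */
                    /* since stores through y cannot alias the */
                    /* locals nn and aa                        */

    for (i = 0; i < nn; i++)
        y[i] = y[i] + aa * x[i];

    /* no copy-out needed: neither N nor A is assigned to.  Note
       that x and y may still alias each other as far as the C
       compiler is concerned - that is the part this trick does
       not catch. */
}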


> In summary, C is sufficient as an intermediate, but it
> is too poor for high-performance compilers of larger
> languages.


Yep, but I still maintain that it was really hard to beat f2c+gcc
with g77, probably because using this trio restricted the
comparison to non-vector systems, and because the gcc backend does
not support specifying no-alias information. I wrote my first mail
to Craig Burley on this subject (beating f2c) somewhere in November
'94, and it took until April this year - with g77-0.5.18 - before
g77 finally, in most cases, produced better code than f2c+gcc.


Cheers,


--
Toon Moene (toon@moene.indiv.nluug.nl) URL: http://www.knmi.nl/hirlam
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286