
Related articles:
  Re: inlining + optimization = nuisance bugs  luddy@concmp.com (Luddy Harrison) (1998-09-29)
  Re: floating point, was inlining + optimization = nuisance bugs  chase@world.std.com (David Chase) (1998-10-04)
  Re: floating point  will@ccs.neu.edu (William D Clinger) (1998-10-05)
  Re: floating point  comments@cygnus-software.com (Bruce Dawson) (1998-10-07)
  Re: floating point  will@ccs.neu.edu (William D Clinger) (1998-10-10)
  Re: floating point  dmcq@fano.demon.co.uk (David McQuillan) (1998-10-13)
  Re: floating point  darcy@CS.Berkeley.EDU (Joseph D. Darcy) (1998-10-19)
  Re: floating point  darcy@usul.CS.Berkeley.EDU (1998-10-24)
  Re: floating point  comments@cygnus-software.com (Bruce Dawson) (1998-11-01)
  Re: floating point  comments@cygnus-software.com (Bruce Dawson) (1998-11-01)
  Re: floating point  darcy@usul.CS.Berkeley.EDU (1998-11-06)
  Re: floating point  darcy@CS.Berkeley.EDU (Joseph D. Darcy) (1998-11-06)
  Re: floating point  comments@cygnus-software.com (Bruce Dawson) (1998-11-07)
  [4 later articles]

From: "Joseph D. Darcy" <darcy@CS.Berkeley.EDU>
Newsgroups: comp.compilers
Date: 19 Oct 1998 01:26:15 -0400
Organization: Compilers Central
References: 98-09-164 98-10-018 98-10-040
Keywords: arithmetic, comment

William D Clinger wrote:

...

>> The basic problem is that the IEEE standard was conceived as a
>> standard for hardware, and says scarcely a word about high level
>> languages or compilers.

Providing language bindings for IEEE 754 features was recognized as an
issue during the standardization process. Members of the standards
committee felt that developing a language binding would further delay
the standardization process and perhaps jeopardize adoption of the
standard.

However, even before IEEE 754 was finally adopted, there were both
published papers on IEEE 754/language interaction [Fat, Fel] and an
implementation of a language binding, namely SANE [App]. SANE
(Standard Apple Numerics Environment) provides access to essentially
all IEEE 754 features: rounding modes, sticky flags, and the double
extended format. The SANE specification concentrates on a Pascal
binding, but C and Fortran are also discussed.

Therefore, if previous language designers and compiler writers wanted
to provide better IEEE 754 support, there was prior art to reference
and enhance.

>> IEEE Std 754-1985 goes to great lengths to
>> ensure that IEEE floating point arithmetic is predictable at the
>> hardware level, but Kahan himself has urged compiler writers to use
>> extended precision for intermediate results, without seeming to
>> appreciate how this leads to unpredictable floating point arithmetic
>> at the language level, where almost all programmers live.

Using extended precision for intermediate results does *not*
necessarily lead to unpredictable programs. In fact, Kahan decried
the Sun III compilers whose arbitrary use of extended precision made
programs unpredictable. He has also argued strongly against Sun's
recent "Proposal for Extension of Java(TM) Floating Point Semantics"
(PEJFPS), which would introduce the same sort of floating point
anomalies into Java.

When using extended precision, it is crucial to have a language type
which corresponds to the extended format. This is necessary to
preserve referential transparency. For example, if extended
precision is used for expression evaluation and there is no way for
the programmer to store an extended value, breaking a long expression
into pieces can change the computed result.
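To make this concrete, here is a sketch in C (hypothetical values,
assuming a compiler that evaluates anonymous expressions in double
extended on the x86; the exact behavior depends on when values are
spilled from the register stack):

    #include <stdio.h>

    int main(void)
    {
        double x = 1.0e308, y = 10.0, z = -1.0e308;

        /* The whole expression may be evaluated in double extended,
           where x*y = 1e309 does not overflow, so r1 can be 0.     */
        double r1 = x*y + z*y;

        /* Breaking the expression apart forces a rounding at the
           explicit store into t; 1e309 overflows double, so t is
           infinity and r2 is infinity.                             */
        double t  = x*y;
        double r2 = t + z*y;

        printf("r1 = %g, r2 = %g\n", r1, r2);
        return 0;
    }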

Making programs run faster on certain architectures is not the sole
use of extended precision; extended precision can help programmers
implement better algorithms [Kah1] and help protect programmers from
unknown numerical instabilities [Kah2]. If compilers and languages
use reasonable rules, extended precision computations can be
predictable; see the report from the Java Grande numerics working
group for one proposal [JaG].

As mentioned previously in this thread, in his comments on [Gol], Doug
Priest discusses various reasonable floating point expression
evaluation policies that languages should provide [Pri]. The policies
have different benefits, and certain policies are more appropriate (or
necessary) in some circumstances than others. When designing a
language (or, if given enough leeway by the language, when writing a
compiler), the question arises as to which of the five or so policies
is best for the programmers using the language. Kahan's position is
that everyday programmers benefit from having their floating point
expressions evaluated in the widest format having hardware support,
double extended on the x86 and double most everywhere else. Kahan
holds this view because the extra precision of double extended
protects programmers unknowingly using numerically unstable formulas.
See [Kah1] for an extended discussion of some numerical problems with
Heron's formula for calculating the area of certain triangles.
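For illustration, here is the contrast discussed in [Kah1], sketched
in C (the rearranged formula requires a >= b >= c and a compiler that
honors the parentheses):

    #include <math.h>

    /* Naive Heron's formula: can lose half the digits or worse
       for needle-like triangles when evaluated in double.       */
    double area_heron(double a, double b, double c)
    {
        double s = (a + b + c) / 2.0;
        return sqrt(s * (s - a) * (s - b) * (s - c));
    }

    /* Kahan's rearrangement [Kah1]: accurate in double, provided
       a >= b >= c and the parentheses are respected.  Evaluating
       the naive version in double extended also rescues it for
       many inputs, which is Kahan's point.                       */
    double area_kahan(double a, double b, double c)
    {
        return sqrt((a + (b + c)) * (c - (a - b))
                  * (c + (a - b)) * (a + (b - c))) / 4.0;
    }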

Bruce Dawson said:

> Kahan does like to push the idea of using extended precision for
> temporaries which, as you say, ignores the problem of how to specify
> what is a temporary.

Specifying what a temporary or anonymous value is isn't that
difficult: if the value doesn't have a name, it is anonymous. All
explicit stores must be respected in both precision and range. For
example, in

    a = b*c + d;

b*c could be calculated to extended precision, as could (b*c)+d.
However, when that quantity is assigned to a, the value must be
rounded accordingly. This style of floating point evaluation was
used in pre-ANSI C.

> However where Kahan's creation (the 80x87 and its successors) really
> fall down is that they make pure double precision impossible _and_
> they make pure long double precision extremely difficult.
>
> Writing pure double precision for the 80x87 is impossible because the
> round_to_double flag doesn't quite work.

Setting the rounding precision to double while keeping the extended
exponent range is explicitly allowed by IEEE 754. (However, if such
values need to be spilled to memory, they must be spilled as 80-bit
values to preserve the extended exponent range.)
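For instance, on x86 Linux the precision control field can be set
with the glibc <fpu_control.h> macros (a sketch; the macro names are
glibc-specific, and other systems spell this differently):

    #include <fpu_control.h>

    /* Set the x87 precision control field so arithmetic rounds
       to 53 significant bits; the exponent range stays extended. */
    void set_x87_double_rounding(void)
    {
        fpu_control_t cw;
        _FPU_GETCW(cw);
        cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
        _FPU_SETCW(cw);
    }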

Until recently, known techniques to make the x86 round *exactly* to
"pure double" entailed about a 10X performance hit on the x86.

Roger Golliver of Intel has developed an elegant refinement of
existing practice that gives exactly pure double rounding on the x86
for about a 2X to 3X speed penalty. A 2X to 3X slowdown is
approximately the same performance as the approximations to pure
double used by current Java JITs on the x86. [JaG] describes
Golliver's technique in more detail.

> It can introduce double rounding, and it doesn't clamp the
> exponent.  The exponent clamping can be forced by writing to memory,
> but the double rounding is inevitable (but rare?)

The double rounding can only occur for subnormal values, which are
very rare in practice. In the context of Java, while x86
implementations may exhibit double rounding on underflow, this is a
trivial exact reproducibility issue compared to faulty decimal <-->
binary conversion (as found in Sun's JDK 1.0) and non-conformant
transcendental functions (present in at least early iterations of JDK
1.1.x). As discussed by Doug Priest, such double rounding on
underflow "is highly unlikely to affect any practical program
adversely" [Pri].

The 680x0 restricts the exponent and the significand when the
rounding precision is set to float or double. In retrospect, this is
a better design decision than allowing the extended exponent range as
on the x86. However, the x86 design had good intentions. The
extended exponent range reduces the occurrence of overflow and
underflow exceptions.

> Simultaneously, the 80x87 makes it extremely difficult to write pure
> extended precision math.  Because most of the FPU instructions that
> reference memory only support float or double precision, compiler
> vendors have to write an entirely different code generator if they
> want to support extended precision.

The x86's floating point load instruction can load float, double, or
double extended values (all floating point values are converted to
double extended when brought into the floating point register stack).
By setting the rounding precision, the same arithmetic instructions
act on all three formats (with extended exponent range). The FST
instruction can only store float or double values. The FSTP
instruction can store and pop float, double, or double extended.
This last difference in store instructions is certainly an annoying,
but not an insurmountable, problem for code generation.

As long as the language has a type corresponding to double extended,
it is not hard to write code that uses double extended. Writing out
80-bit values to memory is somewhat slower than writing out 64-bit
values (3 cycles versus 2 on recent x86 processors).
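For example (a sketch, assuming the compiler maps long double to the
80-bit format, as most x86 C compilers have done):

    /* Accumulate in double extended and round once at the end:
       the loads are FLDs, the additions work on the 80-bit
       register stack, and only the final result is rounded to
       double.                                                   */
    double sum_array(const double *a, int n)
    {
        long double s = 0.0;
        int i;

        for (i = 0; i < n; i++)
            s += a[i];
        return (double) s;
    }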

> Presumably that is why VisualC++
> dropped support for long double some time ago.

My understanding is that MS VC++ dropped support for 80-bit long
double to limit differences between NT on x86 and NT on Alpha, since
the only IEEE formats the Alpha supports are float and double.

> So what are we left with?  Who is happy?
>
> 1) The speed demons are moderately happy, because the latest
> incarnations of the 80x87 are fairly fast.  But they're not ecstatic,
> because the bizarre tiny-stack architecture makes fast code beastly
> complicated to write and debug, and still isn't as fast as it could
> be.

To be fair, the design constraints from about twenty years ago are
very different from the design constraints today. However, there are
some unintended problems with how the x86 floating point stack is
implemented; it is very difficult to discriminate between stack
over/underflow and "invalid" floating point operations. This design
oversight makes generating fast floating point code unnecessarily
hard. Internally, recent x86 chips have many more registers than are
visible to the programmer.

> 2) Those who want predictable double precision results aren't happy
> because the results they need are impossible to get all the time.

What do you mean by predictable? There are degrees of
predictability: predictable on the same machine with the same
compiler, on the same machine with different compilers, all the way
to predictable across different architectures and different
compilers. Java promises, but does not deliver, cross-architecture
exact reproducibility. Sun's PEJFPS would remove predictability even
on the same machine with the same compiler and the same input data.

Predictability is not equivalent to always getting the same answer
everywhere.

> Although, with rounding set to double precision they probably do get
> them 99.999% of the time - any other guesses?

These differences arise from double rounding on underflow. Besides
the small numerical difference in the last bit (around 10^-324), a
more general concern is whether a program behaves sensibly when
underflow occurs, that is, whether it still computes an accurate
answer. Underflow and overflow can violate the assumptions of
numerical programs, invalidating their results. Therefore, detecting
and handling such events is necessary for robust programs. In the
context of Java, the language's refusal to grant access to the IEEE
sticky flags, features of IEEE 754 designed to allow detection of
such events, unnecessarily complicates the development of robust
numerical libraries.
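The <fenv.h> interface drafted for C9X shows what such access looks
like (a sketch using the draft's spellings, which a given
implementation may not yet provide):

    #include <fenv.h>
    #include <stdio.h>

    /* Clear the sticky flags, run the computation, then ask
       whether underflow or overflow occurred along the way.     */
    double dot(const double *x, const double *y, int n)
    {
        double s = 0.0;
        int i;

        for (i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    int main(void)
    {
        double x[2] = { 1e-200, 1e200 };
        double y[2] = { 1e-200, 1e200 };
        double s;

        feclearexcept(FE_UNDERFLOW | FE_OVERFLOW);
        s = dot(x, y, 2);
        if (fetestexcept(FE_UNDERFLOW | FE_OVERFLOW))
            printf("flags raised: %g may be unreliable\n", s);
        return 0;
    }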

> In short, if you want to design an FPU that has an ultra-fast or
> ultra-precise mode you have to make sure it can be turned off
> completely, for those who want predictability.

As Motorola did for the 68000, Intel could provide a rounding
precision mode where the exponent was restricted as well. Perhaps
Merced has such a mode. This kind of mode would eliminate much of
the complexity of strictly implementing Java floating point on the
x86.

> And, you have to make
> sure that turning it off is trivial - forcing compiler writers to do
> anything more than set a bit is unacceptable - they won't do it.

Running pure float and pure double code on the x86 certainly could
be easier. However, I don't see why compiler writers should expect
to get off the hook just because floating point code generation may
be more subtle than they would like. Compiler writers may not want
to think about floating point, but that doesn't mean they shouldn't
think about floating point. Compiler backends are responsible for a
significant fraction of the performance benefit of recent processors.
If Merced catches on, this trend will not abate anytime soon.

Compilers are expected to generate and schedule code that runs well
and is correct with respect to language and processor semantics.
Floating point should be no exception. Processors shouldn't have
perverse floating point, but some adversity does not license compiler
laxity.

Too often compilers are overly concerned with "optimizing" floating
point expressions. Transformations are used that make the program
run faster but disregard the underlying floating point semantics and
sometimes the intentions of the programmer.
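Two classic examples of transformations that are unsafe under IEEE
754 semantics (a sketch; an optimizer applying either one silently
changes results):

    #include <stdio.h>

    int main(void)
    {
        double a = 1e308, b = -1e308, c = 1.0, x = -0.0;

        /* Reassociation is not value-preserving: (a+b)+c is 1,
           but a+(b+c) is 0, because b+c rounds back to -1e308.  */
        printf("%g %g\n", (a + b) + c, a + (b + c));

        /* x + 0.0 is not the identity: it maps -0.0 to +0.0.    */
        printf("%g %g\n", x, x + 0.0);
        return 0;
    }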

-Joe Darcy

darcy@cs.berkeley.edu

References

==========

[App] Apple Numerics Manual, Second Edition, Apple, Addison-Wesley
Publishing Company, Inc., 1988.

[Fat] Richard J. Fateman, "High-Level Language Implications for the
Proposed IEEE Floating-Point Standard," ACM Transactions on
Programming Languages and Systems, vol. 4, no. 2, April 1982,
pp. 239-257.

[Fel] Stuart Feldman, "Language Support for Floating Point," IFIP TC2
Working Conference on the Relationship between Numerical Computation
and Programming Languages, J.K. Reid ed., 1982, pp. 263-273.

[Gol] David Goldberg, "What Every Computer Scientist Should Know
About Floating-Point Arithmetic," Computing Surveys, vol. 23, no. 1,
March 1991, pp. 5-24, also available online from
http://www.validgh.com/goldberg/paper.ps

[JaG] Numerics Working Group, Java Grande Forum, "Improving Java for
Numerical Computation," to appear at http://www.javagrande.org

[Kah1] W. Kahan, "Miscalculating Area and Angles of a Needle-like
Triangle," http://www.cs.berkeley.edu/~wkahan/Triangle.ps

[Kah2] W. Kahan, "Roundoff Degrades an Idealized Cantilever,"
http://www.cs.berkeley.edu/~wkahan/Cantilever.ps

[Pri] Douglas Priest, "Differences Among IEEE 754 Implementations,"
http://www.validgh.com/goldberg/addendum.html

[The C9X committee is currently wrangling about precision rules, and
they're a stinker to write in a way that is both useful and
consistent. In cases like a = 3. + (b = 1./17.); does the 1/17 in
the expression get narrowed to b's width? -John]
