
Related articles:
Re: High Precision Arithmetic Benchmarks glew@ichips.intel.com (1996-07-28)
Re: High Precision Arithmetic Benchmarks hbaker@netcom.com (1996-07-31)
Re: High Precision Arithmetic Benchmarks Terje.Mathisen@hda.hydro.com (Terje Mathisen) (1996-07-31)
Re: High Precision Arithmetic Benchmarks jgk@jgk.org (Joe Keane) (1996-08-04)

From: glew@ichips.intel.com (Andy Glew)
Newsgroups: comp.arch,comp.benchmarks,comp.compilers
Date: 28 Jul 1996 10:53:59 -0400
Organization: Intel Corporation
References: <4svpfb$ifq@news-rocq.inria.fr> <4t0e7e$n1c@onramp.arc.nasa.gov> <w05hgqzfnsi.fsf_-_@pdx007.intel.com> <4t3cfe$kcd@qualcomm.com> <DOCONNOR.Jul2330104@sedona.intel.com>
Keywords: arithmetic, benchmarks, optimize

]dscofea@qualcomm.com (Doug Scofea) writes:

] Unfortunately, the SPEC suite seems to require portability. If there

] is ever one thing that ought to be non-portable it should be multiple

] precision arithmetic. Since programming languages have no support for

] things like add-with-carry, subtract-with-borrow, or

] 32-bit-unsigned-times-32-bit-unsigned-yields-64-bit-product, all the

] portable benchmarks do not show what the chip is capable of doing.

>doconnor@sedona.intel.com:

>Programming languages can be extended by the addition of a

>standardized library interface definition, in this case for

>multi-precision arithmetic. Which would be a good thing.

There aren't all that many ways of writing extended precision add or

multiply. Certainly, it is not at all unreasonable to expect a

compiler to recognize an extended precision addition coded in the

naive 8-bit or 16-bit algorithm - which is highly portable - and

optimize it to be performed 64 or more bits at a time.

Certainly, bignum add looks easy. The key is using arrays, not

pointers, and declarations that are relatively easy to disambiguate.

temp32 := 0
FOR i FROM 0 TO 4n-1 DO
    temp32 := a16[i] + b16[i] + (temp32 >> 16)
    c16[i] := temp32 & 0xFFFF

to

temp64 := 0
FOR i FROM 0 TO 2n-1 DO
    temp64 := ((int32*)a16)[i] + ((int32*)b16)[i] + (temp64 >> 32)
    ((int32*)c16)[i] := temp64 & 0xFFFFFFFF

(which, by the way, would be a good optimization to do even now -
since Intel *does* support 32*32->64 bit multiplies, which is half the
battle.)
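The naive, portable loop the compiler should recognize can be written out as real C. A minimal sketch, assuming 16-bit limbs carried through a 32-bit temporary; the function name, `NLIMBS`, and the carry-out return are illustrative choices, not from the original post:

```c
#include <stdint.h>

#define NLIMBS 8  /* illustrative: a 128-bit bignum as 16-bit limbs */

/* Naive, fully portable bignum add: 16-bit limbs accumulated in a
   32-bit temporary, exactly the loop a widening optimizer could
   rewrite to operate 32 or 64 bits at a time. Returns the carry out. */
uint16_t bignum_add16(uint16_t c[NLIMBS],
                      const uint16_t a[NLIMBS],
                      const uint16_t b[NLIMBS])
{
    uint32_t t = 0;
    for (int i = 0; i < NLIMBS; i++) {
        t = (uint32_t)a[i] + b[i] + (t >> 16);  /* sum plus carry in */
        c[i] = (uint16_t)(t & 0xFFFF);          /* low 16 bits out   */
    }
    return (uint16_t)(t >> 16);                 /* final carry       */
}
```

Arrays and simple index expressions, as noted above, keep the loads easy to disambiguate, which is what makes the widening transformation legal.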

The more advanced extended precision algorithms, Karatsuba or Comba,

are a bit harder to scale from small to large integers. But then,

really aggressive multiprecision algorithms use modulo rings, and

therefore do not necessarily use large integers.
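To make the scaling point concrete, here is a single level of Karatsuba sketched at the smallest useful size: a 32x32->64 multiply built from three 16x16 multiplies instead of four. The function name is illustrative; at this size the trick is not a win, it only shows the recurrence that is harder to retarget than the add loop:

```c
#include <stdint.h>

/* One Karatsuba level: split each 32-bit operand into 16-bit halves
   and form the full 64-bit product with three multiplies. The half
   sums x0+x1 and y0+y1 fit in 17 bits, so the middle product cannot
   overflow a uint64_t. */
uint64_t karatsuba32(uint32_t x, uint32_t y)
{
    uint32_t x1 = x >> 16, x0 = x & 0xFFFF;
    uint32_t y1 = y >> 16, y0 = y & 0xFFFF;
    uint64_t z0 = (uint64_t)x0 * y0;   /* low partial product  */
    uint64_t z2 = (uint64_t)x1 * y1;   /* high partial product */
    uint64_t z1 = (uint64_t)(x0 + x1) * (y0 + y1) - z0 - z2;  /* middle */
    return (z2 << 32) + (z1 << 16) + z0;
}
```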

Furthermore, smart multiprecision arithmetic coding will also tend to

use redundant arithmetic to increase parallelism and avoid carry

propagation - especially as things like MMX make handling of small

integer vectors more efficient. And DEC has demonstrated that

vectorization of such small integer codes is possible.
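One way redundant arithmetic avoids carry propagation can be sketched in C. This is an assumed illustration, not code from the post: each 32-bit lane holds a 16-bit digit plus accumulated overflow, so many addends can be summed with no carry chain between lanes (the independent lanes are what a vectorizer or MMX-style unit could exploit), and carries are propagated only once at the end:

```c
#include <stdint.h>

#define NDIGITS 8

/* Redundant accumulation: lanes are independent, so this loop has no
   loop-carried dependence. Up to 65536 addends fit before a lane can
   overflow its 32 bits. */
void acc_add(uint32_t acc[NDIGITS], const uint16_t x[NDIGITS])
{
    for (int i = 0; i < NDIGITS; i++)
        acc[i] += x[i];                /* no carry into lane i+1 */
}

/* Single carry-propagation pass back to non-redundant 16-bit digits. */
void acc_normalize(uint16_t out[NDIGITS], const uint32_t acc[NDIGITS])
{
    uint32_t carry = 0;
    for (int i = 0; i < NDIGITS; i++) {
        uint32_t t = acc[i] + carry;
        out[i] = (uint16_t)(t & 0xFFFF);
        carry = t >> 16;
    }
}
```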

I therefore do not think it at all unreasonable that compilers should

be able to support extended precision arithmetic coded in a high level

language like C or C++, even if the algorithm is expressed in terms of

smaller-than-optimal integer sizes, like 8 or 16 bits, for

portability. Conversion to use larger integers like 64 bits, or

conversion to use small integer vectors, should be within present

compiler technology.

(E.g. if I teach an optimizer class in the next few years, I may

assign some parts of this as a class project - given a good vectorizer

to start with.)

I therefore think it *DESIRABLE* that any putative encryption

benchmark contain efficient C-coded algorithms. Adaptation of the

benchmark rules for assembly coded support should be optional,

and, indeed, discouraged.

--


Andy "Krazy" Glew, glew@ichips.intel.com, Intel,

M/S JF3-206, 2111 NE 25th Ave., Hillsboro, OR 97124

Place URGENT in email subject line for mail filter prioritization.
