Re: High Precision Arithmetic Benchmarks (Andy Glew)
28 Jul 1996 10:53:59 -0400

          From comp.compilers

Related articles
Re: High Precision Arithmetic Benchmarks (1996-07-28)
Re: High Precision Arithmetic Benchmarks (1996-07-31)
Re: High Precision Arithmetic Benchmarks (Terje Mathisen) (1996-07-31)
Re: High Precision Arithmetic Benchmarks (Joe Keane) (1996-08-04)

From: (Andy Glew)
Newsgroups: comp.arch,comp.benchmarks,comp.compilers
Date: 28 Jul 1996 10:53:59 -0400
Organization: Intel Corporation
References: <4svpfb$> <4t0e7e$> <> <4t3cfe$> <>
Keywords: arithmetic, benchmarks, optimize

] (Doug Scofea) writes:
] Unfortunately, the SPEC suite seems to require portability. If there
] is ever one thing that ought to be non-portable it should be multiple
] precision arithmetic. Since programming languages have no support for
] things like add-with-carry, subtract-with-borrow, or
] 32-bit-unsigned-times-32-bit-unsigned-yields-64-bit-product, all the
] portable benchmarks do not show what the chip is capable of doing.

>Programming languages can be extended by the addition of a
>standardized library interface definition, in this case for
>multi-precision arithmetic. Which would be a good thing.

There aren't all that many ways of writing extended precision add or
multiply. Certainly, it is not at all unreasonable to expect a
compiler to recognize an extended precision addition coded in the
naive 8 bit or 16 bit algorithm - which is highly portable - and
optimize it to be performed 64 or more bits at a time.

Certainly, bignum add looks easy. The key is using arrays, not
pointers, and declarations that are relatively easy to disambiguate.

        temp32 := 0
        FOR i FROM 0 TO 4n-1 DO
            temp32 := a16[i] + b16[i] + (temp32 >> 16)
            c16[i] := temp32 & 0xFFFF


        temp64 := 0
        FOR i FROM 0 TO 2n-1 DO
            temp64 := ((int32*)a16)[i] + ((int32*)b16)[i] + (temp64 >> 32)
            ((int32*)c16)[i] := temp64 & 0xFFFFFFFF

(which, by the way, would be a good optimization to do even now -
since Intel *does* support 32*32->64 bit multiplies, which is half
the work.)
The more advanced extended precision algorithms, Karatsuba or Comba,
are a bit harder to scale from small to large integers. But then,
really aggressive multiprecision algorithms use modulo rings, and
therefore do not necessarily use large integers.

Furthermore, smart multiprecision arithmetic coding will also tend to
use redundant arithmetic to increase parallelism and avoid carry
propagation - especially as things like MMX make handling of small
integer vectors more efficient. And DEC has demonstrated that
vectorization of such small integer codes is possible.
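As a rough illustration of the redundant-arithmetic idea (my sketch, with made-up names): keep each digit in a wider slot so many additions can run as independent, vectorizable lanes with no carry chain, and propagate carries only once at the end.

```c
#include <stdint.h>

#define NLIMBS 4

/* acc[i] += x[i], digit-wise, no carry propagation: each 32-bit slot
   can absorb up to 65535 such 16-bit additions before overflowing.
   The lanes are independent, so this loop vectorizes trivially. */
void cs_add(uint32_t acc[NLIMBS], const uint16_t x[NLIMBS])
{
    for (int i = 0; i < NLIMBS; i++)
        acc[i] += x[i];
}

/* Single carry-propagation pass back to 16-bit digits.
   Returns the final carry out. */
uint32_t cs_normalize(uint16_t out[NLIMBS], const uint32_t acc[NLIMBS])
{
    uint32_t carry = 0;
    for (int i = 0; i < NLIMBS; i++) {
        uint32_t t = acc[i] + carry;
        out[i] = (uint16_t)t;
        carry  = t >> 16;
    }
    return carry;
}
```

The deferred-carry form trades one extra normalization pass for carry-free inner loops, which is exactly the shape that small-integer vector units reward.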

I therefore do not think it at all unreasonable that compilers should
be able to support extended precision arithmetic coded in a high level
language like C or C++, even if the algorithm is expressed in terms of
smaller-than-optimal integer sizes, like 8 or 16 bits, for
portability. Conversion to use larger integers like 64 bits, or
conversion to use small integer vectors, should be within present
compiler technology.
        (E.g., if I teach an optimizer class in the next few years, I may
assign some parts of this as a class project - given a good vectorizer
to start with.)

I therefore think it *DESIRABLE* that any putative encryption
benchmark contain efficient C-coded algorithms. Adaptation of the
benchmark rules for assembly coded support should be optional,
and, indeed, discouraged.

Andy "Krazy" Glew, Intel,
M/S JF3-206, 2111 NE 25th Ave., Hillsboro, OR 97124
Place URGENT in email subject line for mail filter prioritization.
