Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>
Thu, 2 May 2019 16:51:41 +0200

From comp.compilers

Related articles
[8 earlier articles]
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-30)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-30)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-02)
Re: Optimization techniques and undefined behavior martin@gkc.org.uk (Martin Ward) (2019-05-02)
*Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown)* (2019-05-02)**
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-05-02)
Re: Optimization techniques and undefined behavior 847-115-0292@kylheku.com (Kaz Kylheku) (2019-05-02)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-02)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-02)
Re: Optimization techniques and undefined behavior auriocus@gmx.de (Christian Gollwitzer) (2019-05-02)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-03)
[19 later articles]

| List of all articles for this month |

From:	David Brown <david.brown@hesbynett.no>
Newsgroups:	comp.compilers
Date:	Thu, 2 May 2019 16:51:41 +0200
Organization:	A noiseless patient Spider
References:	<72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk> 19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-044 19-04-047 19-05-004
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="67091"; mail-complaints-to="abuse@iecc.com"
Keywords:	errors, optimize, arithmetic
Posted-Date:	02 May 2019 12:11:02 EDT
Content-Language:	en-GB

On 01/05/2019 14:53, Bart wrote:
> On 30/04/2019 13:46, David Brown wrote:
>> On 29/04/2019 18:10, Christian Gollwitzer wrote:
>
>>> P5
>>> 100 200
>>> 255
>>> ..jdk hlhdhqkd.. here comes the binary data
>>>
>>>
>>> The 100 and 200 are the width and height of the image data, the 255 is
>>> the highest possible value (for 16 bit it would be different).
>>> Obviously, you'd read in the width and height, then multiply them to
>>> compute the memory needed for the data, and - oops - how do you make
>>> sure that no overflow occurs? In the past, there had been security
>>> problems in image libraries with exactly this kind of problem: integer
>>> overflow due to unreasonable image sizes.
>>
>> It is really incredibly simple (at least in this case). Do the
>> multiplications using types that won't overflow. That might be an
>> unsigned type if its range is suitable (not because it has defined
>> overflow behaviour, but use it if it has enough range) or a bigger
>> signed integer type.
>
> That's just kicking the can further down the road.
>

Yes (especially for the use of a larger signed type). That's fine. C
let's you easily kick the can a /long/ way down the road. How often do
you need integers that will overflow a 64-bit type?

> If you have two unknown values A and B, and need to multiply, you won't
> know if the result will overflow.
>

First off, how often do you actually have unknown values to deal with?
Usually you know something at least.

> Using a type that is double the width might help, unless A and B are
> already using the widest type. But if they are int32, and you do the
> calculation like this:
>
>     (int64)A * (int64)B

(You only need the cast on one of these, not both. But it's fine to put
it on both if you prefer. And since we are discussing C, why not use
the C types - int64_t ?)

>
> then suppose you had to work out A*B*C; do you use:
>
>     (int128)*(int64)A * (int64)*B) * (int128)C ?
>
> (Real example: imagebytes := width * height * bytes_per_pixel)
>

bytes_per_pixel is not going to be more than, say, 16 - that's for
32-bit per colour, including alpha channel. No picture format is going
to have more. Width and height will also be limited - the highest
resolution image sensors are in the range of 120 megapixels. If you are
talking about a panorama picture sewn together from 5 billion such
camera pictures, you might have a point - and that's just using 64-bit
types.

Looking at it from another viewpoint, are you really wanting to work
with a picture that takes more than 2 ^ 63 bytes storage? If not, there
is going to be no overflow. It really is rare that 128 bit types are
needed (although it is nice if a language supports them for those rare
occasions. Standard C does not require it, though many PC C compilers
support them).

> It doesn't really scale. And it doesn't help here:
>
>     int32 A, B, C;
>
>     C = (int64)A * (int64)B;
>
> as you will need to check the int64 intermediate result for overflow. It
> all gets very messy.
>

Again, /think/ about what your values are and what the code you are
writing is doing.

(And one thing we can be sure of here - having two's complement wrapping
arithmetic certainly will not help.)

> I think a lot of this can be handled with application code dealing with
> validation; it doesn't really benefit from UB in the language.
>
> In the example posed, you have the additional problem that the input can
> be this:
>
>    P5
>    389000000000000000000000000000 9200000000000000000000000000
>
> with both dimensions exceeding int64. You need to get past that first,
> and that might indicate some error before you get around to doing any
> multiplying.
>
> These are all real, practical problems that are starting to get a long
> way from UB in a language.

These are all imaginary problems that are starting to get a long way
from reality.

>
>> There are plenty of cases where it is difficult to write code that is
>> efficient even on poorer compilers, and correct code so that it works on
>> good compilers. No one claims that programming is always easy. And
>> sometimes the best solution is code that is not portable or correct by
>> the standards, but works well with the implementations you need to use.
>> Most code, after all, is only ever compiled on the one compiler.
>>
>>
>>> The simplest thing would be to reject any width or height > 100,000,
>>> claiming that noone can handle this images, but what about an image of
>>> size 200,000 x 3 ? If C++ would provide an easy way to detect / branch
>>> on overflow, for example throw an exception, then this could be handled
>>> easily. I can't see how you can claim that your code never overflows,
>>> unless you don't handle untrusted user input data.
>>>
>
>> All C++ compilers with any self-respect support 64-bit integer types.
>> Do you think it's reasonable to reject any image dimension greater than
>> 2,000,000,000 ? I do.
>
> I remember being shown a typesetting machine some decades ago, which
> (IIRC) was fed PostScript code and produced an image on film at some
> 16,000 dpi.
>
> 2 billion pixels (or dots) would be only 8 square inches of image. While
> A4 scanners working at 9600dpi (1-bit depth) would result in an image of
> some 9 billion pixels. So 2 billion pixels is not really that far out of
> the ball-park.

I was talking about a /dimension/ of 2 billion - that is, a width or
height of 2 billion. Not an /area/ of 2 billion - such resolutions only
occur in very niche situations, but they are not inconceivable. At the
extraordinary high resolution used by your typesetter (which is a lot
higher than any printing process produces - you are talking about 0.16µm
dots), 64-bit types are enough to handle a printout that is 3 km square.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>Thu, 2 May 2019 16:51:41 +0200

David Brown <david.brown@hesbynett.no>
Thu, 2 May 2019 16:51:41 +0200