Re: Optimization techniques and undefined behavior

David Brown <david.brown@hesbynett.no>
Mon, 29 Apr 2019 17:08:13 +0200

          From comp.compilers

Related articles
Re: Optimization techniques david.brown@hesbynett.no (David Brown) (2019-04-25)
Re: Optimization techniques 847-115-0292@kylheku.com (Kaz Kylheku) (2019-04-26)
Re: Optimization techniques david.brown@hesbynett.no (David Brown) (2019-04-28)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-29)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-04-29)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-29)
Re: Optimization techniques and undefined behavior auriocus@gmx.de (Christian Gollwitzer) (2019-04-29)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-04-29)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-30)
Re: Optimization techniques and undefined behavior david.brown@hesbynett.no (David Brown) (2019-04-30)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
[56 later articles]

From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.compilers
Date: Mon, 29 Apr 2019 17:08:13 +0200
Organization: A noiseless patient Spider
References: <72d208c9-169f-155c-5e73-9ca74f78e390@gkc.org.uk> 19-04-021 19-04-023 19-04-037 19-04-039
Keywords: optimize, debug
Posted-Date: 29 Apr 2019 11:36:59 EDT
Content-Language: en-GB

On 29/04/2019 01:31, Bart wrote:
> On 28/04/2019 22:49, David Brown wrote:
>> On 26/04/2019 02:18, Kaz Kylheku wrote:
>
>>> Problem is, all these propositions are not safe; they are based on
>>> "the program has no mistake".
>>
>>
>> /All/ programming is based on the principle of "the program has no
>> mistake".  It is absurd to single out something like this as though it
>> is a special case.
>>
>> If I want to double a number, and write "x * 3" by mistake, it is a bug.
>
> Yes, a bug that will probably give the wrong answer. But I want a
> predictably wrong answer - see below.
>
>
>>   If I do arithmetic on signed integers, and they overflow, it is a bug.
>>   The compiler is allowed to assume that when I write "x * 3", I mean
>> what the C language means - multiply x by 3.  It is also allowed to
>> assume that when I write some signed arithmetic, I mean what the C
>> language means - give me the correct results when I give valid inputs,
>> otherwise it is "garbage in, garbage out".
>>
>> Ask yourself, when would "x * 2 / 2" /not/ be equal to x, given two's
>> complement wrapping overflows?  It would happen when "x * 2" overflows.
>>   For example (assuming 32-bit ints), take x = 1,500,000,000.  When you
>> write "x * 2 / 2" with an optimising C compiler, the result is most
>> likely 1,500,000,000 (but there are no guarantees).
>
> If you write x=(x*2)/2 expecting a result consistent with:
>
>    int y=(x*2); x=y/2;
>
> then you don't want the compiler being clever about overflow.


I /do/ want a result consistent with a single expression, or splitting
up the expression. And I get such a consistent result - whether the
compiler does the "x = y / 2;" operation or just leaves "x" unchanged.
I really do not care what the compiler does about overflow, or how
clever it is (except in the sense that I want it to generate efficient
code). That is because I don't have any overflows.


Questions about what the compiler will do with overflows, like how
consistent it will be, are as sensible as asking how many miles per
gallon you get from your car when it has no tires. You would not drive
your car without tires - that would be a mistake, a bug in your driving
procedure. I don't write signed integer expressions that overflow -
barring bugs in my coding. And thus I don't care what the compiler does
about them, and have no interest in their consistency.


> If overflow has occurred, then you want to know about it by it giving a
> result consistent with twos complement integer overflow.
>


No, I don't - I don't care about consistency here. Any help the
compiler and tools can give me in finding my bugs is great, of course -
but consistency is not a requirement at all. The best thing my compiler
can do is give me a compile-time error - and that is certainly not
consistent with two's complement wrapping. The second best thing it can
do is throw a run-time debug error when I have requested such checks -
again, that is not remotely consistent with two's complement wrapping.
At no point can I imagine that wrapping is in any way helpful in finding
bugs.
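
To be concrete about the kind of run-time check I mean (a sketch - the
exact flags are compiler-specific, but current gcc and clang both have
them):


#include <stdio.h>

int main(void) {
    int x = 1500000000;
    /* Built with e.g. "gcc -fsanitize=signed-integer-overflow", the
       multiplication below gives a run-time diagnostic instead of a
       silently wrapped value; with "-ftrapv" the program aborts. */
    x = (x * 2) / 2;
    printf("%d\n", x);
}


Either way the tool points straight at the bug; a quietly wrapped value
does not.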


> You DON'T want the compiler giving you what looks like the right answer
> and brushing the overflow under the carpet.
>


I want the compiler to give me the right answer to valid questions - I
don't expect it to give me any consistent answer to invalid questions.


> You also want a result that is portable across compilers,


I want consistent results to valid inputs - why would I want consistent
or portable results to invalid inputs?


Have you not heard of "garbage in, garbage out"? Why do you so
desperately want the garbage out to be a particular form of garbage?


> but I've just
> tried 20 or so combinations of compilers and optimise flags, all give a
> result of -647483648 - except gcc which gave 1500000000. And even gcc
> gave -647483648 with some versions and some optimisation levels.
>


Do you understand what "optimising compiler" means? It means the
compiler should try to give you code that runs as efficiently as
possible given valid inputs. C does not impose any requirements on code
efficiency, but compiler users certainly do - so a C compiler is not
going to go out of its way to give poorer quality code. So given "x * 2
/ 2;", a compiler will do one of two things - return "x" unchanged, or
carry out the operations using the most obvious assembly instructions.
A good compiler will thus give you 1500000000 in this case, as that is
the most efficient implementation consistent with the source code. A
weaker compiler, or a compiler without appropriate "optimise well"
options, will give -647483648. Note, however, that this is just the
result of one operation in isolation. With this code:


#include <stdio.h>

void foo(int x) {
    if (x < 0) return;
    int y = x * 2 / 2;
    if (y >= 0) {
        printf("y is non-negative, value %i\n", y);
    } else {
        printf("y is negative, value %i\n", y);
    }
}


the result for foo(1500000000) could well be:


y is non-negative, value -647483648




> -------------------------
> #include <stdio.h>
>
> int main(void) {
>   int x=1500000000;
>
>   x=(x*2)/2;
>
>   printf("%d\n",x);
>
> }
> -------------------------
>
>
>>   When you use a
>> "two's complement signed overflow" compiler, you get -647,483,648.  Tell
>> me, in what world is that a correct answer?
>
> It's not a correct answer when you are trying to do pure arithmetic. It
> CAN be correct when doing it via a computer ALU that uses a 32-bit twos
> complement binary representation.


It is not a correct answer for standard C signed arithmetic, because
there is no correct answer. It is not a correct answer in normal
mathematics, or almost any real-world problem you might want to model.
It is, however, correct if you have defined your signed arithmetic to be
wrapping. It is fine - but IMHO almost entirely useless and
counter-productive - for a programming language to define signed
arithmetic in that way. C does not define it that way, but other
languages (and particular C compilers) can do so.
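
For what it is worth, gcc and clang will define it that way if you ask
with "-fwrapv", and standard C already lets you write wrapping signed
multiplication explicitly through unsigned arithmetic. A sketch - the
helper name is mine:


#include <stdint.h>

/* Wrapping 32-bit signed multiply built on unsigned arithmetic, which
   is defined to wrap modulo 2^32.  Converting the result back to
   int32_t is implementation-defined in older C standards, but every
   mainstream implementation gives the two's complement value. */
static int32_t wrap_mul32(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a * (uint32_t)b);
}


If you really want wrapping, say so explicitly in the code - don't rely
on the compiler guessing that your overflow was intentional.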


>
> It is certainly what you might expect on such hardware.
>
>> Why do you think a guaranteed wrong and meaningless answer is
>> better than undefined behaviour?
>
> Is it really meaningless? Try the above using x=x*2. It will still
> overflow and produce a result of -1294967296, apparently incorrect. But
> print the same bit-pattern using "%u" format, and you get 3000000000 -
> the right answer. You can predict what's going to happen, /if/ you can
> predict what a compiler is going to do. Unfortunately with ones like
> gcc, you can't.


Again, it does not matter if you can predict what value you get if
something is nonsensical. It matters that you can predict what values
are valid inputs, of course, but not what the outputs are when the
inputs are invalid. When garbage goes in, I don't care what colour of
garbage comes out - if I care what is going to come out, I am careful
about what I put in.


If I want to see a result of 3000000000 when viewed as unsigned types,
then I'll use unsigned types in the first place.
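
That is, something like this sketch (assuming 32-bit ints, as in your
example):


#include <stdio.h>

int main(void) {
    unsigned int x = 1500000000u;

    x = x * 2;              /* unsigned arithmetic - fully defined */

    printf("%u\n", x);      /* 3000000000 with 32-bit unsigned int */
}


No bit-pattern reinterpretation needed - the type says what I mean.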


>
>> Ask yourself, when would "x + 1 > x" not be true?  When x is INT_MAX and
>> you have wrapping, x + 1 is INT_MIN.  Ask yourself, is that the clearest
>> and best way to check for that condition - rather than writing "x ==
>> INT_MAX" ?  When does it make sense to take a large positive integer,
>> add 1, and get a large /negative/ integer?
>
> You get funny things happening with unsigned integers too; try:
>
> -------------------
>   #include <stdio.h>
>   int main(void) {
>     unsigned int x=1500000000;
>
>     x=(x*4)/4;
>
>     printf("%u\n",x);
>   }
> -------------------
>
> This displays 426258176 rather than 1500000000.
>


Yes, agreed. Unsigned integers don't model mathematical integers; they
model modulo arithmetic. If you don't think in terms of modulo,
"funny things" happen. If your clock shows half past ten, and you wait
11 hours, it now shows half past nine - funny things have happened.
That is modulo arithmetic for you.
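
Your unsigned result is exactly that modulo arithmetic at work
(assuming 32-bit unsigned int):


#include <stdio.h>

int main(void) {
    unsigned int x = 1500000000u;
    unsigned int y = x * 4;    /* 6000000000 mod 2^32 = 1705032704 */

    printf("%u\n", y);         /* 1705032704 */
    printf("%u\n", y / 4);     /* 426258176 - the "funny" result */
}


Nothing undefined, nothing inconsistent - just not the arithmetic you
were thinking of.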


> So why is a 'wrong and meaningless answer' OK, in this case, just
> because C deems it to be defined behaviour?
>


It is not meaningless - precisely because it has a defined behaviour to
give it meaning. It is often an inappropriate result, and a poor model
for real problems - unsigned overflow is often a bug as well. (I have
said before that I'd actually like it to be undefined, but it is
important that the language has some form of wrapping arithmetic to be
used on the few occasions when wrapping is useful.)
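
One of those few occasions, as an illustration (my example, not from
this thread), is hashing - there, wrapping is exactly the behaviour you
want:


#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash: the multiplication is meant to wrap modulo 2^32,
   and unsigned arithmetic guarantees precisely that. */
static uint32_t fnv1a_32(const unsigned char *data, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}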


> This marginalisation of signed integer overflows is out-dated, now that
> every relevant machine is going to behave the same way.
>


Like many people, you misunderstand. C does not make signed overflow
undefined behaviour because it supports non-two's complement systems.
If that were the case, then it would have made the behaviour
implementation-defined. Signed overflow is /intentionally/ undefined
behaviour because there is no sensible definition to use, and because
there are advantages in optimisation and error checking in leaving it
undefined.
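
As a small illustration of the optimisation advantage (again my
example, not from this thread):


/* With signed overflow undefined, gcc and clang typically compile this
   to "return 1" at -O2, because x + 1 cannot overflow in a valid
   program.  With -fwrapv they must emit the actual comparison, since
   x == INT_MAX would make it false. */
int plus_one_is_bigger(int x) {
    return x + 1 > x;
}


And as discussed above, "x == INT_MAX" is the clear way to write the
check if that is what you actually meant.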

