Re: behavior-preserving optimization in C, was compiler bugs

George Neuner <gneuner2@comcast.net>
Sun, 24 May 2009 04:55:55 -0400

          From comp.compilers


From: George Neuner <gneuner2@comcast.net>
Newsgroups: comp.compilers
Date: Sun, 24 May 2009 04:55:55 -0400
Organization: A noiseless patient Spider
References: 09-04-072 09-04-086 09-05-010 09-05-022 09-05-028 09-05-038 09-05-039 09-05-050 09-05-055 09-05-065 09-05-069 09-05-073 09-05-087
Keywords: optimize, linker
Posted-Date: 24 May 2009 19:43:12 EDT

On Tue, 19 May 2009 14:42:40 -0400, George Neuner
<gneuner2@comcast.net> wrote:


>On Sat, 16 May 2009 08:39:28 +0100, Nathaniel McIntosh
><mcintosh@cup.hp.com> wrote:
>
>>
>> foo.c bar.c
>> ----- -----
>> ... ...
>> double x; int x = 0;
>> ... int garbage;
>>
>>Most C compilers will compile these two modules without complaint;
>>when you link foo.o and bar.o into an executable, the "strong"
>>definition of "x" from bar.o is favored over the "weak" definition in
>>foo.o, and you wind up with a final "x" entity that is 4 bytes in size
>>(assuming an ILP32 compilation model), not 8 bytes.
>>
>>In spite of the fact that foo.c contains functions that store 8-byte
>>values to "x", the program works without optimization because the
>>variable "garbage" (unused as it turned out) happens to be allocated
>>just after "x" in memory. If the optimizer plays with the storage
>>allocation such that some other critical variable appears just after
>>"x", then the application fails.
>
>No. Your example will work regardless. Assuming those definitions
>were global in the source files, the result will be 2 different
>variables named 'x' - an integer in bar.o and a double real in foo.o.
>
>If, OTOH, the code was:
>
> foo.c bar.c
> ----- -----
> ... ...
> extern double x; int x = 0;
> ... int garbage;
> ...
>
>then the compiler/linker would create only the integer "x" and any
>functions in foo that accessed "x" as a double real would do so in
>error.


Nathaniel asked me privately what compilers I know of that act as I
described above. He cited a few compilers that act as he described. I
was going to reply privately, but in checking compilers I discovered
something that, I think, dovetails nicely with this thread.


YMMV.




I have access to more than a dozen ANSI C compilers - some C89 and
some C99. I have multiple versions of GCC (for x86 Linux, Windows
and QNX, and for 68K VxWorks), of MS VisualC(++), and of TurboC. I
also have available to me a Sparc5 C/C++ compiler, 68K Macintosh MPW
C, and a late model extended DOS version of x86 Watcom (the old
for-profit Watcom).


I tested them by compiling the following code (as 32-bit code) in both
debug and optimized versions.


        ==== test.c ====
        extern void DoStuffWithIntX( void );
        extern void DoStuffWithDblX( void );


        int main(int argc, char* argv[])
        {
            DoStuffWithDblX( );
            DoStuffWithIntX( );
            return 0;
        }


        ==== foo.c ====
        #include <stdio.h>
        double x;


        void DoStuffWithDblX( void )
        {
            printf( "double X is at %p\n", (void *)&x );  /* %p expects void* */
        }


        ==== bar.c ====
        #include <stdio.h>
        int x;
        int garbage;


        void DoStuffWithIntX( void )
        {
            printf( "integer X is at %p\n", (void *)&x );  /* %p expects void* */
        }


and I varied which of the variables 'x' was explicitly initialized (to
zero) at the top level. I had the compilers produce map files to
check whether and where the names had storage allocated. I discovered
that some of the compilers behaved very differently depending on
whether one or both of the variables was initialized. Some of them
also behaved differently depending on whether the compilation was
debug or optimized. I won't bother with extensive reporting of which
compiler did what - suffice to say that depending on the compiler and
its settings, I was able to achieve allocation of 1 variable, 2
variables, and _NO_ variables. Except in the cases where no storage
was allocated, I was able to run the resultant executables and see the
address(es) of the variables.
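
To make the hazard in the quoted message concrete, here is a minimal
sketch that is not taken from any of the compilers tested. It assumes a
4-byte int and an 8-byte double, and it models "x followed by garbage"
with a struct so that the oversized store stays inside a single object
and the behavior remains defined. The names layout and
double_store_clobbers_neighbor are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Models the linker's layout from the quoted example: a 4-byte "x"
   with another variable placed directly after it (assumed layout). */
struct layout {
    int x;
    int garbage;
};

/* Returns 1 if storing a double-sized value at "x" also changes
   "garbage", 0 otherwise. */
int double_store_clobbers_neighbor(void)
{
    struct layout mem = { 0, 0x1234 };
    double d = 3.14;
    unsigned char *base = (unsigned char *)&mem;
    size_t off = offsetof(struct layout, x);
    size_t room = sizeof mem - off;
    /* Never copy past the end of the struct, whatever the type sizes. */
    size_t n = sizeof d < room ? sizeof d : room;

    memcpy(base + off, &d, n);   /* what code in foo.c would do to "x" */
    return mem.garbage != 0x1234;
}
```

Under the stated size assumptions the function reports that the
neighbor was overwritten, which is harmless only as long as nothing
reads the neighbor.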




I was intrigued by the wide variation in behavior, so I investigated
further. After some careful reading of the ANSI C standards - both
the original C89 and the new C99 - I discovered that there is an
unresolved ambiguity pertaining to multiple top level variable
definitions (as in Nathaniel's code above) that do not specify a
storage class.


C does not allow either duplicate definitions or multiple definitions
of the same named object in any name overloading class (of which the
"top level", i.e. global names, is one). You can have duplicated
_declarations_, but not duplicated _definitions_.


      "extern double x;"   is a declaration. No storage is allocated
                           for the variable; its name is a reference
                           to a definition elsewhere.


      "double x = 0.0;"    is a definition. This allocates storage
                           for the variable. Because it is initialized,
                           this is considered a "strong" definition.


      "double x;"          is a definition if there is a preceding
                           declaration, else it is both. Because it
                           is _not_ initialized, this is considered
                           a "weak" definition.


      "static double x;"   is a declaration if there is a subsequent
                           strong definition, else it is both. This
                           allocates storage for the variable.
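

As a single-file sketch of the first three forms (an illustration, not
one of the test programs; read_x is a hypothetical helper), the
following is accepted as one translation unit. Within a single file an
"extern" declaration, an uninitialized definition (which the standard
calls a "tentative definition") and one initialized definition can
coexist, and all three lines name the same object.

```c
extern double x;    /* declaration only: no storage requested here */
double x;           /* uninitialized ("weak"/tentative) definition */
double x = 1.5;     /* initialized ("strong") definition; the two
                       lines above now refer to this object */

/* Hypothetical helper: returns the value of the single object x. */
double read_x(void)
{
    return x;
}
```

The disagreement between compilers arises only when such lines are
split across separately compiled files.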




A problem occurs when there are multiple definitions in separate
files. Technically, by the standards, there can be only 1 definition
of a name at top level ... all other references to the name, including
any forward references to the name in the same file, *must* be
"extern" declarations.


However, the requirement for specifying "extern" is routinely relaxed
for forward references to functions and to recursive structures. For
such declarations, compilers implicitly assume "extern". It appears
that some compilers also assume "extern" in the case of weak variable
definitions.
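

A small illustration of that relaxation, as a sketch only (twice, node
and sum_list are hypothetical names): the function declaration without
a storage class is treated the same as the one that spells out
"extern", and the incomplete struct declaration lets the type refer to
itself.

```c
#include <stddef.h>

int twice(int n);            /* no storage class: "extern" is assumed */
extern int twice(int n);     /* the same declaration, spelled out */

struct node;                 /* forward declaration of a recursive structure */
struct node {
    int value;
    struct node *next;
};

int twice(int n)
{
    return 2 * n;
}

/* Walks the list and adds up the stored values. */
int sum_list(const struct node *p)
{
    int total = 0;
    while (p != NULL) {
        total += p->value;
        p = p->next;
    }
    return total;
}
```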


Now the problem with assuming "extern" for weak definitions is that it
is in conflict with the assumption of file scope visibility. Judging by
the C standard, a top level variable definition (strong or weak) is
not meant to name an external object, but rather to introduce a
variable with file scope ... a separate "extern" declaration is needed
to "import" the name into a different scope. In keeping with the
principle of least surprise, it would be better to assume a weak
variable definition is "conditionally static" rather than "extern",
conditionally allocate storage for it and have the linker resolve the
references. Some of the compilers actually did something like this
and created 2 separate variables.


Every compiler complained about multiple definitions of 'x' when all
the definitions were strong - i.e. when 'x' was initialized in both
files. If either definition was weak, though, I was able to build an
executable. When both definitions were weak, some of the compilers
produced optimized executables with NO storage at all allocated for
the name 'x'.


I know this isn't a C forum and nobody (including me) cares much to
debate what is right or wrong about C. But for the sake of a
discussion about compilers, how would you handle the situation if it
were up to you?


George

