Re: Bounds checking, Optimization techniques and undefined behavior

Bart <bc@freeuk.com>
Sun, 5 May 2019 11:14:51 +0100

          From comp.compilers

Related articles
[7 earlier articles]
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-01)
Re: Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-02)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-03)
Re: Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-03)
Re: Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-03)
Re: Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-04)
Re: Bounds checking, Optimization techniques and undefined behavior bc@freeuk.com (Bart) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior gneuner2@comcast.net (George Neuner) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior gneuner2@comcast.net (George Neuner) (2019-05-05)
Re: Bounds checking, Optimization techniques and undefined behavior anw@cuboid.co.uk (Andy Walker) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2019-05-06)
Re: Bounds checking, Optimization techniques and undefined behavior christopher.f.clark@compiler-resources.com (Christopher F Clark) (2019-05-06)
[21 later articles]
| List of all articles for this month |

From: Bart <bc@freeuk.com>
Newsgroups: comp.compilers
Date: Sun, 5 May 2019 11:14:51 +0100
Organization: virginmedia.com
References: 19-04-021 19-04-023 19-04-037 19-04-039 19-04-042 19-04-044 19-04-047 19-05-004 19-05-006 19-05-016 19-05-020 19-05-024 19-05-025
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="8407"; mail-complaints-to="abuse@iecc.com"
Keywords: errors, debug, performance, comment
Posted-Date: 05 May 2019 14:01:03 EDT
In-Reply-To: 19-05-025
Content-Language: en-GB

On 04/05/2019 10:45, Andy Walker wrote:
> On 03/05/2019 23:10, Bart wrote:


>> C is just a mess; it has arrays of sorts, but people generally use raw
>> pointers without associated bounds. Maybe that's one reason why your C
>> didn't have it. Or did it somehow manage it if enabled?
>
>     This isn't really a problem with C, the language.  It's clear in
> the reference manual right back to K&R C and in the various standards
> that pointers always have associated bounds.


But how do they get there? Take this:


          int A[10], *p;
          p = &A[3];


You intend p to refer to the 4-element slice A[3..6], but how does the
language know that? How can it stop code from writing to p[5]?


Or you intend to index p[-2] to get at the preceding elements. Actually
using negative indexing is quite common, but surely all array bounds in
C are presumed to start from 0?


Or this:


      struct {int a,b,c,d;} S;


      p = &S.a;


You intend p to be used to access a,b,c,d as an int[4] array, but p's
bounds will say it's only one element long.


Or this:


        int *p = malloc(sizeof(int)*1000);


        int *q = p+400;


You are allocating one pool of memory than sub-allocating that into
smaller objects, here into a 20-element array headed by q. But how does
the language know that?


Or maybe p allocates memory for a 2D array, and q refers to one row of
that; you can't detect accesses beyond that row.


I can see how the language can figure out the overall bounds of the
block of memory a pointer refers to. But that can be a long way from
knowing the precise bounds of any array or sub-array, when represented
via a pointer to the first element.


So the problems are either that bounds transgressions can't be detected
properly as 'fat pointers' only have broad info about allocations, or
they will give false alarms when you want to do some pointer-offset
manipulations that C normally allows.


    It's just that the K&R
> compiler with Unix didn't check, perhaps for decent reasons, and that
> the relatively few attempts to produce compilers that do have not been
> all that successful.  You can't do the checking unless every pointer
> "knows" which object it is supposed to be pointing into, which rather
> implies "fat" pointers, which makes all pointer operations take longer
> and require more code.  It very likely means that arrays too need to
> be implemented differently, with proper descriptors.


Which C is not going to have without breaking just about every existing
program. Besides, most array work in C doesn't actually use arrays, but
explicit pointers.


    As discussed over
> in "comp.lang.misc" recently, that's not a Bad Thing;  it gives you
> extra facilities virtually free, as well as the added security.  But
> it does result in larger executables and slower execution -- unless you
> really do have hardware support [cf Burroughs] -- so historically it's
> not been popular.


With language support, it need have no cost. For example, suppose that
array A did carry its bounds with it (or are statically known), then in
code like this:


          for i in A do # (iterate over bounds not values)
                  A[i] := 0
          end


the compiler knows it doesn't need to bounds-check each access. Or here:


          forall x in A do # (iterate over values)
                  print x
          end


[Every pointer in C points either to a static object or to one that
comes from a handful of routines like malloc() or mmap() so it
shouldn't be too hard to do bounds checking within the allocated
object, give or take unsafe typecasts which I don't think are common
in application code. But you are certainly correct that C isn't
expressive enough to do slices or the like. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.