Data Structure Reorganizing Optimizations
Mon, 31 Oct 1994 22:20:51 GMT

          From comp.compilers

Related articles
CProf cache profiling system available (1994-10-13)
Data Structure Reorganizing Optimizations [Was: Re: CProf cache profil (1994-10-19)
Re: Data Structure Reorganizing Optimizations (1994-10-22)
Re: Data Structure Reorganizing Optimizations robertb@HING.LCS.MIT.EDU (1994-10-26)
Re: Data Structure Reorganizing Optimizations johnm@cory.EECS.Berkeley.EDU (1994-10-22)
Re: Data Structure Reorganizing Optimizations (1994-10-27)
Re: Data Structure Reorganizing Optimizations (1994-10-23)
Data Structure Reorganizing Optimizations (1994-10-31)
Re: Data Structure Reorganizing Optimizations (1994-10-31)
Re: Data Structure Reorganizing Optimizations (1994-10-31)
Re: Data Structure Reorganizing Optimizations (1994-11-01)
Re: Data Structure Reorganizing Optimizations (1994-10-31)
Re: Data Structure Reorganizing Optimizations (1994-11-02)
Re: Data Structure Reorganizing Optimizations (1994-11-02)
[13 later articles]
| List of all articles for this month |

Newsgroups: comp.arch,comp.compilers
Keywords: optimize, C
Organization: Rutgers University Department of Computer Science
References: 94-10-108 94-10-141
Date: Mon, 31 Oct 1994 22:20:51 GMT

| I'd like to start a discussion of why ... data structure
| reorganization optimizations [in C] should still be illegal.
| I am aware of the K&R rule that structure elements be in increasing
| order of address. But, since the contiguity requirement has been
| dropped, and since different machines have different padding rules,
| what value does the address ordering rule have? Even network header
| code cannot be written portably using structures to access data
| structures placed in memory from DMA devices.
| I believe that C's rules about data structure organization are
| obsolete. Certainly, they are not in the ken of the usual programmer
| using C or C++. Why not permit compilers to do such "illegal" data
| structure reorganizations, bringing the performance gains Lebeck and
| Wood describes to the standard application?
| I suggest that this be considered by future X3J11 ANSI C standards (if
| any such revisions are forthcoming).

I agree absolutely that the rules are obsolete. In fact, they were obsolete
the day they were written.

The real underlying problem is that a struct can be interpreted in two
different ways:

1. As an abstract data type, on which a very limited list of
operators are defined: Field selection, assignment,
pass by value, return by value, initialization of fields
in the order of their appearance in the definition. Re-
organization has no effect on any of these.

2. As a way of laying out data in a particular pattern in memory.
Obviously, re-organization kills this.

The problem with C is that it's never really properly supported *either* of
these interpretations. The unspecified padding makes it impossible to write
truely portable descriptions of memory layouts. However, the various hack
techniques that C programmers have come to love and rely on over the years
make a joke of the ADT description. In particular, the C standard enshrined
in the spec two requirements:

a) The address of a structure must be identical with the address of
its first member;
b) If two structures have a leading prefix of fields of the same
type, then the offsets of those fields from the beginning
of the structure must be identical in both. This usually
shows up as a poor-man's kind of subtyping: One structure
defines just the leading fields, another starts with the same
leading fields and adds additional ones at the end. It must
be possible to cast a pointer to use pointers to either of
these structure types essentially interchangeably to refer
to the common leading fields.

While there is code around that relies on requirement (a), requirement (b) is
the real killer, since it shows up all over the place as an extension
mechanism: It allows you add new fields to the end of a struct with the
guarantee that old code that simply accesses the original fields will continue
to work. Given the limited capabilities of most C compilation systems to
track dependencies, code using the old definition can stick around for a long
time. You'd just better hope that it doesn't *allocate* any of the older,
shorter structures - though if you are careful to include an explicit length
in the structure when you first define, you can even get away with that.

This requirement is used to justify other coding techniques as well. For
example, a compiler often needs to build a tree with different kinds of nodes,
of perhaps very different sizes. By defining a common header containing a
type code and perhaps a length, one can allocate exactly the space needed for
each node. Further, one can even use variable-length nodes by relying on the
fact that C won't check array bounds, so if the final field is an array, one
can make its size vary. (Actually, current interpretations of the ANSI stand-
ard appear to make this last technique illegal. However, it's so widely used
that I'd expect any compiler that didn't support it, standard or no, to get
quickly shot down on a "quality of implementation" complaint.)

Given the pervasiveness of this kind of coding in existing C code, there is
absolutely no hope of ever changing the requirements for the existing "struct"
construction. However, one could perhaps come up with a new kind of structure
that the compiler *is* free to re-arrange. While it sounds a bit stilted,
perhaps using "struct volatile tag { type field1; ...}" would work. Program-
mers who wanted to use all their struct's into adt's could do a

#define struct struct volatile

-- Jerry

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.