Re: Pointers to "why C behaves like that ?"

"Joachim Durchholz" <joachim_d@gmx.de>
24 Nov 2002 18:38:19 -0500

          From comp.compilers

Related articles
[29 earlier articles]
Re: Pointers to "why C behaves like that ?" thp@cs.ucr.edu (2002-11-24)
Re: Pointers to "why C behaves like that ?" thp@cs.ucr.edu (2002-11-24)
Re: Pointers to "why C behaves like that ?" thp@cs.ucr.edu (2002-11-24)
Re: Pointers to "why C behaves like that ?" stephen@dino.dnsalias.com (Stephen J. Bevan) (2002-11-24)
Re: Pointers to "why C behaves like that ?" cgweav@aol.com (Clayton Weaver) (2002-11-24)
Re: Pointers to "why C behaves like that ?" joachim_d@gmx.de (Joachim Durchholz) (2002-11-24)
Re: Pointers to "why C behaves like that ?" joachim_d@gmx.de (Joachim Durchholz) (2002-11-24)
Re: Pointers to "why C behaves like that ?" jacob@jacob.remcomp.fr (jacob navia) (2002-11-26)
Re: Pointers to "why C behaves like that ?" jacob@jacob.remcomp.fr (jacob navia) (2002-11-26)
Re: Pointers to "why C behaves like that ?" jacob@jacob.remcomp.fr (jacob navia) (2002-11-26)
Re: Pointers to "why C behaves like that ?" david.thompson1@worldnet.att.net (David Thompson) (2002-11-26)
Re: Pointers to "why C behaves like that ?" ajo@andrew.cmu.edu (Arthur J. O'Dwyer) (2002-11-26)
Re: Pointers to "why C behaves like that ?" thp@cs.ucr.edu (2002-11-26)
[32 later articles]
| List of all articles for this month |

From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 24 Nov 2002 18:38:19 -0500
Organization: Compilers Central
References: 02-11-095 02-11-103 02-11-128
Keywords: types, design
Posted-Date: 24 Nov 2002 18:38:19 EST

jacob navia wrote:
>
> fn(a)
> {
> b = a;
> }
>
> How can you know what "a" is?


"a" is of type ANY, or Object, or Equ, or whatever happens to be the
most general type that allows comparisons for equality. Type inference
will not produce a more specific definition of "a" or "b" at this
point. Actually that's OK - there's no need to actually nail the type
down at this point.


However, if, later on, there's a code snippet like
      if a > x then fn (a) else something_else
then the type inferencer will narrow the type down to the most general
type that has equality and comparisons, maybe ruling out a few types.


Hindley-Milner typing is the most common way to set things up so that
"the most general type that has operations x, y, and z" is a unique
description of an existing type. (These types need not be declared.
There's always the sum type, i.e. "type A or type B" - equivalent to a
typesafe version of C's unions or Pascal's variant records. This
covers up the most obvious holes and makes type inference feasible.)
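To make the sum-type idea concrete, here is a small sketch of my own
(in Python, not from the thread): each variant carries its own tag, so
unlike a C union you can never read one variant's payload through
another variant's fields. The Shape/Circle/Rect names are purely
illustrative.

```python
# Sketch: a sum type ("type A or type B") as a typesafe analogue of a
# C union. Each variant is a distinct class, and the payload can only
# be reached after checking which variant we actually have.
from dataclasses import dataclass
from typing import Union

@dataclass
class Circle:
    radius: float

@dataclass
class Rect:
    width: float
    height: float

# The sum type: a Shape is a Circle OR a Rect, nothing else.
Shape = Union[Circle, Rect]

def area(s: Shape) -> float:
    # Dispatch on the variant tag; there is no way to misinterpret
    # a Circle's bytes as a Rect, as a raw C union would allow.
    if isinstance(s, Circle):
        return 3.141592653589793 * s.radius ** 2
    else:
        return s.width * s.height

print(area(Rect(2.0, 3.0)))   # 6.0
```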


> If you examine the whole program (this precludes a separate compilation
> model like C) you can find two calls to "fn" one with:
> fn(2)
> fn(6)
>
> Is the type of "b" an integer, a double, an unsigned char or a long long?
>
> All those answers are correct.


There are different answers.
Some languages say it's Integer, because you'd have to write 2. or 6. to
get a Float.
Other languages say it's a Numeric, the common supertype of Integer,
Float, and any of their higher-precision variants.
(No language says it could be char, signed or unsigned; characters
aren't Integers - multiplying a Character by a Character would be a
meaningless operation. The conflation of characters and integers in C
is one of the worst disservices that a programming language has done
to the programmer community, IMHO.)


> Yes, that can be maybe in principle be done but with an associated
> enormous cost in compiler complexity and problems.


No. Hindley-Milner typing is simple and efficient. The algorithm fits
on a single page (unfortunately, it's too large to fit in the margin
of this post *g*), but I have it upstairs. It's about as complex as
the most primitive dataflow analysis algorithm that one can imagine.
This is well within the reach of present-day compiler technology.
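To give a flavour of how small the core is, here is a toy
unification-based inferencer for a bare lambda calculus - a sketch of
my own in Python, not the page I mention above, and deliberately
stripped down (no let-polymorphism, no occurs check). Terms are tuples
of my own invention: ('var', x), ('lam', x, body), ('app', f, a),
('lit', n).

```python
# Toy Hindley-Milner-style inference: walk the term, generate fresh
# type variables, and unify constraints as they arise.
import itertools

fresh = itertools.count()

def tvar():
    # A fresh type variable.
    return ('tv', next(fresh))

def prune(t, subst):
    # Follow substitution links to the representative type.
    while t[0] == 'tv' and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    a, b = prune(a, subst), prune(b, subst)
    if a == b:
        return
    if a[0] == 'tv':
        subst[a] = b
    elif b[0] == 'tv':
        subst[b] = a
    elif a[0] == 'fun' and b[0] == 'fun':
        unify(a[1], b[1], subst)   # argument types
        unify(a[2], b[2], subst)   # result types
    else:
        raise TypeError(f"cannot unify {a} and {b}")

def infer(term, env, subst):
    kind = term[0]
    if kind == 'lit':
        return ('int',)
    if kind == 'var':
        return env[term[1]]
    if kind == 'lam':
        arg = tvar()
        body = infer(term[2], {**env, term[1]: arg}, subst)
        return ('fun', arg, body)
    if kind == 'app':
        fn = infer(term[1], env, subst)
        a = infer(term[2], env, subst)
        res = tvar()
        unify(fn, ('fun', a, res), subst)
        return prune(res, subst)

# (\x -> x) applied to a literal: the result type comes out as int.
subst = {}
t = infer(('app', ('lam', 'x', ('var', 'x')), ('lit', 3)), {}, subst)
print(prune(t, subst))   # ('int',)
```

The whole thing really is about a page; a production inferencer adds
let-generalization and better error reporting, not a different core.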


H-M typing is just the most common (and simple) technique BTW. More
advanced type inference calculi have been invented and
test-implemented. The Girard-Reynolds calculus seems to be a bit more
powerful (i.e. allows programmers to live with less explicit
declarations) while still being well-understood and reasonably
implementable.


So much for complexity.


Problems? There are no problems. The properties of these type systems
have been explored for a decade or more. They are just as well
understood as those of the classical Pascal/C/Algol type systems.
Actually, it's OO languages whose type systems tend to have problems;
I still don't know of a single good solution to the parallel type
hierarchy problem. (I know of one that *might* be good, but I haven't
seen it implemented yet, and essentially it amounts to "scrap this
subtype-by-redefinition stuff; parametric polymorphism will do more of
the job and require fewer tricks to extend it to a 100% solution".)


> When you support several types of integers and floating point data
> like in C this kind of inference becomes extremely complex.


No.
You might need a declaration to nail the data down to a specific type,
to avoid having a descriptor attached to every piece of numeric data.
(Which is one reason why most languages with type inference don't have
so many numeric types. Often enough, it's Integer for arbitrary-length
integers and Int for machine integers, all integer literals being of
type Integer. This avoids all the hassles of correctly interpreting
-32768 on a 16-bit machine and similar borderline cases.)
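The -32768 borderline can be made concrete. In C, "-32768" parses as
unary minus applied to the literal 32768, and 32768 itself does not
fit in a signed 16-bit int (whose range is -32768..32767) - even
though the final value does. A sketch simulating 16-bit arithmetic in
Python via ctypes:

```python
# The literal 32768 overflows a signed 16-bit int before the unary
# minus is ever applied; ctypes shows the two's-complement wraparound.
import ctypes

INT16_MAX = 32767
print(32768 > INT16_MAX)             # True: the bare literal is already out of range
print(ctypes.c_int16(32768).value)   # -32768: two's-complement wrap
print(ctypes.c_int16(-32768).value)  # -32768: the final value itself fits fine
```

A language whose literals are arbitrary-precision Integers, narrowed
to machine Ints only where needed, sidesteps the whole question.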


> What do you do with arrays?
>
> Finding the expression
> table[i] = b+8.9;
>
> You can (maybe) deduce that table contains floating point data but how big
> do you dimension it?


You don't dimension it.
Or you dimension it explicitly. Most programs don't have many tables.


> Yes, of course, you dimension it to have at least "i" elements. And each
> time "i" becomes greater than the size of "table" you redimension it copying
> the data in the new table.
>
> This way you generate
>
> for (i=0; i< b;i++) {
> table[i] = i;
> }
>
> If "b" is 10 000 at run time you resize the table 10 000 times. NICE!


Pu-leeease. Take a look at modern data structures. There are ways to
implement arrays of that kind with O(log N) characteristics for
resizing, updating, accessing, and iterating. (If the difference
between O(1) and O(log N) ever becomes a factor, you'll have a
thrashing machine anyway.) See Chris Okasaki, "Purely Functional Data
Structures", for a description and implementation of an array type
with exactly these characteristics.
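For illustration, here is a minimal persistent array with O(log N)
get, set, and append - a sketch of my own in the spirit of such
structures (a binary trie over the index bits), not code from
Okasaki's book:

```python
# Persistent array as a binary trie keyed by index bits: get/set/append
# touch one root-to-leaf path of O(log N) nodes; set copies only that
# path, so old versions stay valid and share the rest of the tree.
class PArray:
    def __init__(self, size=0, depth=1, root=None):
        self.size, self.depth, self.root = size, depth, root

    def get(self, i):
        node = self.root
        for shift in range(self.depth - 1, -1, -1):
            node = node[(i >> shift) & 1]   # descend by the next index bit
        return node

    def _set(self, node, index, depth, value):
        if depth == 0:
            return value
        bit = (index >> (depth - 1)) & 1
        node = [None, None] if node is None else list(node)  # path copy
        node[bit] = self._set(node[bit], index, depth - 1, value)
        return node

    def set(self, i, value):
        return PArray(self.size, self.depth,
                      self._set(self.root, i, self.depth, value))

    def append(self, value):
        depth, root = self.depth, self.root
        if self.size == (1 << depth):       # tree full: grow one level
            root, depth = [root, None], depth + 1
        root = self._set(root, self.size, depth, value)
        return PArray(self.size + 1, depth, root)

arr = PArray()
for ch in "abc":
    arr = arr.append(ch)
print([arr.get(i) for i in range(3)])   # ['a', 'b', 'c']
```

Note that set returns a new array without disturbing the old one,
which is exactly what a functional language needs from its arrays.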


If you really want constant access time, or if the constant overhead
of these log N arrays is too much to bear (e.g. in weather
simulations), you just declare the array. Once. All other arrays take
their type from that master declaration, by type inference. This is
still a major win.


Type inference does not mean that you never declare a type. It means
that you can leave out 99.9% of all type declarations, without loss of
functionality or efficiency, and with a tremendous gain in flexibility.
(It's not uncommon to write a functional algorithm, look at the type
that the compiler infers for it, find that the type is *much* more
general than expected, and be able to reuse the algorithm in that
broader context. Of course, in most cases, an unexpected result of
type inference is a symptom of a bug.)


Regards,
Joachim

