Re: Definable operators

Craig Burley <>
3 Apr 1997 14:07:27 -0500

          From comp.compilers


From: Craig Burley <>
Newsgroups: comp.compilers,comp.lang.misc
Date: 3 Apr 1997 14:07:27 -0500
Organization: Free Software Foundation, 545 Tech Square, Cambridge, MA 02139
References: 97-03-037 97-03-076 97-03-112 97-03-115 97-03-141 97-03-162 97-03-184
Keywords: design, syntax

(Seth LaForge) writes:

      int + string should mean nothing. I meant to mention this earlier: I
      think operator overloading is a fine thing (as well as overloading in
      general), but when combined with implicit conversions it becomes a
      nightmare. In a language with no implicit conversions, string +
      string is not too bad, since it's clear what it means.

It's not clear to everyone what it means. Why wouldn't

    "1" + "2"

evaluate to

    "3"

for example?

Many's the time I've wanted the convenience of expressing the
concatenation of bitstrings, which in the languages I use are
typically represented as numbers.

So, I view useful syntactic operators as those that simply and
powerfully express concepts that are largely free of context. For
example, you shouldn't have to know the specific types of operands to
know how an expression involving them is parsed, how its precedence
works, and what the expression generally _means_ (even if, without
those specific types, you might know few details about how it's
implemented).
The concept of + always meaning the mathematical sense of addition is
useful -- not just in the sense that "addition is useful", but that "+
means _only_ addition" is useful in designing programming languages.
In point of fact, "+ means _only_ addition" is _substantially_ more
useful as a language feature than "+ means whatever the programmer
wants it to mean". The latter is useful in a tool, though too many
tools define poorly-thought-out languages IMO, but if you're designing
a tool syntax with an "anything-goes" attitude, you're not designing a
good language. (Ref. TECO. ;-)

The mistake made by designers of overloading constructs is that they
observe that + means one thing for int, another for float, yet another
for double, and conclude that it therefore could mean anything and
nobody would be too confused, because they somehow cope with + meaning
so many different things already.

Their initial observation is at least as wrong as their conclusion.
In C, + (as a binary operator) means addition, and only addition. The
interpretation of the data being added changes, so adding 32-bit
quantities can mean "treat them as two's-complement integers" or
"treat them as IEEE 754 single-precision floats", or whatever, but the
crucial observation is that + means "add".

I'm not saying it's totally out of the question that + be made to mean
something other than add in certain situations.

What I am saying is that, the more + can mean, the _less_ the language
in question has to offer for the meaning of +.

On the other hand, the more + can mean, the _more_ the _tool_ that
processes the language in question has to offer for the meaning of +.

Generally, I suggest language designers focus on _minimizing_ the
valid expressive "space" of their languages to use those constructs
that are simple to explain, mean the same things with the least
context necessary (e.g. avoid C's mistake leading to the ambiguity of
"(a) - (b)"), and conceptually scale well beyond the implementations
they might have in mind. (Not to mention I think such designers
should minimize the expressive space to make it difficult for common
typos and thinkos to lead to syntactically correct code assuming
current approaches to entering code in the languages are likely to
predominate, etc.)

E.g. think not just about 32-bit ints and floats, but about
infinite-precision or truly "real" numbers, about complex numbers,
etc. For example, that kind of thinking leads to the realization that
having "1" + "2" == "3" [in C-speak, ignoring the fact that strcmp()
is really more appropriate for notational convenience ;-] is _more_
natural than having "1" + "2" == "12"...because C strings could be
reasonable holders of extremely large-precision values, even more
general expressions of constants such as "2" * "pi", or "sqrt(2.)".

Once you've imagined these more-general usages for the language, you
don't have to implement them or even define them -- just allow
expressive "space" for them in the natural ways.

Another example of a failure to do this in C could be the comma
"operator". Assuming early C always had multiple arguments to
procedures separated by commas, perhaps it made good sense for comma
to be a separator of items in an ordered, but not _sequenced_, list.
Then, to "overload" that syntactic item with the meaning of "sequence
point" in some contexts was a mistake -- the decision probably was
based on the (erroneous) assumption that those contexts would never
need the generic "item separator" feature of the function-call comma
_and_ that users would always have an easy time distinguishing, by
context, which meaning of comma was meant.

Both of these assumptions proved false. I'm sure not a few
programmers have tried (or wisely realized they couldn't and wished
they could) write things like "case 1,2,3:", or add syntax like
"a[i,j,k] = ...;" to the language -- and I know for certain people
(like myself ;-) have been bitten by the fact that comma had multiple
meanings.

Some linguistically-sensitive thought about this might have made
things quite a bit smoother. E.g. perhaps once semicolon was decided
upon as the statement separator, and as the sequence statement
separator as well, it, instead of comma, could have been used within
expressions to denote a sequence point. Such expressions might well
have then needed special surrounding syntax to specify the ; as
meaning not a statement separator, just a sequence point, but that in
turn might well have led to an easier-to-understand syntax.

(Of course this is all 20-20 hindsight, actually probably not even
that clear, but the point is that even "modern" languages like C have
linguistically bad aspects to them, but it seems no language
influences modern ad-hoc language design more than C, so the more we
remind people that C has these mistakes, the better. ;-)

Getting back to the point of how to use the available expressive space
of a language (that is, how many varieties of meaning can be expressed
given a fixed number of tokens)...it's only the designers of interfaces
that are used entirely (or essentially entirely) by machines who should
be thinking about maximum packing of the expressive space of the
language. (Examples include the instruction set of a CPU chip, now that
few people write code in machine language, and the compressed form of a
human-readable language.)

So, if you think you're designing a language, or even if you don't but
you are (if you're designing a syntax that will have writings in it
that are frequently read, or written, by humans), DO NOT pat yourself
on the back every time you compress more "power" or "meaning" into a
given number of characters. You're probably doing the exact opposite
of what you should be doing. That's too general, of course, but it's
a better starting point than what often happens in real life.
"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson
