Re: Definable operators


From: Craig Burley <burley@tweedledumb.cygnus.com>
Newsgroups: comp.compilers
Date: 18 Apr 1997 01:07:57 -0400
Organization: Cygnus Support
References: 97-03-037 97-03-076 97-03-112 97-03-115 97-03-141 97-03-162 97-03-184 97-04-027 97-04-095
Keywords: design, semantics

Craig Burley <burley@gnu.ai.mit.edu> writes:
> > It's not clear to everyone what it means. Why wouldn't
> >
> > "1" + "2"
> >
> > evaluate to
> >
> > "3"
> >
> > for example?


dlester@cs.man.ac.uk (David Lester) writes:
> What appears to be missing in the discussion so far is that Haskell
> does not encourage the use of `+' in isolation: it comes as part of
> a class definition.
    [...]
> Let us now suppose that -- despite all the signals to the contrary --
> you decide to use `+' as concatenation for strings. Here's what you do:
    [...]
> Now we can write:
>
> "1" + "2" and get "12"
>
> If you want "1" + "2" to be "3" then a more sophisticated function
> will be needed on the righthand side of the defining equation for (+).
>
> > [...]
> > The concept of + always meaning the mathematical sense of addition is
> > useful -- not just in the sense that "addition is useful", but that "+
> > means _only_ addition" is useful in designing programming languages.
> > In point of fact, "+ means _only_ addition" is _substantially_ more
> > useful as a language feature than "+ means whatever the programmer
> > wants it to mean". The latter is useful in a tool, though too many
> > tools define poorly-thought-out languages IMO, but if you're designing
> > a tool syntax with an "anything-goes" attitude, you're not designing a
> > good language. (Ref. TECO. ;-)
>
> As I mentioned above, there is a signal that you should not be using
> `+' as list concatenation, and that is that only one of the seven
> operators in the class is being defined. Clearly this does not preclude
> the possibility that -- if you've forgotten your lithium, say -- you
> could go ahead and make `+' be concatenation; but knowledge of the
> numeric class structure is not something a naive user is likely to
> want to delve into.
>
> On the other hand, as you might imagine, the Obfuscated Haskell
> competition is truly an art form.
    [...]
> I feel that this tension is inevitable: you want flexibility to overload
> `+' in sensible ways, but you want to restrict the use of `+' only to
> sensible cases. The trouble is that `sensible' is a meta-linguistic
> concept, and it seems unlikely that we could code a compiler to check
> that overloading is only used in acceptable places. I guess that we
> disagree, in that I think that Haskell is a reasonable compromise.


I think we are agreeing here. I'm not saying there's no such thing
as a good computer language that allows overloading of + in some way --
Haskell might be an example.
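
For concreteness, the kind of class instance Lester's elided code
sets up might look roughly like this (a sketch against GHC's current
Num class, not his actual posting):

    {-# LANGUAGE TypeSynonymInstances, FlexibleInstances #-}
    module StringPlus where

    -- A deliberately questionable instance: only one of the seven
    -- Num methods gets a real body -- exactly the "signal" Lester
    -- mentions that something unusual is going on.
    instance Num String where
      (+)         = (++)    -- "1" + "2"  ==>  "12"
      (-)         = error "(-) has no sensible meaning on String"
      (*)         = error "(*) has no sensible meaning on String"
      negate      = error "negate has no sensible meaning on String"
      abs         = error "abs has no sensible meaning on String"
      signum      = error "signum has no sensible meaning on String"
      fromInteger = error "fromInteger has no sensible meaning on String"

(Making "1" + "2" come out as "3" instead would take a more
sophisticated body for (+), something like
\x y -> show (read x + read y :: Integer).)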


I am, however, trying to encourage people who design compilers (and
therefore almost always do at least some language design, even if they
don't think of it that way -- e.g. "I'm just adding a feature people
ask for, it's not really a language-design issue") to consider the
flip side of the "coin".


The "coin" is adding expressiveness to the language that is processed
by the tool. The tool is the compiler (or the interpreter, or the
shell, or make, or the user interface, etc.). One side of the coin is
that by adding expressiveness you generally make the tool more
"powerful" in the sense that you've increased the range of what can be
expressed with a given amount of text. (E.g. "This feature allows
recursion to be implemented with a 30-character input instead of a
60-character one.")


The _other_ side of the coin is that when the language becomes more
expressive, it often is made less useful as a _language_, that is,
less useful as a tool for communicating among humans.


So, whether we're talking about Haskell or Fortran or C++, it
doesn't really matter; nor do most of the overloading, OO,
precedence-specifying, etc. mechanisms that computer linguists tend
to go on about.


What does matter, for example, is whether the language permits + to
mean something _other_ than addition (infix, prefix, or postfix)
without _explicit_ overriding near the context of every use (such as
is done in math papers, I assume). (Note the ambiguity of terms like
"addition", "near", and so on -- the ambiguity doesn't render the
statement useless, because as long as we make sure that _humans_ take
full charge of following its advice, we ensure that the statement will
be properly understood.)
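
(Haskell can at least express the explicit, near-the-use style of
overriding -- a hypothetical fragment of mine, not from the thread:)

    module Main where

    -- The overriding is explicit and sits right next to the only
    -- use, so a reader never has to guess what + means in this scope:
    label :: String
    label = let (+) = (++)
            in "1" + "2"    -- clearly "12", by local definition

    main :: IO ()
    main = putStrLn label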


This example of the meaning of + is a good illustration, because
almost everyone who is likely to read code in the language either
already "knows" what + means, or has basically no knowledge of + and
needs to be taught a basic, widely understood meaning of it to
communicate with all the other people who are reading and writing a
common pool of code.


So, whether it's a language standard or a language community that says
"+ shall never be overloaded to mean anything other than numeric
addition; any code that doeth otherwise is considered embugged", that
statement is itself far more useful than "+ may be overloaded to mean
anything you want by <insert whatever incredibly nifty tricks your
favorite language allows>".


The result of the former statement is that the language is stronger;
the result of the latter statement is that the language is weaker,
though the tool that processes it can do more things with less input.
(But, then, "gunzip -c | bash" probably can do more still -- do you
want to write code in .sh.gz format?? ;-)


Programmers have been "taught" that doing more things with less input
is so highly valuable that they forget that the purpose of _language_
design is to make works written in the language more readily
understood by their _human_ audience. And doing that properly usually
requires _restricting_ the expressiveness of the language --
e.g. explicitly saying that + cannot be made to mean anything other
than addition, even if the tools that process the language cannot be
made to enforce such a restriction. (I used to think the best
languages were those with the highest expression/work-size ratio as
well. It's actual experience designing, modifying, and extending
languages for people that has taught me otherwise -- believe me,
nothing useful I've learned, certainly nothing I'm saying here, has
been taught me in an academic setting. It's not theory I'm espousing
-- it's what actually works in practice.)


One reason I think so many language designers focus on compact
expressiveness is that it's a good focal point for avoiding the kinds
of inelegance found in Fortran, C, and other "old" languages that made
mistakes involving lack of orthogonality. E.g. Fortran's DATA sets an
initial value for a variable; using it also implies SAVE, which
makes the variable "static" -- one copy per procedure, not per
invocation -- which means DATA (which in some ways is quite powerful)
cannot be used to set the initial value of an automatic variable, a
dummy argument, etc., even though it would be useful to do so.
Similarly, EXTERNAL FOO has too many possible meanings.


An advantage of orthogonality is that readers of the language need
to learn only a few simple, consistent meanings for symbols in that
language. So to the extent one pursues compact, expressive languages
as a means to achieve the linguistic advantages of orthogonality,
that's fine; but compactness, expressiveness, and orthogonality are
_not_ ends in themselves in language design, and, pursued too far,
they can lead to a compact, expressive, orthogonal language in which
many useful works (programs) are written but cannot be easily
maintained or read.


Orthogonality becomes a disadvantage when it is broken down to,
e.g., "+, -, *, / are binary infix operators that can mean anything
you want". Still orthogonal, perhaps, but suddenly readers of the
language need to learn a lot more possible meanings (and how to
resolve them) for those symbols.
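
To put it concretely (a hypothetical fragment; the name `combine' is
my own):

    -- What does this function compute?  The definition alone cannot
    -- say: + here means whatever the Num instance for the caller's
    -- argument type says it means, and that instance may live in a
    -- module the reader has never seen.
    combine :: Num a => a -> a -> a
    combine x y = x + y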


Note that essentially none of the above has anything to do with, or is
in any way addressed by, the way a particular language might make it
easier or harder to actually overload an operator, along with all the
related baggage (precedence, and so on).


Pretty much the only relevant questions for a language designer are
ones like this:


    If a programmer is reading code that includes the
    expression


        "1" + "2"


    and there's no obvious specification that "+" means anything
    other than addition in the immediately surrounding context,
    and the programmer discovers that the result is, or is
    expected to be, anything other than


        "3"


    then is the code considered to be buggy?


If the answer is "yes", then that's a point in favor of the language.
If it's "no, but ..." then it doesn't matter how much you throw into
that "but" -- the question has to do with the readability of the code,
i.e. the language, not the tool, so it's a disadvantage if the code
_could_ legitimately, and basically silently, use + to mean something
other than addition.


Note that "no obvious specification" above is _not_ satisfied by
"yeah, there'd have to be an #include there somewhere", or "the code
would start with USE STRINGS", because _most_ useful code written in
that language will have #include something or USE SOMETHING -- that
doesn't inherently clue the reader in to the fact that + no longer
means addition.


(And, yes, it is basically a disadvantage that some languages permit +
to mean "addition with the result taken modulo S" (where S is
typically 2**N). As programmers, we're accustomed to that, but it
_is_ a source of many bugs. So a language that doesn't even allow +
to mean modulo addition is "even better", though to be useful it
might have to include lots of other problems that make it hard for
people to read useful code written in it.)
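
(An interactive session with a modern Haskell implementation -- GHC
on a typical 64-bit machine, where Int addition silently wraps --
illustrates the kind of surprise this breeds:)

    ghci> (maxBound :: Int) + 1    -- + here is addition modulo 2**64
    -9223372036854775808
    ghci> (maxBound :: Int) + 1 == minBound
    True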