Language issues (was: Compiler issues)

"Joachim Durchholz" <joachim_d@gmx.de>
11 Nov 2000 10:07:51 -0500

From comp.compilers

From:	"Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups:	comp.compilers
Date:	11 Nov 2000 10:07:51 -0500
Organization:	Compilers Central
References:	00-10-227 00-11-019 00-11-024 00-11-046 00-11-062 00-11-065
Keywords:	OOP, design
Posted-Date:	11 Nov 2000 10:07:51 EST

Orlando Llanes <ollanes@pobox.com> wrote:
> Your example does not show the need for multiple constructors, you
> can accomplish the same thing with just one constructor, and an
> overloaded function. Pascal does not allow overloading a function
> (especially constructors) with the same name, unless you're using a
> Borland Delphi compatible compiler.

Overloaded constructors *are* multiple constructors. The declared
parameter types are used as part of the name, that's all.

And I don't understand why basing the distinction between different
construction methods on the parameter types should be superior to basing
them on different constructor names. When I create a Complex object, I
need two constructors:
    cartesian_complex (real real_part, imaginary_part)
    polar_complex (real absolute_value, angle)
In a language that doesn't allow me this, I have to write a constructor
with an artificial parameter as in
    complex (real param1, param2; enum construction_mode)
    -- If 'construction_mode' is 'cartesian',
    -- 'param1' and 'param2' are interpreted
    -- as real part and imaginary part, resp.
    -- If 'construction_mode' is 'polar',
    -- 'param1' and 'param2' are interpreted
    -- as absolute value and angle.
Or I have to revert to two-step construction:
    complex
    -- Return an empty object
    set_from_cartesian (real real_part, imaginary_part)
    set_from_polar (real absolute_value, angle)
For 'complex', this works well enough, but if the internal structure of
the object is more complex, this two-step construction means that the
object has two states: Initialized (as it comes from the constructor)
and Usable (after it has been subjected to a second-step routine). Which
means that *all* routines that accept an object of that type must first
check the state that the object is in, and throw an exception (or do
something else) if it isn't in Usable state.
This all strikes me as a lot of needless complication just to have the
(admittedly nice) property that type names are also constructor names.
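
To make the two-state problem concrete, here is a rough sketch in Python, used only as a stand-in notation. The class name TwoStepComplex and the helper _require_usable are made up for illustration. The point is that every routine that reads the value has to repeat the state check.

```python
import math


class TwoStepComplex:
    """Hypothetical class built in two steps: create, then set_from_*."""

    def __init__(self):
        # State "Initialized": the object exists but holds no value yet.
        self._usable = False
        self._re = None
        self._im = None

    def set_from_cartesian(self, real_part, imaginary_part):
        self._re, self._im = real_part, imaginary_part
        self._usable = True  # now in state "Usable"

    def set_from_polar(self, absolute_value, angle):
        self._re = absolute_value * math.cos(angle)
        self._im = absolute_value * math.sin(angle)
        self._usable = True  # now in state "Usable"

    def _require_usable(self):
        # Every routine that reads the value has to repeat this check.
        if not self._usable:
            raise RuntimeError("object is Initialized but not yet Usable")

    def absolute_value(self):
        self._require_usable()
        return math.hypot(self._re, self._im)
```

With a single-step constructor the _usable flag and the check in each routine would simply not exist.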

And for callers, the difference is between
    cartesian_complex (0.0, 0.0)
and
    complex (0.0, 0.0, cartesian)
which doesn't make much of a difference in key count but still has the
feel of a kludge (in my personal opinion, it's partly a matter of
style), and
    x = complex()
    x.set_from_cartesian (0.0, 0.0)
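
For comparison, named constructors can be approximated in Python with class methods. This is only a sketch under my own assumptions: the names Complex.cartesian and Complex.polar are illustrative, and the cartesian internal representation is an arbitrary choice.

```python
import math


class Complex:
    """Hypothetical complex type with two named constructors."""

    def __init__(self, real_part, imaginary_part):
        # Internal representation is cartesian; callers are expected to
        # go through one of the named constructors below.
        self.real_part = real_part
        self.imaginary_part = imaginary_part

    @classmethod
    def cartesian(cls, real_part, imaginary_part):
        return cls(real_part, imaginary_part)

    @classmethod
    def polar(cls, absolute_value, angle):
        return cls(absolute_value * math.cos(angle),
                   absolute_value * math.sin(angle))


# The caller states the construction method by name, not by a mode flag.
origin = Complex.cartesian(0.0, 0.0)
unit = Complex.polar(1.0, 0.0)
```

Either way the object is fully usable as soon as the call returns, so no second state is needed.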

> ==========================
> From: "jt" <gkt37@dial.pipex.com>
> > I have to disagree. Pascal more clearly distinguishes between
> > ...
> > without getting confused.
>
> That's not true, Pascal changes the way you think.

As does C, or any other language. In fact *programming* changes the way
that you think. Or any other mental endeavour that requires your full
concentration over a prolonged period of time.

> IMO Pascal represents logic while C represents freedom. I know from
> experience, and I know people who would agree with me.

There's some truth to that.

However, nothing would have prevented a language that combines logic and
freedom. I don't like the restrictiveness of Pascal (well, hardly
Wirth's fault if people try to use his language for real projects), but
I don't like the ad-hockery that pervades C (preprocessor, type and
variable declaration syntax, superfluous delimiters for all control
structures, plus general syntactic irregularity - ask any C parser
author about the latter).
And if you add pointer arithmetic and function addresses to Pascal, and
add some minimal module support, you do have C semantics, so it's not
difficult to do. (One might also clean up the syntax a bit, but that's a
style question: { is exactly as many keypresses as "do".)
Well, it wasn't done, and a modern language would need a lot of other
things to gain attention, so that opportunity is gone...

> I know of someone who learned BASIC that said it
> made C hard to understand.

BASIC makes many languages hard to understand. I'm firmly with Dijkstra
here: "It is practically impossible to teach good programming to
students that have had a prior exposure to BASIC: as potential
programmers they are mentally mutilated beyond hope of regeneration."
(Well, that's polemic exaggeration, but there's a lot of truth to that.)

> My point is that in
> C++ you can instantiate a class by declaring a variable (not a pointer
> to the class, but an actual class), and you can call any of its
> members counting on the fact that they're present and ready to use
> because the constructor was called automatically. In Pascal your
> program will crash, raise an exception,
> or cause a run-time error if you use a virtual function without first
> calling the constructor. In other words, C++ automatically sets up the
> class, but you have to do it yourself in Pascal. Not only that, but if
> you have to manually run the constructor, it's one more line of code
> that the programmer has to know about.

I think both OPascal and C++ got one thing wrong: they made the setup of
VMTs part of the constructor.
I've been programming in Eiffel for a while, and there, object
construction is a two-step process: First, the run-time sets up the VMT
behind the scenes; only after that the user-written constructor code is
run.
This has many advantages:
* It allows the use of virtual functions in the constructor (as already
observed by VbDis). If this isn't done, I cannot use the advantages of
virtual functions in the constructor, and I have to revert to two-step
construction (which is a Very Bad Thing, more on this below).
* Setting up the VMTs inside the constructor, by contrast, requires a
labyrinthine set of rules to determine what happens in the case of
multiple inheritance. The rules get even worse in the presence of
repeated inheritance (the "diamond inheritance" case). It's no surprise
that, with that, multiple inheritance is considered obscure and
best-to-be-avoided in C++, and a where's-the-problem issue in Eiffel!
* Constructors lose much of their special status. If you want to
reinitialize an existing object, just call the constructor again - the
language knows it doesn't need to set up the VMTs again and calls the
constructor with a fully ordinary subroutine call without any ado.
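* If the implementation in a subclass is drastically different from that
of the superclass, an automatically invoked superclass constructor often
does fully unnecessary things. This places gratuitous overhead on object
creation; when the superclass constructor is just a routine, the
subclass can leave it out.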
* The constructor can call its superclass's constructor at any point
during initialization: before, in-between, after, or not at all. This is
partly just a consequence of working virtual functions (the subclass may
have overridden some functions in ways that need specific
initializations), but it gives you additional expressive power as well.
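
As an illustration of the first and last points, here is a small sketch in Python, which happens to behave similarly in this respect: dynamic dispatch is already in place when the constructor body runs. The classes Shape and Polygon and the routine default_name are invented for the example; this is not Eiffel code.

```python
class Shape:
    """Hypothetical base class whose constructor calls an overridable routine."""

    def __init__(self):
        # Dynamic dispatch already works here, so this reaches the
        # subclass version of default_name() when one exists.
        self.name = self.default_name()

    def default_name(self):
        return "shape"


class Polygon(Shape):
    def __init__(self, sides):
        # Subclass-specific initialization first ...
        self.sides = sides
        # ... then the superclass constructor, at a point of our choosing.
        super().__init__()

    def default_name(self):
        # Relies on self.sides, which is why it had to be set beforehand.
        return f"polygon with {self.sides} sides"
```

Here Polygon needs its own field set before the superclass constructor runs, which is exactly the kind of ordering that a fixed superclass-first rule would not allow.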

In fact an Eiffel constructor is a rather ordinary subroutine. It can be
called for an existing object (in that case, no object construction will
take place, the object will simply be subjected to the programmer's code
in the constructor). It can be called from subclass code. It can be
overridden. The constructor property isn't even inherited; what's a
constructor in FOO is an entirely ordinary function in CHILD_OF_FOO
unless CHILD_OF_FOO explicitly declares it to be a constructor as well.
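
The "ordinary subroutine" aspect can be approximated in Python, where __init__ can be called again on an existing object. This is only a loose analogy under my own assumptions (the Counter class is made up, and unlike the Eiffel rule described above, Python subclasses do inherit __init__ as their constructor).

```python
class Counter:
    """Hypothetical class; __init__ plays the role of the constructor."""

    def __init__(self, start=0):
        self.value = start

    def increment(self):
        self.value += 1


c = Counter(10)
c.increment()
same_object = c
# Calling the constructor code again is a plain method call:
# no new object is allocated, the existing one is re-initialized.
c.__init__(0)
```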

Of course, there's still a difference between constructors and ordinary
functions:
1. Constructors must be declared as such in the class code.
2. All routines may assume that the class invariant holds upon routine
entry. Constructor routines cannot rely on that. (Technically, class
invariants are evaluated at routine entry and an exception is thrown if
they don't hold.)

In languages where the invariants are not checked, one *could* program
otherwise, but I think it's generally a good idea to establish the class
invariant, else you'll get that additional state checking in.
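
For languages without built-in invariant checking, the entry check can be imitated by hand. The following Python sketch is an assumption-laden illustration, not how Eiffel implements it: the decorator name checked, the exception InvariantViolation and the Account class are all hypothetical, and only the entry check described above is modelled.

```python
import functools


class InvariantViolation(Exception):
    """Raised when the class invariant does not hold."""


def checked(routine):
    """Evaluate the class invariant on entry to an ordinary routine."""
    @functools.wraps(routine)
    def wrapper(self, *args, **kwargs):
        if not self.invariant():
            raise InvariantViolation(
                f"invariant violated on entry to {routine.__name__}")
        return routine(self, *args, **kwargs)
    return wrapper


class Account:
    """Hypothetical class with the invariant 'balance is never negative'."""

    def __init__(self, balance):
        # The constructor cannot assume the invariant on entry;
        # its job is to establish it before returning.
        self.balance = balance
        if not self.invariant():
            raise InvariantViolation("constructor failed to establish invariant")

    def invariant(self):
        return self.balance >= 0

    @checked
    def withdraw(self, amount):
        self.balance -= amount
```

Because the constructor establishes the invariant, ordinary routines such as withdraw need no separate "is this object usable yet?" test of their own.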

> I've not tried Rexx but I've glanced at its syntax and didn't like
> what I saw.

If you reject languages based solely on syntax, then you're probably
stuck with whatever language you're currently using. (Unless the syntax
is so ugly that it seriously hampers readability, but your remark
doesn't seem to indicate that.)

> I'm creating a script language which has the ability to allow
> several instances in memory without hogging up all the computer's
> memory. One in which most table and list sizes are not hard coded
> (the only ones so far whose sizes are hard coded are the VM opcode
> function pointer table, and the scanner character code table). It
> also has to parse and run with an extremely limited amount of
> memory.

One word of warning: It's very easy to design a language that fulfils a
small set of design goals. Optimizing for space usage at the expense of
everything else will create a language that runs with minimal memory
consumption but has more-or-less accidental properties in other
respects.
E.g. I know it's extremely difficult to have a small interpreter that
also runs the programs at good speed. It's difficult to make multiple
inheritance languages fast *and* load dynamically. It's difficult to
make genericity (type parameters/templates) powerful without making it
undecidable. It's difficult to have both dynamic dispatch and
well-working binary operators (actually that's still a research issue;
most current-day languages have special rules for arithmetic operators
and leave you pretty much in the cold for user-defined types). It's
difficult to have a minimally-sized language that doesn't need megabytes
of libraries to work well for practical problems (Lisp comes to mind).
Etc. etc. - there's a heap of design trade-offs, and concentrating on
memory footprint to the exclusion of other goals will give you just
another monomaniacal language. Unfortunately, it's an extremely common
mistake to make; I've seen many such approaches fail - not technically,
but socially: nobody (except its creator) was interested.
Of course, such a project is a good thing to have on your CV. And making
it run the way *you* want can give deep satisfaction, independently of
whether others are interested.
In short: Just keep an open mind and remember that memory footprint is
important but not everything.

Regards,
Joachim
