Re: Using C as a back end

"Stephen T. Weeks" <sweeks@my-deja.com>
23 Oct 2000 22:16:33 -0400

          From comp.compilers


From: "Stephen T. Weeks" <sweeks@my-deja.com>
Newsgroups: comp.compilers
Date: 23 Oct 2000 22:16:33 -0400
Organization: Deja.com - Before you buy.
References: 00-10-148
Keywords: C, ML, translator

Pred. <predictor@my-deja.com> wrote:


> I have designed a language for which I'm hoping to create
> a compiler. Since I want a portable solution I was thinking
> about using a retargetable C or C++ compiler in the back end
> along with appropriate assembler / linkers. Is this a good
> solution?


Since August of 1997, I have been part of a group developing a
Standard ML (SML) compiler called MLton
(http://www.sourcelight.com/MLton) that uses C as its backend. It has
been publicly available since March 1999. We initially chose C (gcc)
as a backend for much the same reasons as you. We have been fairly
happy with the results, although we are now moving to a native backend to
improve compile times and run times. FYI, SML is fairly far from C:
it has first-class functions, exceptions, and garbage collection.


Here are some of our experiences.


PROS:
  * Easy to do a C FFI. This has been valuable for implementing
    libraries.
  * Easy to debug the code generator. We do this by having the
    compiler generate everything as C macros. We can then
    hand-edit the macros or the generated C to print
    debugging info.
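
To illustrate the macro approach, here is a minimal sketch. It is not
MLton's actual macro set: the names Move, AddImm, Check, Trace,
DEBUG_TRACE and trace_count are all hypothetical, and the "generated"
function is written by hand for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter so the effect of tracing is observable. */
static int trace_count = 0;

/* Hand-edit this one definition to switch tracing on or off
   for every operation in the generated code. */
#ifndef DEBUG_TRACE
#define DEBUG_TRACE 1
#endif

#if DEBUG_TRACE
#define Trace(name, val) \
    do { trace_count++; \
         fprintf(stderr, "%s = %ld\n", (name), (long)(val)); } while (0)
#else
#define Trace(name, val) do { } while (0)
#endif

/* The code generator emits only macro invocations like these
   (hypothetical operations), never raw C statements. */
#define Move(dst, src)    do { (dst) = (src); Trace(#dst, dst); } while (0)
#define AddImm(dst, a, n) do { (dst) = (a) + (n); Trace(#dst, dst); } while (0)
#define Check(cond)       assert(cond)

/* What the generated C for "x = 40; y = x + 2" might look like. */
static long generated_example(void)
{
    long x, y;
    Move(x, 40);
    AddImm(y, x, 2);
    Check(y > x);   /* a sanity check hand-inserted while debugging */
    return y;
}
```

Because every operation goes through a macro, changing one definition
changes what all of the generated code does, without touching the
code generator itself.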


CONS:
  * C compiler bugs.
  * Slow compile times.
  * C optimizer doesn't do as well as we would like.


As to the C compiler bugs, we didn't find many, but the ones we did
find cost us a lot of time and frustration.


Peter Gammie <peteg@cse.unsw.edu.au> wrote:


> The biggest problem with compiling to C is its lack of
> expressiveness; almost certainly you'll want to use gcc's
> extensions (such as goto), unless your language is already close
> to C.


This was our thought as well, especially for a language with
first-class functions. However, it led to one of those very annoying
bugs. Initially, we
tried using first class labels, but discovered that it was confusing
gcc's control-flow analysis and causing its register allocator to
screw up. We abandoned using them fairly early on and stuck with
vanilla C.


Jim Granville <jim.granville@designtools.co.nz> wrote:


> 1) With modern PC's compile time is a complete non-issue,


Compile time was and is an issue with us. We feed C programs with
hundreds of thousands of lines to gcc, which can take tens of minutes
to compile. We have had to go to some effort to break up our C
programs into manageably sized C procedures (thousands of lines) so
that the C compiler will compile them quickly enough. At some point I
remember feeding a single C procedure with roughly half a million
lines to gcc and letting it run for a couple of days before giving up.
In breaking up our C program into C procedures we have also had to be
careful to preserve tail recursion by using a trampoline. To avoid
trampolining too much, which would hurt performance, we try to keep
code that calls other code in the same C procedure.
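
For readers who have not seen the technique, a trampoline can be
sketched as follows. This is a generic illustration under assumed
names (struct state, chunk_a, chunk_b, run), not the code MLton
generates. Each C procedure returns the next procedure to run instead
of calling it, so a chain of tail calls never grows the C stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine state shared by all procedures. */
struct state {
    long n;     /* loop counter */
    long acc;   /* accumulator */
};

/* A procedure returns the next procedure to run, or NULL to stop.
   C cannot write a function type that returns itself directly,
   so the pointer is wrapped in a struct. */
struct chunk;
typedef struct chunk (*chunk_fn)(struct state *);
struct chunk { chunk_fn fn; };

static struct chunk chunk_b(struct state *s);

/* chunk_a "tail calls" chunk_b by returning it, not by calling it. */
static struct chunk chunk_a(struct state *s)
{
    struct chunk next = { NULL };
    if (s->n == 0)
        return next;          /* finished: tell the trampoline to stop */
    s->acc += s->n;
    next.fn = chunk_b;
    return next;
}

static struct chunk chunk_b(struct state *s)
{
    struct chunk next = { chunk_a };
    s->n -= 1;
    return next;              /* "tail call" back into chunk_a */
}

/* The trampoline: the only place where a call actually happens. */
static long run(long n)
{
    struct state s;
    struct chunk next = { chunk_a };
    assert(n >= 0);
    s.n = n;
    s.acc = 0;
    while (next.fn != NULL)
        next = next.fn(&s);
    return s.acc;
}
```

Every bounce costs a return plus an indirect call, which is why it
pays to keep code that transfers control to other code inside the
same C procedure, where the transfer can be a plain goto.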


We also found that there are apparent nonlinearities in the run time
of the assembler, which we are suffering with to this day.


As to run time performance, it depends on how good you need it to be.
I assume the reason you're compiling at all instead of interpreting or
byte-code compiling is that you need decent performance. If so, you
have to be prepared to play around with generating different kinds of
C code and seeing what assembly the C compiler generates to make sure
it's good enough. Unfortunately, this offsets some of the apparent
portability advantages of generating C, since in the end you're going
to have to look at the real generated code anyway if you want really
good performance.


Norman Ramsey <nr@labrador.eecs.harvard.edu> wrote:


> It's a pretty good solution if
> * your language doesn't have garbage collection, or you're willing
> to use the marvellous Boehm/Weiser/Demers conservative collector
> * your language doesn't have exceptions, or they can be expensive
> * you don't care much about source-level debugging
> Otherwise, you're more or less out of luck.
> (But check out www.cminusminus.org for an alternative
> that might be available sometime next year.)


I disagree with the first two points above. Since SML has garbage
collection, we had to face the problem of implementing it. Initially,
we used the Boehm/Weiser/Demers conservative collector. However,
after a month or two of use and some profiling, we found its
performance to be unacceptably slow. Also, for a language like SML
with a strict notion of space safety, the thought of space leaks
created by the runtime system, out of the control of the programmer,
is unacceptable. So, very early on in the project I wrote a simple
two-space copying gc that we have used ever since. It has excellent
performance as long as there isn't much live data, and has even been
acceptable for programs with large amounts of live data like
self-compiles, although a generational gc would be nice. We went
through several iterations of the interface between the generated C
and the runtime (all basically different ways of telling the GC the
root set) and eventually found one we liked.
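
As an illustration of the general idea, here is a minimal sketch of a
two-space copying collector in the Cheney style. It is not MLton's
collector. The object layout, the fixed-size spaces, and all names
(struct obj, gc_alloc, gc_collect, demo_live_after_gc) are
assumptions made for the example, and the root set is passed in as
an explicit array.

```c
#include <assert.h>
#include <stddef.h>

#define SEMI_OBJS 1024   /* hypothetical capacity of each space */

/* Hypothetical uniform object: two child pointers and a value. */
struct obj {
    struct obj *forward;        /* non-NULL once copied: the new address */
    struct obj *left, *right;   /* children, may be NULL */
    long value;
};

static struct obj space_a[SEMI_OBJS], space_b[SEMI_OBJS];
static struct obj *from_space = space_a, *to_space = space_b;
static size_t alloc_count = 0;  /* objects allocated in from_space */

/* Bump allocation in the current space. */
static struct obj *gc_alloc(long value, struct obj *left, struct obj *right)
{
    struct obj *o;
    assert(alloc_count < SEMI_OBJS);  /* a real runtime would collect here */
    o = &from_space[alloc_count++];
    o->forward = NULL;
    o->left = left;
    o->right = right;
    o->value = value;
    return o;
}

/* Copy o into to_space unless already copied; return its new address. */
static struct obj *copy(struct obj *o, size_t *next_free)
{
    if (o == NULL)
        return NULL;
    if (o->forward == NULL) {
        struct obj *n = &to_space[(*next_free)++];
        *n = *o;                /* the copy's forward field is NULL */
        o->forward = n;         /* leave a forwarding pointer behind */
    }
    return o->forward;
}

/* Copy everything reachable from the roots, then swap the spaces. */
static void gc_collect(struct obj **roots, size_t nroots)
{
    size_t next_free = 0, scan = 0, i;
    struct obj *tmp;
    for (i = 0; i < nroots; i++)
        roots[i] = copy(roots[i], &next_free);
    while (scan < next_free) {  /* scan copied objects, copy children */
        struct obj *o = &to_space[scan++];
        o->left = copy(o->left, &next_free);
        o->right = copy(o->right, &next_free);
    }
    tmp = from_space; from_space = to_space; to_space = tmp;
    alloc_count = next_free;
}

/* Demo: one garbage object, one shared leaf, one root. */
static size_t demo_live_after_gc(void)
{
    struct obj *leaf, *roots[1];
    gc_alloc(1, NULL, NULL);            /* unreachable: garbage */
    leaf = gc_alloc(2, NULL, NULL);
    roots[0] = gc_alloc(3, leaf, leaf); /* leaf is shared */
    gc_collect(roots, 1);
    assert(roots[0]->left == roots[0]->right);  /* sharing preserved */
    return alloc_count;
}
```

The collector only touches objects reachable from the roots, so its
cost is proportional to the amount of live data, not to the amount
of garbage.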


As to exceptions, SML has them, and I don't think our implementation
is slow. Raising an exception to an unknown handler involves a store,
an add, a dereference, a switch and possibly a trampoline, if the
destination is in another C procedure. Raising an exception to a
known handler is just a goto.
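
One way such a sequence could look in vanilla C is sketched below.
This is only a guess at the general shape, not MLton's actual
generated code: the handler stack, the label enum, and every name
(exn_value, handler_top, run_example) are assumptions made for the
example, and the trampoline case is left out.

```c
#include <assert.h>

/* Hypothetical labels for the code blocks inside one C procedure. */
enum label { L_BODY, L_HANDLER, L_DONE };

#define MAX_HANDLERS 16

static long exn_value;                             /* the raised value */
static enum label handler_stack[MAX_HANDLERS];
static enum label *handler_top = handler_stack;    /* one past the top */

/* Doubles a non-negative input; raises on a negative one, and the
   handler turns the exception into the result -1. */
static long run_example(long input)
{
    enum label next = L_BODY;
    long result = 0;

    assert(handler_top < handler_stack + MAX_HANDLERS);
    *handler_top++ = L_HANDLER;            /* install the handler */

    for (;;) {
        switch (next) {                    /* the switch: dispatch on label */
        case L_BODY:
            if (input < 0) {
                /* Raise to an unknown handler. */
                exn_value = input;         /* the store */
                handler_top -= 1;          /* the add: pop the handler stack */
                next = *handler_top;       /* the dereference */
                break;                     /* back to the switch */
            }
            result = input * 2;
            handler_top -= 1;              /* normal exit: uninstall handler */
            next = L_DONE;
            break;
        case L_HANDLER:
            result = -1;                   /* handle the exception */
            next = L_DONE;
            break;
        case L_DONE:
            return result;
        }
    }
}
```

When the handler is known statically and lives in the same C
procedure, the lookup is unnecessary and the raise can be a direct
goto to the handler's code.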


As to C--, I remember considering it a couple of years ago because it
was "almost ready". I think most of the language features
(exceptions, gc, first-class functions) that cause difficulty in
compiling to C can be worked around and that you are safer sticking
with an existing C compiler if you need to get up and running soon.


In summary, whether you should target C depends on how different your
language is from C, how much performance you need, and how much
manpower you have. There are a lot of drawbacks to using C, and you
will spend time fighting compiler bugs, the performance of the
compiler, and the performance of the generated code. But you will get
stuff up and running faster, which for us made work a lot more fun.
Our C backend
has been around for a couple of years now, and we have learned a lot
from it. Even after spending a lot of time generating code to please
the C compiler, we are moving to a native backend to improve both
compile times and run times. Our preliminary efforts in this
direction indicate that there is a lot to be gained.


