Re: Languages with well-integrated Foreign Function Interface to learn from?

"BGB" <cr88192@hotmail.com>
Thu, 3 Sep 2009 08:41:36 -0700

          From comp.compilers


From: "BGB" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Thu, 3 Sep 2009 08:41:36 -0700
Organization: Compilers Central
References: 09-07-074 09-07-095 09-07-105 09-08-050 09-08-056 09-09-005 <af54caaf0909020734p185e03a6hb95570b55dcbed9a@mail.gmail.com> <BLU0-SMTP75F058507C9C4E91D023BDE4F00@phx.gbl> <af54caaf0909030048t6efd616dme38a4ffee587dfba@mail.gmail.com>
Keywords: code
Posted-Date: 03 Sep 2009 13:12:32 EDT

>> ----- Original Message ----- From: "Paul Biggar" <paul.biggar@gmail.com>
>> one major advantage:
>> you can run C in the VM.
>>
>> C is a very useful language to be able to run as a scripting language,
>> since
>> it is a fairly powerful language.
>
> Nothing against C, JITs, etc. I was just suggesting good practice for
> designing portable FFIs. I think if you follow the design you've
> outlined, it will be nearly impossible for a reimplementation to use
> your FFI without copying your VM design.
>


in the abstract sense, the only fundamental requirement is that the VM be
able to do the same things (run C and provide a transparent FFI).


any VM design will work, so long as it has the same capabilities...




but, then again, at present I know of no other VMs which can do exactly the
same things.


LLVM and .NET would come closest, LLVM differing mostly in architectural
issues, and .NET lacking an entirely transparent FFI ("P/Invoke" does not
count, since one does need, firstly, to manually utilize P/Invoke...).




for example, how they go about compiling and running said C code is up to
them, as I don't mandate any "one true bytecode to rule them all", ... (and,
at this moment, there is not even an established canonical bytecode, since
my particular architecture did not base itself on one, as such...).




>>>> I can already more-or-less glue dynamic typing to C-style data
>>>> representations, ...
>>>
>>> Maybe, but why would you want to? In my opinion, all you do here is
>>> prevent people from reimplementing your language, which is the same
>>> mistake that all existing scripting languages have made.
>>>
>>
>> the main advantage is that it allows many places to internally use static
>> typing, which allows for greater performance in many cases...
>>
>> another advantage is that it can save memory...
>
>
> I don't think there is anything in the FFI proposal I laid out that
> prevents implementing as much of your run-time or library as you
> choose in C. You don't need to support C code in your VM to achieve
> this.
>


but, what of the code running IN the VM?...


if the scripting language goes ahead and uses an object representation based
on pointer-sized references and lots of internal pieces, then all of this
memory is used up, when the VM could have saved a good number of bytes by
packing the entire object into a few fields.




granted, I do know of (have, actually) a prototype-based object system that
gets the cost down (mostly) to the headers, +2 arrays (one holding field
info, the other holding the values).


with static typing, we don't need the field info, and may be able to use a
smaller value for the field value (although, then one gets into another
mess: lots of micro-classes and the use/abuse of interfaces).




if the script itself does almost nothing besides call runtime functions,
maybe it will not cost much.


but, presumably, scripts will do real work with real data, and so the
internal use of non-static structures will eat up memory and clock cycles...


>
> Again, if you determine that these can be statically typed, you do not
> need to support C in your VM to optimize this.
>


yes, but I support C because it is a very useful language (and also my main
programming language).
granted, C is not the most ideally suited to a VM context as I have found,
but oh well...


supporting C is not so much about micro-optimizing (this can be done either
way), but more, C is a mechanism by which the FFI can be made almost
entirely transparent...


script lang <-> dynamic C <-> static C


at this point, the only real contrast is that, some C is compiled into the
host app, and that some C is loaded into the VM.




the VM design is such that, upon loading the C, the VM can gain near
omniscience over the internal workings of said loaded C, allowing a good
deal more auto-integration power.


the main remaining barrier, then, is the inherent difference between the
languages on either side, since not all things in one language will map
exactly to the other.




<snip>


>> who ever said anything about statically compiling the JS?...
>>
>> I simply said it would be compiled to statically-typed native code.
>
> Indeed. I misread your previous mail.
>


ok, granted.




>> this means:
>> A, it is compiled to dynamically typed bytecode;
>> B, the JIT does all the trickery to lower it to static typing.
>
> You should of course support a low-level IR in your JIT, for all the
> reasons you outlined above. At the risk of repeating myself, I don't
> believe you've given a good reason for supporting the full generality
> of C (except that its a cool hack, perhaps :))


well, mostly it is that I code mostly in C, and most of my static-land code
is C (apart from some C++ here and there).


at the time, I had figured, "I know what would be teh awesome: if I could
just use C as the scripting language and totally avoid all of the FFI and
runtime-support issues...".


but, things never go "this" easy it seems.


there was a subtle detail I had not noticed, but became very obvious later
on:
when statically compiling C, we can note the delay... as the compiler builds
one file, and moves on to the next, ...


I then discovered, this delay can be rather annoying WRT load times, as then
it ends up taking a good number of additional seconds for my app to be up and
running...


never mind the level of effort one ends up investing in the project.




as for a low-level IR:
I have "RPNIL", but this is more of a high-level IL.


basically, it operates at a level of abstraction between that of MSIL/CIL
and PostScript (and, amusingly, I had noted at one point that my optimizer
was Turing complete, and so with a few trivial adjustments, I could use the
optimizer itself as an interpreter, although I also noted that this is not
terribly useful...).


I have on-and-off considered adding a secondary, much lower-level IR, but the
main obstacle is that I have not managed to successfully partition said
high-level IL compiler from the low-level codegen machinery.


I am gradually moving in this direction though, and the exact ties between
the high-level and low-level are weaker than in the past...




the codegen itself is not strictly TAC or SSA, it is actually more of a big
mass of code and general mechanisms, rather than something designed to any
particular formal model...




now, this RPNIL language works plenty well for C, but does not adapt well
to anything much different.


I have started considering that some other languages (notably, JS), would
likely have their own IL, and possibly a branched version of the codegen
(unless... I can manage to break up the RPNIL / codegen dependency before
this point, making the codegen more open to alternative IL frontends...).


if I were to do so, the design for the new IL frontend would likely have
some basis in the AVM2 bytecode.




>> the main complexity with JIT and JS is part of the typesystem:
>> there is no clear distinction between integer and real types;
>> nearly all overflows are defined as automatically going to double;
>
> There isnt a huge distinction between strings and numbers either,
> AIUI. You might have come across Gal's PLDI 2009 paper
> (people.mozilla.org/~dmandelin/tracemonkey-pldi-09.pdf) with their
> nice solution. V8 has a nice 100% compiled solution too
> (http://code.google.com/apis/v8/design.html).
>


yes, I may have to look into this.


ActionScript may be slightly easier, since apparently they make a little
more distinction than JS proper.


sadly, I am not too much of an expert on all the finer points of
ECMAScript vs JavaScript vs ActionScript, nor with their respective runtime
libraries, but oh well...


