Re: Jit Implementation

"BGB / cr88192" <cr88192@hotmail.com>
Tue, 23 Mar 2010 22:31:24 -0700

          From comp.compilers

From: "BGB / cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Tue, 23 Mar 2010 22:31:24 -0700
Organization: albasani.net
References: 10-03-070 10-03-078
Keywords: code
Posted-Date: 24 Mar 2010 09:53:36 EDT

"bartc" <bartc@freeuk.com> wrote in message
> "BGB / cr88192" <cr88192@hotmail.com> wrote in message
>> "bartc" <bartc@freeuk.com> wrote in message
>
>>> I have a project which works roughly this way:
>
>>> => P Pseudo-code (my IL)
>>> => M Target-code (representation of x86 instructions)
>>> => x86 Actual binary code
>
>> yes, ok.
>> but, it is confusing here as "M" is said to represent x86 instructions,
>> whereas normally one would send out code at the IL level?...


> Yes, I was originally writing 'P' code to disk, but then it was
> interpreted. But I needed something faster, and JITing from the
> then P-code was too much of a leap.


yes.
can't comment much on this point not knowing the specifics of your
implementation.




> I didn't want to go the route of assembler output, object code,
> linking and executable files either. I wanted to maintain the
> dynamicism of an interpreted language.


granted.


I don't give up this dynamicism, FWIW.
although the framework is not as dynamic as it could be, this is more a
result of the C compiler being slow than anything to do with the assembler
proper.




> I'm also using a dynamic language now to implement both the compiler
> and loader parts (before the loader (or interpreter then) was in a
> hard compiled language), so the whole thing is flexible to some
> extent.


yeah, my effort is primarily C.
no non-C languages have gained much hold at present.




>> I have a dynamic assembler and linker, which function about like
>> that of a normal (stand-alone) assembler and linker, but which
>> operate entirely in memory.
> ...


>> generally, I find performance to be acceptably good, even in most
>> cases when generating code with time-constraints or in loops, and
>> even with all the extra crap I ended up adding on eventually.


> I wanted to avoid assemblers, and all that goes with them,
> completely. Then I remembered my language allows inline assembler,
> and to translate it to my 'M' format, I would have to write an
> assembler of sorts..


I originally wrote my assembler specifically for the purpose of doing JIT.
in its original form (back in late 2006), it was far more minimalistic.
most features were added later to address specific limitations which came
up.


for example:
I added syntax, as using an API to emit opcodes was lame;
I added basic linking, as simply emitting code and data to sequential memory
addresses was limiting;
...
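
To illustrate the two limitations listed above, here is a minimal sketch of what API-style emission into sequential memory tends to look like, and why a forward jump already needs a small patching step. The names (CodeBuf, emit_*, patch_rel32, build_example) are hypothetical and are not taken from the assembler described in this post; only the x86 encodings (B8 for mov eax, imm32; E9 for jmp rel32; C3 for ret) are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer; not the author's actual data structure. */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} CodeBuf;

void emit_u8(CodeBuf *cb, uint8_t b) {
    assert(cb->len < sizeof cb->bytes);
    cb->bytes[cb->len++] = b;
}

/* x86 immediates are little-endian: low byte first. */
void emit_u32(CodeBuf *cb, uint32_t v) {
    for (int i = 0; i < 4; i++)
        emit_u8(cb, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32  =>  B8 imm32 */
void emit_mov_eax_imm32(CodeBuf *cb, uint32_t imm) {
    emit_u8(cb, 0xB8);
    emit_u32(cb, imm);
}

/* ret  =>  C3 */
void emit_ret(CodeBuf *cb) { emit_u8(cb, 0xC3); }

/* jmp rel32  =>  E9 rel32. The target is not known yet, so emit a
   placeholder and return the offset of the rel32 field for patching. */
size_t emit_jmp_forward(CodeBuf *cb) {
    emit_u8(cb, 0xE9);
    size_t field = cb->len;
    emit_u32(cb, 0);
    return field;
}

/* Patch a rel32 field so the jump lands at `target`. The displacement
   is measured from the end of the 4-byte field. */
void patch_rel32(CodeBuf *cb, size_t field, size_t target) {
    uint32_t rel = (uint32_t)(target - (field + 4));
    for (int i = 0; i < 4; i++)
        cb->bytes[field + i] = (uint8_t)(rel >> (8 * i));
}

/* Builds:  jmp L;  mov eax, 1;  L: mov eax, 42;  ret  */
void build_example(CodeBuf *cb) {
    cb->len = 0;
    size_t fix = emit_jmp_forward(cb);
    emit_mov_eax_imm32(cb, 1);
    patch_rel32(cb, fix, cb->len);   /* label L is "here" */
    emit_mov_eax_imm32(cb, 42);
    emit_ret(cb);
}
```

Every instruction needs its own call and every label needs manual bookkeeping, which is roughly the tedium that a textual syntax plus a basic linker removes.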


this allowed some of my first JIT experiments:
in late 2006 / early 2007, I had success JIT'ing a script language of mine,
and at the time was pulling off performance loosely comparable to that of
GCC-compiled C code.




this prompted me originally to start on a C compiler, which started out as
essentially hacking over my script-language compiler and bytecode JIT.


the addition of a textual IL was mostly because, for testing of the codegen,
I had written a simple parser to test some features. I had originally
intended the compiler frontend to use raw bytecode.


however, these parts were partly developed separately, and in the process
managed to become incompatible.
at this point, I just sort of forced the IL parser into use and made the
compiler frontend target this instead.




> But in the dynamic language I was using, this added less than 300 lines to
> the compiler, plus various tables. (Because the assembler source exists in
> the framework of a high-level language, I could cut a few corners.)


in my case, the assembler + linker is not a particularly large component,
especially if one shaves off a lot of extra stuff that is not needed for its
basic operation.


granted, it is still a bit larger than 300 loc though.




>> there is a slight complexity that can result from dynamic COFF linking,
>> which is dealing with matters of dependency order/resolution, but I had
>> dealt with this via a hack:
>
> A 'slight' complexity...? I've always tried to design around formal
> link/loaders where possible.
>


I suspect similar issues pop up with traditional linkers as well, and these
issues are addressed by these linkers presumably in a similar manner to how
I had done so.




this issue was mostly popping up when reading in static libraries, or when
compiling multiple interdependent modules in no particular order.
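
One common way to cope with modules arriving in no particular order is to queue references to not-yet-defined symbols and patch them when the definition shows up. The sketch below shows that generic technique only; it is not the hack referred to above, whose details are not given here, and all names (Linker, reference_symbol, define_symbol, unresolved) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMS   16
#define MAX_FIXUPS 32

/* Name pointers must outlive the Linker (string literals in this sketch). */
typedef struct { const char *name; uintptr_t addr; int defined; } Symbol;
typedef struct { int sym; uintptr_t *slot; } Fixup;

typedef struct {
    Symbol syms[MAX_SYMS];
    int    nsyms;
    Fixup  fixups[MAX_FIXUPS];
    int    nfixups;
} Linker;

/* Find a symbol by name, creating an undefined entry if it is new. */
int lookup(Linker *ld, const char *name) {
    for (int i = 0; i < ld->nsyms; i++)
        if (strcmp(ld->syms[i].name, name) == 0)
            return i;
    assert(ld->nsyms < MAX_SYMS);
    ld->syms[ld->nsyms] = (Symbol){ name, 0, 0 };
    return ld->nsyms++;
}

/* A module refers to `name`: fill the slot now if the address is
   known, otherwise remember the slot so it can be patched later. */
void reference_symbol(Linker *ld, const char *name, uintptr_t *slot) {
    int s = lookup(ld, name);
    if (ld->syms[s].defined) {
        *slot = ld->syms[s].addr;
        return;
    }
    assert(ld->nfixups < MAX_FIXUPS);
    ld->fixups[ld->nfixups++] = (Fixup){ s, slot };
}

/* A module defines `name`: record the address, patch every slot that
   was waiting on it, and drop those fixups from the queue. */
void define_symbol(Linker *ld, const char *name, uintptr_t addr) {
    int s = lookup(ld, name);
    ld->syms[s].addr = addr;
    ld->syms[s].defined = 1;
    int kept = 0;
    for (int i = 0; i < ld->nfixups; i++) {
        if (ld->fixups[i].sym == s)
            *ld->fixups[i].slot = addr;
        else
            ld->fixups[kept++] = ld->fixups[i];
    }
    ld->nfixups = kept;
}

/* Number of references still waiting for a definition. */
int unresolved(const Linker *ld) { return ld->nfixups; }
```

With this arrangement the load order stops mattering: code only becomes safe to run once unresolved() reaches zero for the symbols it depends on.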




>> the question though is, once again, why to emit a representation at
>> this level: if it is at the same level as ASM, then it is not
>> portable between systems; but, at the same time, one has to
>> translate it to make it runnable.
>
> For distribution purposes, the choice was to use Source code, P code, or M
> code. I've chosen the latter for the present.


ok.
I have an IL, but I generally don't use it in any "official" measure. I
instead allow source, ASM, and object-code, as the main external
representations.




>> with a traditional IL, it can be compiled to use different CPU
>> architectures; with raw machine code, it can be run on the
>> processor directly. this is, unless I have misunderstood what is
>> being done here.
>
> Nothing much except this is an experimental project and I don't care about
> other processors at the minute...


ok, granted, my project is not currently portable outside of x86 and x86-64
land.
adding x86-64 was a fairly long ordeal, but mostly in the C main codegen.




>> granted, an IL can be very low-level, essentially representing a
>> glossed-over version of the target architectures, or simply a
>> virtual processor (where the IL opcodes are fairly directly mapped
>> to their native analogues in most cases).


> My original 'P' IL mapped to byte-code and needed to be interpreted.


> That imposed many restrictions, but now that I don't have the
> headache of dealing with it efficiently at runtime, it's now fairly
> high-level: it has a stack, registers and operands in any
> combination, but is still just a linear sequence of instructions.


yeah.
many of mine are variants of RPN and abstract stack machines...
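
For readers unfamiliar with the idea, a toy RPN-style stack machine might look roughly like the sketch below. The opcode set and names (Op, OpKind, run_rpn) are invented for the example and do not correspond to any of the ILs discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL } OpKind;
typedef struct { OpKind kind; int imm; } Op;   /* imm is used by OP_PUSH only */

/* Evaluate a linear RPN sequence and return the single value left
   on the stack. Operands are pushed; operators pop two, push one. */
int run_rpn(const Op *code, size_t n) {
    int stack[32];
    size_t sp = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].kind == OP_PUSH) {
            assert(sp < 32);
            stack[sp++] = code[i].imm;
            continue;
        }
        assert(sp >= 2);
        int rhs = stack[--sp];
        int lhs = stack[--sp];
        switch (code[i].kind) {
        case OP_ADD: stack[sp++] = lhs + rhs; break;
        case OP_SUB: stack[sp++] = lhs - rhs; break;
        case OP_MUL: stack[sp++] = lhs * rhs; break;
        default:     assert(0);               break;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

The expression (2 + 3) * 4 would be encoded as the sequence push 2, push 3, add, push 4, mul. A JIT for such an IL walks the same sequence but emits native code for each opcode instead of executing it.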




> It's a little higher level than x86, my main target, but not too much:
>
> * 3 fundamental types against x86's one main 32-bit type
> * 2-address instructions against x86's one-address form
>
> However, it has fewer registers: just one main register, plus one or two
> auxiliary ones (I've never figured out how to compile for multiple
> registers).


more than a few registers means using a register allocator.


in the trivial case, one is not needed (my early codegens didn't use one, but
instead assigned registers according to a complicated set of rules and roles).


later on, I added a register allocator.
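
As a rough idea of the simplest possible form, a register allocator can start out as little more than a pool of free registers handed out on demand. The sketch below is an assumption-laden toy (a 4-register file, hypothetical names RegPool, reg_alloc, reg_free) and is not the allocator mentioned above.

```c
#include <assert.h>

enum { NUM_REGS = 4, NO_REG = -1 };   /* hypothetical small register file */

/* Bit r set means register r is currently free. */
typedef struct { unsigned free_mask; } RegPool;

void pool_init(RegPool *p) { p->free_mask = (1u << NUM_REGS) - 1; }

/* Return the lowest-numbered free register, or NO_REG when the pool
   is exhausted and the caller has to spill a value to memory. */
int reg_alloc(RegPool *p) {
    for (int r = 0; r < NUM_REGS; r++) {
        if (p->free_mask & (1u << r)) {
            p->free_mask &= ~(1u << r);
            return r;
        }
    }
    return NO_REG;
}

/* Hand a register back once its value is dead. */
void reg_free(RegPool *p, int r) {
    assert(r >= 0 && r < NUM_REGS);
    assert(!(p->free_mask & (1u << r)));   /* must currently be in use */
    p->free_mask |= 1u << r;
}
```

A real allocator also has to decide which value to spill when reg_alloc returns NO_REG, and has to track which values are live across calls and branches; that is where most of the complexity sits.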


