Re: Managing the JIT

"BGB / cr88192" <cr88192@hotmail.com>
Mon, 27 Jul 2009 23:30:14 -0700

          From comp.compilers

Related articles
Managing the JIT herron.philip@googlemail.com (Philip Herron) (2009-07-22)
Re: Managing the JIT echristo@gmail.com (Eric Christopher) (2009-07-24)
Re: Managing the JIT herron.philip@googlemail.com (Philip Herron) (2009-07-25)
Re: Managing the JIT armelasselin@hotmail.com (Armel) (2009-07-25)
Re: Managing the JIT herron.philip@googlemail.com (Philip Herron) (2009-07-27)
Re: Managing the JIT cr88192@hotmail.com (BGB / cr88192) (2009-07-27)
Re: Managing the JIT cr88192@hotmail.com (BGB / cr88192) (2009-07-28)
Re: Managing the JIT armelasselin@hotmail.com (Armel) (2009-07-29)
Re: Managing the JIT cr88192@hotmail.com (BGB / cr88192) (2009-07-30)
Re: Managing the JIT armelasselin@hotmail.com (Armel) (2009-07-31)
Re: Managing the JIT barry.j.kelly@gmail.com (Barry Kelly) (2009-08-01)
Re: Managing the JIT cr88192@hotmail.com (BGB / cr88192) (2009-08-02)
[3 later articles]

From: "BGB / cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Mon, 27 Jul 2009 23:30:14 -0700
Organization: albasani.net
References: 09-07-079 09-07-086 09-07-087
Keywords: code, LLVM
Posted-Date: 29 Jul 2009 08:38:22 EDT

"Philip Herron" <herron.philip@googlemail.com> wrote in message
> Eric Christopher wrote:
>> John is absolutely correct here, what you'll want is an llvm
>> ExecutionEngine
>> to manage your jitted code via the modules that you put intermediate code
>> into. There's a simple example for a toy language up on the llvm
>> website:
>>
>> http://llvm.org/docs/tutorial/index.html
>>
>> I'd read the Kaleidoscope tutorial.
>
> Thanks loads i've been doing a lot of reading on those tutorials there
> very nice and easy to follow for me! The thing that still bothers me
> is i would love to know more how the 'Execution-Engine' works but I'm
> starting to see how you could do it.


I will comment here that, personally, I am not a particular fan of LLVM's
architecture...
however, luckily, members of the LLVM community are working hard to address
"my" problems (actually... these issues are probably being addressed more to
improve the general usefulness of the framework than anything else...).




the simple answer here:
getting from a general representation (say, RPN, 3-address code, or SSA form)
to something like ASM or machine code seems easy enough, and initially it is
(well, it is a matter of scale...).


the issue, though, is that the problem seems to inflate sort of like a
balloon (or an air mattress, for those who have inflated one without a
pump...). basically, a seemingly endless stream of effort goes into the
thing, and "results" are few and far between...




> Is it more each time a new function called it is jit'd then you would
> keep this is memory so there is no output, it must be pretty
> complicated in managing all this new code this, and then how do you
> decide what to jit, as in do you just jit function by function, or a
> set of calls at a time. Thats more the languages problem i guess. Then
> if your keeping all the code in memory, how do you exeucute each bit
> of jitted it code is each piece of a code into a new stack frame and
> executed like a thread or is it less complicated? Maybe you can
> directly load the raw binary or something.
>


there are many different ways...




in my framework, I implemented, first of all, an assembler and a dynamic
linker.


the task of the assembler:
take ASM, assemble it, and then produce an "object" (for example, a buffer
containing the assembled code in a custom, or standardized, format...).


in the case of my assembler and linker, I am currently using COFF between
the stages (initially, I passed the data as structs, but this is less
"modular"/"generic"; COFF, however, is a lot more complicated than passing a
struct...).
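
for reference, the fixed-size COFF file header (the standard PE/COFF layout)
looks about like this; in this scheme the assembler hands the linker a buffer
starting with such a header, followed by the section headers, raw section
data, relocations, and the symbol table. just a sketch in C, assuming a
complete and well-formed object buffer:

#include <stdint.h>
#include <string.h>

/* standard COFF file header (20 bytes), found at the start of a
   relocatable object; section headers follow it (and any optional
   header, usually absent in objects). */
typedef struct {
    uint16_t machine;          /* target architecture, e.g. 0x8664 for x86-64 */
    uint16_t num_sections;     /* number of section headers that follow */
    uint32_t timestamp;
    uint32_t symtab_offset;    /* file offset of the symbol table */
    uint32_t num_symbols;
    uint16_t opt_hdr_size;     /* usually 0 for relocatable objects */
    uint16_t characteristics;
} coff_file_header;

/* read the header out of an in-memory object produced by the assembler.
   returns 0 on success, -1 if the buffer is too small. */
int coff_read_header(const uint8_t *buf, size_t size, coff_file_header *hdr)
{
    if (size < 20)
        return -1;
    memcpy(&hdr->machine,         buf +  0, 2);
    memcpy(&hdr->num_sections,    buf +  2, 2);
    memcpy(&hdr->timestamp,       buf +  4, 4);
    memcpy(&hdr->symtab_offset,   buf +  8, 4);
    memcpy(&hdr->num_symbols,     buf + 12, 4);
    memcpy(&hdr->opt_hdr_size,    buf + 16, 2);
    memcpy(&hdr->characteristics, buf + 18, 2);
    return 0;
}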




it is then possible for the linker to maintain an "image", which is
basically analogous to a heap, but containing executable code (for '.text'),
along with space for the data and bss sections. when linking code into the
image, it allocates space of the appropriate types for the various object
sections, and then resolves symbols and patches up relocations as appropriate.


so, beyond the code/data heap, there is also a big symbol table, which
basically tells which symbols are part of the "image", and where in memory
they currently reside. similarly, this table also links into another table
of global relocations, which basically holds all of the fixups which
currently depend on a given symbol (such that any code which references the
symbol can be patched should the symbol move).


or, at least, this is the "simple" explanation (there is actually a good
amount of trickery involved in all of this).
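
to give a rough idea (with made-up names, and glossing over most of the
trickery), the bookkeeping looks something like this: a table of symbols
with their current addresses, plus a table of fixups pointing back at every
location that was patched with a given symbol's address:

/* one symbol known to the image: its name plus its current address
   (NULL if not yet resolved). */
typedef struct {
    const char *name;
    void       *addr;
} img_symbol;

/* one outstanding fixup: a location in already-linked code/data which
   holds a symbol's address, and must be re-patched if that symbol ever
   moves.  only absolute pointer-sized fixups are shown here. */
typedef struct {
    void       **patch_at;   /* where the symbol's address was written */
    img_symbol  *target;     /* which symbol it refers to */
} img_reloc;

typedef struct {
    img_symbol symbols[4096];
    int        num_symbols;
    img_reloc  relocs[16384];
    int        num_relocs;
    /* ...plus the code/data/bss heaps the object sections get copied into... */
} link_image;

/* move a symbol (say, the code heap was compacted) and walk the global
   relocation table, re-patching every location which refers to it. */
void image_move_symbol(link_image *img, img_symbol *sym, void *new_addr)
{
    sym->addr = new_addr;
    for (int i = 0; i < img->num_relocs; i++)
        if (img->relocs[i].target == sym)
            *(img->relocs[i].patch_at) = new_addr;
}

(the real version also has to deal with multiple relocation types, hashed
lookups, and so on.)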


granted, if one is willing to use ELF (and ELF-style linking) instead of
COFF and Windows-style linking, the process would be much simpler:
everything would be explicitly resolved against a GOT (Global Offset Table),
meaning one only patches up the table entries, rather than everything which
uses a given symbol.


for various reasons, I will currently recommend this style over the
Windows/COFF style direct-linking approach. (speaking from experience
here...). or, at least, in the case of x86-64...
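
in that style, the bookkeeping shrinks to something like this (again, only a
sketch with made-up names): each symbol owns exactly one table slot, the
generated code always loads through that slot, and resolving or moving a
symbol touches only the slot:

#include <string.h>

/* one GOT entry: a symbol name and the single cell that all code
   referencing the symbol indirects through. */
typedef struct {
    const char *name;
    void       *addr;
} got_entry;

static got_entry got[1024];
static int       got_count;

/* find (or create) the slot for a symbol; code is emitted to load through
   the returned cell, so nothing else ever needs patching. */
void **got_slot(const char *name)
{
    for (int i = 0; i < got_count; i++)
        if (!strcmp(got[i].name, name))
            return &got[i].addr;
    got[got_count].name = name;
    got[got_count].addr = NULL;    /* stays NULL until the linker resolves it */
    return &got[got_count++].addr;
}

/* resolving (or later moving) a symbol is then a single store. */
void got_resolve(const char *name, void *addr)
{
    *got_slot(name) = addr;
}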




actually, I use an "alternate" strategy for "GC-safe" thunks, which
basically amounts to indirectly linking against a sort of mini-GOT. the
reason is that, using my prior strategy, garbage-collecting code and data
is problematic. note, however, that my approach to linking these thunks is
"unconventional" (note: this also affects the ASM code).


in particular, linking is performed against "scopes", which may define
symbols locally, rather than directly against the global toplevel, and by
default, each assembled thunk gets its own local "scope" as well (the scopes
may be chained together, and the global toplevel is a fallback).


note that this mechanism also allows the convenient implementation of
"lexical closures", among other things, since each thunk can have variables
which belong "only to itself".
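
a sketch of how such scoped resolution might look (hypothetical structures;
my actual implementation differs in the details): each thunk gets its own
scope, scopes chain to a parent, and the global toplevel is simply the scope
at the end of the chain:

#include <string.h>

typedef struct scope_binding {
    const char           *name;
    void                 *addr;
    struct scope_binding *next;
} scope_binding;

/* a linking scope: a local list of bindings plus a parent to fall back on.
   the global toplevel is just a scope whose parent is NULL. */
typedef struct scope {
    scope_binding *bindings;
    struct scope  *parent;
} scope;

/* resolve a symbol: check the thunk's own scope first, then walk up the
   chain toward the global toplevel.  per-thunk bindings are what make
   "variables belonging only to this thunk" (and thus closures) easy. */
void *scope_lookup(scope *sc, const char *name)
{
    for (; sc; sc = sc->parent)
        for (scope_binding *b = sc->bindings; b; b = b->next)
            if (!strcmp(b->name, name))
                return b->addr;
    return NULL;    /* not visible from this scope */
}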




so, what is compiled?...
whatever we decide to produce in the form of an ASM fragment.




now, what do I usually compile?


personally, I usually compile code a "module" at a time.
this module usually corresponds more-or-less 1:1 with whatever was in the
input source file.


it would be possible to compile a function at a time, but personally I have
seen little need for this (apart from cases where one expects a lot of
function-level patching, and mostly as a means to reduce uncollectable
modules, or to reduce lazy-linking pressure).


so, the frontend requests that a source file be loaded, and the framework
may go through all the effort of JIT'ing it, or just add it to a queue
somewhere.


another strategy would be to semi-compile code (for example, compiling it
into some form of bytecode), at which point the linker attempting to resolve
undefined symbols could cause a module to be JIT-compiled and sent to the
assembler and linker.
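
in outline (all names here are illustrative, not my actual API), the
demand-driven version looks something like: keep the semi-compiled modules
around along with a record of what they export, and let an unresolved symbol
trigger compilation of whichever module exports it:

#include <string.h>

/* a module compiled only as far as bytecode, recording which symbols it
   would export once fully compiled and linked. */
typedef struct bc_module {
    const char      **exports;    /* NULL-terminated list of names */
    const void       *bytecode;
    int               compiled;   /* has it been JIT'd yet? */
    struct bc_module *next;
} bc_module;

static bc_module *pending;   /* populated by the module loader (not shown) */

/* stand-in for the real backend: bytecode -> ASM -> assembler -> linker. */
static void jit_compile_and_link(bc_module *m) { (void)m; }

/* called when the linker hits an undefined symbol: find a pending module
   exporting it and JIT that module on demand.  note that linking the newly
   produced code may itself leave symbols undefined, which is exactly the
   "JIT cascade" risk mentioned below. */
int resolve_by_demand_jit(const char *name)
{
    for (bc_module *m = pending; m; m = m->next) {
        if (m->compiled)
            continue;
        for (const char **e = m->exports; *e; e++) {
            if (!strcmp(*e, name)) {
                m->compiled = 1;
                jit_compile_and_link(m);
                return 1;
            }
        }
    }
    return 0;    /* no pending module exports this symbol */
}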


the cost here is the risk that attempting to resolve a symbol could lead to
a "JIT cascade", causing a notable delay (for example, in an interactive /
soft-real-time app), as the JIT compiler JITs one thing, only to create more
unresolved symbols for it to JIT code to resolve, ...




one way I have found to lessen this issue (among many other related issues)
is to have my linker, by default, often not directly link modules into the
image. instead, it will typically add modules to a queue. when an unresolved
symbol is encountered, it will typically look for a module in the queue
exporting the requested symbol, and then proceed to link this into the image
(this may cascade and cause a lot of linking, but linking is fairly
cheap...).


however, in general, this helps with many issues, since delaying linking
also delays the need to resolve symbols, often giving them much more of a
chance to migrate into the symbol table first. it also greatly helps with
the case of cyclic dependency loops (basically, where several modules exist
which each depend on symbols exported by the others), which were previously
causing trouble for my linker with a linear-linking strategy.


consider modules A, B, and C, where A depends on B, which depends on C,
which depends on A.


linking them linearly yields:
A: linked, needs (not yet visible) declarations from B (falling back to
far-reaching, and potentially inappropriate/undesirable, symbol-resolution
strategies);
B: linked, needs C's decls, meanwhile the A->B linkage may have been messed up;
C: likewise...


however, queuing them makes all of the exports visible to the linker, thus:
A is linked, grabs B;
B is linked, grabs C;
C is linked.


and, there is no need for ugly fallback strategies, since the linker will
already know that the symbols are visible, and will not go bothering other
parts of the framework or trying to guess whether or not it needs to patch
something together with a thunk / ...
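
to make the above concrete, here is a toy version of the queue-and-resolve
behavior (names and structures are made up, and real symbol handling is far
more involved), showing the A/B/C cycle resolving cleanly:

#include <stdio.h>
#include <string.h>

/* a module sitting in the lazy-link queue: what it exports and what it
   still needs from elsewhere. */
typedef struct qmodule {
    const char     *name;
    const char    **exports;   /* NULL-terminated */
    const char    **imports;   /* NULL-terminated */
    int             linked;
    struct qmodule *next;
} qmodule;

static qmodule *queue;

static void queue_add(qmodule *m) { m->next = queue; queue = m; }

static int module_exports(qmodule *m, const char *sym)
{
    for (const char **e = m->exports; *e; e++)
        if (!strcmp(*e, sym)) return 1;
    return 0;
}

/* resolve a symbol: if a queued module exports it, link that module into
   the image now, which in turn pulls in whatever it imports.  since every
   queued module's exports are visible up front, cycles resolve cleanly
   (already-linked modules are simply skipped). */
static void resolve_symbol(const char *sym)
{
    for (qmodule *m = queue; m; m = m->next) {
        if (!m->linked && module_exports(m, sym)) {
            m->linked = 1;
            printf("linking %s into the image (for symbol '%s')\n", m->name, sym);
            for (const char **i = m->imports; *i; i++)
                resolve_symbol(*i);
            return;
        }
    }
}

int main(void)
{
    /* A depends on B, B on C, C on A: a cyclic dependency loop. */
    static const char *ax[] = { "a_fn", NULL }, *ai[] = { "b_fn", NULL };
    static const char *bx[] = { "b_fn", NULL }, *bi[] = { "c_fn", NULL };
    static const char *cx[] = { "c_fn", NULL }, *ci[] = { "a_fn", NULL };
    static qmodule A = { "A", ax, ai, 0, NULL };
    static qmodule B = { "B", bx, bi, 0, NULL };
    static qmodule C = { "C", cx, ci, 0, NULL };

    queue_add(&A); queue_add(&B); queue_add(&C);
    resolve_symbol("a_fn");   /* links A, then B, then C; no fallbacks needed */
    return 0;
}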


(note that the queued modules are pre-decoded, rather than left as COFF
buffers...).


dealing with cyclic dependencies was actually the main reason I went and
added the queue / lazy-link feature to my linker (previously, stuff was
going on that should not have been happening, ...).




the JIT cascade issue could be lessened to some extent by using a sort of
speculative JIT (basically, where the compiler framework makes guesses about
what is likely to be called, and proceeds to JIT these and send them off to
the assembler/linker).
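
purely as an illustration (again, this is not what my framework does), the
speculation could be as simple as scanning a freshly compiled module for
referenced-but-uncompiled symbols and queueing them ahead of need:

#include <stdio.h>

/* stand-ins for the rest of the framework (hypothetical). */
static int  symbol_is_compiled(const char *sym) { (void)sym; return 0; }
static void enqueue_for_jit(const char *sym)
    { printf("speculatively queueing %s\n", sym); }

/* speculative JIT sketch: after compiling a module, look at the symbols it
   references and queue any which have no compiled code yet, on the guess
   that they are likely to be called soon.  this trades some possibly
   wasted compilation for fewer cascades (stalls) at call time. */
static void speculate_on_references(const char **referenced)
{
    for (const char **s = referenced; *s; s++)
        if (!symbol_is_compiled(*s))
            enqueue_for_jit(*s);
}

int main(void)
{
    const char *refs[] = { "draw_frame", "update_physics", NULL };
    speculate_on_references(refs);
    return 0;
}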


however, my framework does not currently use this strategy (it does not
speculate about when to JIT, it just does what it is told, namely, it tries
to compile everything it is given up-front...).




> I guess i have a lot more to read :) But thanks loads for you help so
> far, its a subject thats been bugging me for a while.
>
> --Phil
> http://redbrain.co.uk
> [You keep track of the compiled code in a symbol table. I can imagine a
> variety
> of strategies to decide when to compile, e.g., a function at a time, a
> function and
> the ones it calls down to some number of levels, etc. -John]
>


yep, yep...


my strategy is eager, but this is in part because my code is compiled in
such a way that there is presently no "safety net" in place for
not-yet-linked symbols (running into such a symbol is then likely to cause
the code to segfault, as it tries to jump through a NULL pointer or
similar...).


however, as is, it may well be the act of requesting the function pointer
which causes it, and all of its dependencies, to be linked...
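
so, in rough terms (hypothetical names again), the request itself is what
drags the symbol and its dependencies through the linker:

#include <stdio.h>

/* stand-ins for the linker machinery described earlier. */
static void *image_lookup(const char *name) { (void)name; return 0; }
static void  lazy_link_for(const char *name)
    { printf("linking queued modules needed for %s\n", name); }

/* requesting a function pointer is the point at which the symbol, and
   transitively whatever it depends on, actually gets linked.  code that
   instead jumps to a never-requested symbol has no such safety net and
   will likely go through a NULL pointer and segfault. */
void *get_function_pointer(const char *name)
{
    void *p = image_lookup(name);
    if (!p) {
        lazy_link_for(name);    /* pull pending modules into the image */
        p = image_lookup(name);
    }
    return p;                   /* may still be NULL if nothing exports it */
}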


