Re: Jit Implementation

"BGB / cr88192" <cr88192@hotmail.com>
Sun, 21 Mar 2010 13:57:40 -0700

From comp.compilers

Related articles
[2 earlier articles]
Re: Jit Implementation bartc@freeuk.com (bartc) (2010-03-20)
Re: Jit Implementation jgd@cix.compulink.co.uk (2010-03-20)
Re: Jit Implementation anton@mips.complang.tuwien.ac.at (2010-03-21)
Re: Jit Implementation gah@ugcs.caltech.edu (glen herrmannsfeldt) (2010-03-21)
Re: Jit Implementation herron.philip@googlemail.com (Philip Herron) (2010-03-21)
Re: Jit Implementation jthorn@astro.indiana-zebra.edu (Jonathan Thornburg \[remove -animal to reply\]) (2010-03-21)
*Re: Jit Implementation cr88192@hotmail.com (BGB / cr88192)* (2010-03-21)**
Re: Jit Implementation herron.philip@googlemail.com (Philip Herron) (2010-03-21)
Re: Jit Implementation barry.j.kelly@gmail.com (Barry Kelly) (2010-03-22)
Re: Jit Implementation bartc@freeuk.com (bartc) (2010-03-23)
Re: Jit Implementation bartc@freeuk.com (bartc) (2010-03-23)
Re: Jit Implementation cr88192@hotmail.com (cr88192) (2010-03-23)
Re: Jit Implementation cr88192@hotmail.com (BGB / cr88192) (2010-03-23)
[4 later articles]

| List of all articles for this month |

From:	"BGB / cr88192" <cr88192@hotmail.com>
Newsgroups:	comp.compilers
Date:	Sun, 21 Mar 2010 13:57:40 -0700
Organization:	albasani.net
References:	10-03-054 10-03-060
Keywords:	code, tools
Posted-Date:	22 Mar 2010 21:03:34 EDT

"bartc" <bartc@freeuk.com> wrote in message news:10-03-060@comp.compilers...
> "Philip Herron" <herron.philip@googlemail.com> wrote in message
>
>> So ok if we have successfully output target code for the processor
>> would you use simply a system assembler or what i see some people
>> have runtime assemblers to assemble this code, then where would this
>> target code go? Then how do you link it in a useful way? And
>> ultimately how would you execute this but that depends on how you
>> link it or where you link it. What might work but i doubt seriously
>> if it works like this, to generate a dynamic library .so and use
>> dlsym to execute them from your runtime. That is if you were to
>> have a nice dynamic jit rather than a fully AOT jit which generates
>> code fully so you really compile your language.
>
> I have a project which works roughly this way. Perhaps comparing this to
> other systems might be useful:
>
> It's in two parts, a compiler:
>
> S Source code
> => U AST
> => P Pseudo-code (my IL)
> => M Target-code (representation of x86 instructions)
> => M-file Distribution form
>
> And a, er, Loader (I haven't found a good name for this part yet):
>
> M-file Distribution form
> => M Target code repr
> => x86 Actual binary code
>
> Then the code can be directly run in memory, from the loader.
>
> Libraries in the same language are also distributed in this form, and
> dealt with by the Loader.

yes, ok.
but, it is confusing here as "M" is said to represent x86 instructions,
whereas normally one would send out code at the IL level?...

> In this project, I stay well clear of things such actual assemblers,
> link-loaders, dynamic libraries and executable files. They just make
> things slow, cumbersome and generally less dynamic than I would like.
> (To access OS and other external services, it does use external DLLs.)

I have a dynamic assembler and linker, which function about like that of a
normal (stand-alone) assembler and linker, but which operate entirely in
memory.

IME, this strategy works good in terms of being dynamic (typically, I just
use a begin/end pair and "print" to the assembler, and then right off I have
new runnable machine code).

early on, the assembler would actually translate instructions in-place (each
"print" instruction would directly drive the assembler logic), but in time I
ended up changing it in a few ways:
the printing went into a buffer, which was assembled upon reaching "end";
I added, at first, a separate link step, rather than using in-place code
generation, but the assembler-context was used also to represent the object
module;
I later added a preprocessor, which was mostly a hacked-on version of my C
preprocessor (because some macro features are useful...);
I split up the front-end assembler and back-end compiler logic and
structures (and now temporary COFF objects are used between the assemble and
link stage), but still both reside in the same library.

initially, I had used direct function calls to emit instructions, but
quickly abandoned this as I found it to be horridly cumbersome (as I
switched to using an NASM-style syntax and printf-like calls instead).

note that it was initially, and is still the case, that there is no full
parser in use here, rather tokenization directly drives the assembler logic
(optimizing jump-targets/... involves re-running the assembler multiple
times on the preprocessed ASM).

generally, I find performance to be acceptably good, even in most cases when
generating code with time-constraints or in loops, and even with all the
extra crap I ended up adding on eventually.

now, as far as doing a full "proper" link and producing and loading PE/COFF
images, yes, this would likely add a good deal of overhead (this mostly due
to a lot of internal complexities involved with producing PE/COFF images,
however, directly linking COFF modules into memory can be done much more
cheaply).

there is a slight complexity that can result from dynamic COFF linking,
which is dealing with matters of dependency order/resolution, but I had
dealt with this via a hack:
modules now are now typically delay-linked, where a module is added to a
list of "modules to be linked" (it is partly processed, and its exported
symbols are visible, but it is neither physically linked into the image nor
does it attempt to resolve symbols);
when a symbol is needed from such a module, it is (physically) linked into
the image, and may in-turn trigger more delay-linked modules to be linked
(linking is done in 2 steps, where first it is mapped physically into memory
and symbols are exported, and then secondarily it tries to resolve any
dependencies, where this process may cascade).

originally, I used only eager linking, which could result in unneeded
linkage issues (it would try to link in one module, then be faced with the
problem of symbols which were not-yet visible, then do something stupid in
an attempt to resolve them, be left with a mess once the module exporting
the symbols is finally linked in).

so, in the delay-linking scenario, the frontend may load several static libs
and then dynamically load and compile several C modules, but the linking
doesn't take place until the frontend tries to grab function pointers into
the loaded modules.

as a minor downside, at present my dynamic linker doesn't currently work
with MSVC-produced COFF objects, I suspect mostly because they tend to
contain lots of RVA-relative relocations and similar, which my dynamic
linker doesn't really deal with (there is no sensible RVA as there is no
PE/COFF image, but MSVC's code tends to assume such an image existing).

however, exported objects from my stuff tend to work with MSVC.

luckily, there is an answer:
when using MSVC to compile code, compile it all the way into a DLL, since my
stuff can also make use of DLL's...

> The 'M' target code corresponds to assembly code, but uses an internal
> form which is quicker than writing out .asm files that then require a
> custom runtime assembler (a standard assembler produces object files which
> are no good).
>
> (However, I plan to have also conventional .asm output, so that I
> can create the occasional executable, such as those two programs
> above.)
>

ok.

the question though is, once again, why to emit a representation at this
level:
if it is at the same level as ASM, then it is not portable between systems;
but, at the same time, one has to translate it to make it runnable.

with a traditional IL, it can be compiled to use different CPU
architectures;
with raw machine code, it can be run on the processor directly.

this is, unless I have misunderstood what is being done here.

granted, an IL can be very low-level, essentially representing a
glossed-over version of the target architectures, or simply a virtual
processor (where the IL opcodes are fairly directly mapped to their native
anologues in most cases).

I had considered something like this before, but couldn't come up with a
good representation which could both adequately gloss over the instruction
set as well as avoid the complexities and limitations of a more traditional
IL.

<snip>

> I find a lot of other people's code, especially of large, complex working
> projects, to be unreadable.

I agree, and that was originally partly why I started my project.
however, I have quickly found that code of this sort tends to rapidly
inflate and become a bit of a confusing mess.

it is like, a problem will come around, and one will try to address it, but
each time they address a problem, the new code is bigger and uglier than the
old code.

for example, one has a string intern function, and it is around 60 lines of
code (it being of the simple: use a hash-table to check for string,
otherwise scan for it in the string table, and if not found append to the
end and add to the hash).

but, then it bogs down horribly when dealing with a large number of new
strings, since it ends up doing a linear-time lookup every time the hash hit
fails.

then, one tweaks it some, so that it uses hash-chaining instead (appending
to the end if the string doesn't exist in the hash chains), and suddenly
the thing is 94 lines.

or, one implements a small database (in my case, a heirarchical DB of a
similar sort to the Windows' Registry), and then one notes that at some
points (such as at app startup), this DB is using up a large amount of the
total running time.

now, the problem:
it having a decent amount of contents and using linear-searches to lookup
nodes (it will do a linear search at each level of the tree, recursing and
looking up the next part of the key-path, ...).

but, then one finds that, in this case, simple hashing wouldn't work well.
so, it was reworked some to use AVL trees instead of simple linked-lists,
...

and, in much the same way, the code complexity has increased...

after a while of this, it seems like the large complex codebases of existing
compiler and VM projects is inescapable. although, one can at least try to
have clean coding practices, such as to hopefully avoid the (IMHO) pure
concentrated nastiness found in projects such as Mono...

>> I know i am missing something or not seeing something but its bugging
>> me so i hope someone can fill in the gaps for me. Its just annoying
>> I've been toying with some small programs to generate code at runtime
>> on a basic language, but all i do is output to a file descriptor i
>> specify; simply to see if the code looks good so far, it would be nice
>> what do you do next.
>
> I started off with experiments such as the following (but note that as you
> appear to be on something like Linux, malloc() memory may possibly not be
> executable, as the moderator notes), for the x86:

<snip, code fragment>

I have done this sort of thing a few times, but generally prefer to avoid
doing so as it doesn't scale well and is even less portable than just using
textual ASM (since textual ASM is usually easily edited or replaced, mixes
better with macros, ...).

>> [... on recent x86
>> where data pages are usually marked no-execute. -John]
>
> Fortunately that doesn't seem the case with XP and Vista at least.

these don't normally use DEP by default for 32-bit apps...

note that mmap or VirtualAlloc can be used to gain memory with execute
capabilities (or other features).

actually, I have a special region of the garbage-collected heap which is
allocated with these options just for these purposes (as well as a "gcexec"
call for allocating executable memory).

note that my dynamic linker uses its own internal memory manager for '.text'
and '.data'/'.bss' sections though (GC'ed executable memory is usually used
for more narrowly defined executable objects, ...).

> --
> bartc
> [This sounds like a project about a decade ago called slim binaries, which
> shipped programs for a Mac in a compact low-level intermediate code, and
> generated 68K or PPC code as the program was loaded. They claimed that the
> intermediate form was enough smaller than native code that the savings in
> disk access more than made up for the time spent translating.

how does it compare with MSIL?...

I have observed with .NET apps that the startup time is noticably worse than
equivalent native apps (or, at least, for apps with an otherwise fairly
quick startup time...).

(I had noted this when compiling C code as C++/CLI...).

I guess though for big user-interface apps, the delay is small enough that
maybe a typical user wouldn't notice.

> You might also want to look at mainframe sort programs. Since the mid
> 1960s (maybe earlier) they've generated the code for the inner comparison
> loop, then run the linker and loader to bring it into RAM, then did the
> sort.
> -John]

interesting idea.

I have been doing something vaguely similar in a few cases to micro-optimize
some tasks that are otherwise fairly slow with more generic code.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Jit Implementation

"BGB / cr88192" <cr88192@hotmail.com>Sun, 21 Mar 2010 13:57:40 -0700

"BGB / cr88192" <cr88192@hotmail.com>
Sun, 21 Mar 2010 13:57:40 -0700