Re: How to implement dynamic typing?

"BGB / cr88192" <cr88192@hotmail.com>
Mon, 26 Apr 2010 18:38:59 -0700

          From comp.compilers


Newsgroups: comp.compilers
Organization: albasani.net
References: 10-04-009 10-04-028 10-04-031 10-04-036 10-04-038 10-04-051 10-04-053 10-04-054 10-04-065 10-04-066
Keywords: types
Posted-Date: 28 Apr 2010 00:35:59 EDT

"bartc" <bartc@freeuk.com> wrote
> "BGB" <cr88192@hotmail.com> wrote in message


> (These += or +:= assignments are always troublesome)
>
> Converting a+:=b to a:=a+b is what I used to do in one compiler (and a
> complex 'a' expression was evaluated twice).
>
> Now I generally do something like:
>
> ptemp:=&a
> ptemp^ +:= b # or *ptemp += b in C-speak
>
> but more efficiently than that looks (ptemp stays in a register). And for
> very simple 'a' expressions, it can deal with a+:=b directly.


ok.


my stuff tends to break these compound assignments down and re-form them
later, since the re-formed version is often different from the original.
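Here is a minimal Python sketch of the double-evaluation problem described in the quote. It is an illustration only, not code from either compiler. Memory is modeled as a list. The "complex" lvalue is a hypothetical `lvalue_addr()` that behaves like `a[i++]`, so evaluating it twice is visibly wrong:

```python
class Machine:
    """Toy memory plus an lvalue whose address computation has a side effect."""

    def __init__(self):
        self.memory = [10, 20, 30, 40]
        self.cursor = 0        # state read (and advanced) by the lvalue
        self.addr_evals = 0    # how many times the address was computed

    def lvalue_addr(self):
        # Hypothetical lvalue like a[i++]: every evaluation moves the cursor.
        self.addr_evals += 1
        addr = self.cursor
        self.cursor += 1
        return addr

    def naive_add_assign(self, b):
        # a +:= b rewritten as a := a + b: the lvalue is evaluated twice, so
        # the old value is read from one slot and the sum written to the next.
        value = self.memory[self.lvalue_addr()] + b
        self.memory[self.lvalue_addr()] = value

    def ptemp_add_assign(self, b):
        # ptemp := &a; ptemp^ +:= b: the lvalue is evaluated exactly once.
        ptemp = self.lvalue_addr()
        self.memory[ptemp] += b


naive = Machine()
naive.naive_add_assign(5)
print(naive.memory, naive.addr_evals)   # [10, 15, 30, 40] 2

fixed = Machine()
fixed.ptemp_add_assign(5)
print(fixed.memory, fixed.addr_evals)   # [15, 20, 30, 40] 1
```

The `ptemp` form computes the address once and reuses it for both the load and the store. This is also why keeping that address in a register pays off.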






>> ex: "Foo.bar[3].baz+=3;".
>> "Foo.bar[3]" would be evaluated and dup'ed ('.' has a different operation
>> for lvalues and rvalues).
>
> And that probably sounds like the same trick.


yeah.


my stuff typically uses stack machines for the IL.
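The following small Python interpreter illustrates the dup trick for "Foo.bar[3].baz+=3;" on a stack machine. The opcode names (`loadvar`, `loadslot`, `loadindex`, `dup`, `storeslot`, ...) are hypothetical and are not taken from any real IL. Objects are modeled as dicts and arrays as lists:

```python
def run(code, env):
    """Execute (opcode, *operands) tuples on a value stack; returns the stack."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "loadvar":
            stack.append(env[args[0]])
        elif op == "loadslot":        # rvalue '.': obj -> obj.slot
            stack.append(stack.pop()[args[0]])
        elif op == "loadindex":       # rvalue '[]': arr idx -> arr[idx]
            idx = stack.pop()
            arr = stack.pop()
            stack.append(arr[idx])
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "storeslot":       # lvalue '.': obj value -> (obj.slot = value)
            value = stack.pop()
            obj = stack.pop()
            obj[args[0]] = value
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# Foo.bar[3].baz += 3
code = [
    ("loadvar", "Foo"), ("loadslot", "bar"), ("push", 3), ("loadindex",),  # Foo.bar[3], once
    ("dup",),                                     # keep a copy for the store
    ("loadslot", "baz"), ("push", 3), ("add",),   # Foo.bar[3].baz + 3
    ("storeslot", "baz"),                         # Foo.bar[3].baz = result
]
env = {"Foo": {"bar": [None, None, None, {"baz": 7}]}}
print(run(code, env), env["Foo"]["bar"][3]["baz"])   # [] 10
```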




>>('.' has a different operation for lvalues and rvalues).
>
> I use four kinds of l-value expressions:
>
> 1. Direct memory a:=b
> 2. Pointer a^:=b
> 3. Index a[i]:=b
> 4. Dot a.i:=b
>
> However complex the lhs, it always boils down to one of these, unless
> there's something I've missed out.


in my case, the difference is mostly that lvalues are reworked into
dedicated assignment (store) opcodes.


"a[i]=b;" => "%b %a %i storeindex"


it is also possible to load the address and assign through it, but this is
less efficient.
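The sketch below shows how bartc's four lvalue kinds might each be lowered straight to a store opcode. It uses the same postfix order as "%b %a %i storeindex": the value first, then the lvalue's parts, then the store. Only `storeindex` comes from the example above. `storevar`, `storeptr`, and `storeslot` are hypothetical names. Operands are limited to plain variable names to keep it short:

```python
def compile_rvalue(expr):
    """Rvalues here are just variable names; '%x' means 'push the value of x'."""
    return [f"%{expr}"]


def compile_assign(lhs, rhs):
    """Lower 'lhs = rhs' so the value is pushed first and the lvalue is
    consumed directly by a store opcode (no separate address load)."""
    code = compile_rvalue(rhs)
    kind = lhs[0]
    if kind == "var":          # a = b
        code.append(f"storevar {lhs[1]}")
    elif kind == "deref":      # a^ = b  (*a = b in C)
        code += compile_rvalue(lhs[1]) + ["storeptr"]
    elif kind == "index":      # a[i] = b
        code += compile_rvalue(lhs[1]) + compile_rvalue(lhs[2]) + ["storeindex"]
    elif kind == "dot":        # a.i = b  (i is a field name, not a value)
        code += compile_rvalue(lhs[1]) + [f"storeslot {lhs[2]}"]
    else:
        raise ValueError(f"not an lvalue: {lhs!r}")
    return " ".join(code)


for lhs in [("var", "a"), ("deref", "a"), ("index", "a", "i"), ("dot", "a", "i")]:
    print(compile_assign(lhs, "b"))
```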




>>> 2) Continual checks for freeing and duplicating memory, as there is no
>>> GC to offload this stuff to.
>>
>> bleh...
>
> Yeah. But, then, in a:=b+c, these checks come down to 4 instructions of
> overhead, when the type of a,b,c is simple (two cmp, and two jumps).
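Here is a Python model of the two checks bartc describes for a:=b+c under copy semantics, where variables never share data. One check looks at the old value in the destination and one looks at the new value. `SIMPLE_TAGS`, `Var`, and `assign` are hypothetical. In native code each check would be one cmp/jump pair, and both are skipped when the types are simple:

```python
import copy

# Hypothetical: value types that own no heap memory (the "simple" case).
SIMPLE_TAGS = frozenset({"int", "real"})


class Var:
    def __init__(self, tag, data):
        self.tag = tag
        self.data = data


def assign(dst, tag, data):
    """dst := value with copy semantics; returns (freed_old, duplicated_new)."""
    freed = dst.tag not in SIMPLE_TAGS       # check 1: does the old value own memory?
    if freed:
        dst.data = None                      # stands in for freeing it
    duplicated = tag not in SIMPLE_TAGS      # check 2: must the new value be copied?
    if duplicated:
        data = copy.copy(data)               # shallow copy stands in for duplication
    dst.tag, dst.data = tag, data
    return freed, duplicated


a, b, c = Var("int", 0), Var("int", 2), Var("int", 3)
print(assign(a, "int", b.data + c.data), a.data)   # (False, False) 5
```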


in some of my interpreters, I combine multiple opcodes that often occur
together into single opcodes.
with this strategy I was able to get an interpreter down to a 10x slowdown
(vs native) for some fragments, and match C-like speeds while still using
dynamic typing (although I was using type inference).
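Combining frequently co-occurring opcodes is often called forming "superinstructions". Below is a minimal sketch of a greedy fusion pass over a stack IL. The opcode names and the `FUSIONS` table are made up for illustration. A real interpreter would then need a handler for each fused opcode, which is where the dispatch savings come from:

```python
# Hypothetical fusion table: a run of opcodes -> one combined opcode.
FUSIONS = {
    ("load", "load", "add"): "load_load_add",
    ("load", "push", "add"): "load_push_add",
    ("add", "store"): "add_store",
}
LONGEST = max(len(pattern) for pattern in FUSIONS)


def fuse(code):
    """Greedy left-to-right pass replacing known opcode runs with fused ones.
    Instructions are (opcode, *operands); a fused one keeps all operands."""
    out = []
    i = 0
    while i < len(code):
        for n in range(LONGEST, 1, -1):          # try the longest pattern first
            window = code[i:i + n]
            key = tuple(ins[0] for ins in window)
            if len(window) == n and key in FUSIONS:
                operands = tuple(arg for ins in window for arg in ins[1:])
                out.append((FUSIONS[key],) + operands)
                i += n
                break
        else:                                    # no pattern matched here
            out.append(code[i])
            i += 1
    return out


print(fuse([("load", "x"), ("load", "y"), ("add",), ("store", "z")]))
```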


this was my "great JIT success", although a C compiler proved to be a much
more complex task than expected.


as in my early C compiler, registers were assigned to particular fixed uses,
and a small amount of code-generation control was done via flags (for
example, a flag indicating that the top stack item is in EAX, that a
secondary stack item resides in EDX, ...).
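The cached-top-of-stack idea is sketched below. It emits x86-style assembly text from stack ops while a single flag records whether the top item is in EAX. Only one item is cached here, and EDX serves as a scratch register rather than a tracked second slot. The op set is tiny and hypothetical, and the final result is left in EAX:

```python
def compile_stack_code(code):
    """Translate stack ops to x86-style assembly text. A flag tracks whether
    the top stack item is cached in EAX; all other items live on the machine
    stack."""
    asm = []
    top_in_eax = False                    # the code-generation flag
    for op, *args in code:
        if op == "push":
            if top_in_eax:
                asm.append("push eax")    # spill the old top to the machine stack
            asm.append(f"mov eax, {args[0]}")
            top_in_eax = True
        elif op == "pop":
            if top_in_eax:
                top_in_eax = False        # just forget the cached value
            else:
                asm.append("add esp, 4")  # discard a 32-bit slot from memory
        elif op == "add":
            if not top_in_eax:
                asm.append("pop eax")     # bring the top back into EAX
            asm.append("pop edx")         # second item into EDX
            asm.append("add eax, edx")
            top_in_eax = True
        else:
            raise ValueError(f"unknown op {op!r}")
    return asm


print("\n".join(compile_stack_code([("push", 2), ("push", 3), ("add",)])))
```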




> I will play around with GC later, for user-level allocated memory (ie.
> explicit use of pointers). Introducing it for this behind-the-scenes stuff
> is more tricky, if I want the language to work the same way (ie. variables
> never share their data with each other, and everything is mutable).


yes, ok.
I have nearly always used GC...


however, early on I wanted to avoid tangling my C compiler up with my GC and
type-system mechanics.


these days, if I were to design a new lower end (backend), I would probably
use a different strategy, namely having built-in "core" machinery and leaving
the rest as pluggable/customizable logic (rather than treating the codegen as
some sort of black box...).


as-is, I have 2 different sets of code-generation machinery:
my main codegen, which does more traditional compilation tasks;
a bunch of machinery spread around my "BGBDY" system (which mostly manages
dynamic typing, OO features, ...), and which differs primarily in being
more decentralized (and, at present, would be incapable of doing
traditional compilation).


most of the code-generation machinery in BGBDY is deeply tied to the
workings of the type system, and generally operates without the sorts of
facilities available within my main codegen (such as a register allocator,
...).


similarly, a lot of this stuff is also done via the C calling convention, so
the usual strategy is to produce function pointers which can be called to
perform particular tasks.
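The "produce a function pointer, then call it" strategy has a rough Python analogue, shown below, in which closures stand in for generated native thunks. `make_slot_getter` and `get_slot_getter` are hypothetical names. A real system would emit machine code and hand back a C function pointer instead:

```python
from typing import Callable, Dict

Getter = Callable[[dict], object]
_getter_cache: Dict[str, Getter] = {}


def make_slot_getter(slot: str) -> Getter:
    """Build an accessor specialized for one slot name (stands in for
    emitting a small native thunk)."""
    def getter(obj: dict) -> object:
        return obj[slot]
    return getter


def get_slot_getter(slot: str) -> Getter:
    # Generate each accessor once, then hand out the same "function pointer".
    fn = _getter_cache.get(slot)
    if fn is None:
        fn = make_slot_getter(slot)
        _getter_cache[slot] = fn
    return fn


get_x = get_slot_getter("x")
print(get_x({"x": 42}))   # 42
```

Caching the generated function means the generation cost is paid once per slot name, and callers only see an ordinary callable.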




>> nevermind, my float28 format (used on x86, x86-64 uses float48) has poor
>> accuracy (still better than float24, which was my original "flonum"
>> format).
>
> (What do you do with the other 4 bits?)


vs float24 or float32?...


vs float24: the 4 extra bits go into the mantissa (which makes it more
accurate).
vs float32: giving up 4 bits is what allows these values to be crammed into
the address space.


for example, you can't simply shove a 32-bit float into a 32-bit pointer, as
then one has no idea what is a float and what is a pointer.


but, shave off a few bits, and one can then use a part of the address space
as the value space, and there is no longer a conflict.


consider, for example, if the address range:
0xC0000000..0xCFFFFFFF
were used for floats...


normal data can't go there (on common 32-bit OS configurations, that range
belongs to the kernel), so may as well use the space.
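Here is a sketch of float28 using the example range above. The text does not say which 4 bits float28 gives up. This sketch assumes the 4 lowest mantissa bits of an IEEE single are dropped by truncation, which leaves 28 bits that fit under the 0xC prefix:

```python
import struct

FLOAT_BASE = 0xC0000000   # example range 0xC0000000..0xCFFFFFFF
FLOAT_MASK = 0x0FFFFFFF   # the low 28 bits carry the payload


def float_bits(x):
    """IEEE-754 single-precision bit pattern of x as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def encode_float28(x):
    # Assumption: drop the 4 lowest mantissa bits (truncation), keeping sign,
    # exponent and the top 19 mantissa bits, then move into the float range.
    return FLOAT_BASE | (float_bits(x) >> 4)


def is_float28(word):
    return (word & 0xF0000000) == FLOAT_BASE


def decode_float28(word):
    if not is_float28(word):
        raise ValueError("not a float28 word")
    bits = (word & FLOAT_MASK) << 4          # restore the layout; low bits are 0
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(hex(encode_float28(1.5)))              # 0xc3fc0000
print(decode_float28(encode_float28(0.1)))   # close to, but not exactly, 0.1
```

Values outside the float32 range make `struct.pack` raise `OverflowError`. The decoded value keeps only 19 of the 23 mantissa bits, which is where the reduced accuracy comes from.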


then:
0xD0000000..0xDFFFFFFF could be assigned to integers, ...
0xE0000000..0xFFFFFFFF then goes to other things (typically much
smaller...).
...
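The next sketch continues the hypothetical fixed layout, not the dynamic scheme described next. It gives a 28-bit integer encoding for the 0xD0000000..0xDFFFFFFF range plus a classifier that looks only at the top 4 bits of a word:

```python
INT_BASE = 0xD0000000     # example range 0xD0000000..0xDFFFFFFF
INT_BITS = 28
INT_MASK = (1 << INT_BITS) - 1


def encode_int28(n):
    if not -(1 << (INT_BITS - 1)) <= n < (1 << (INT_BITS - 1)):
        raise OverflowError(f"{n} does not fit in {INT_BITS} bits")
    return INT_BASE | (n & INT_MASK)        # two's-complement payload


def decode_int28(word):
    payload = word & INT_MASK
    if payload & (1 << (INT_BITS - 1)):     # sign-extend from bit 27
        payload -= 1 << INT_BITS
    return payload


def classify(word):
    """Decide what a 32-bit reference word is from its top nibble alone."""
    top = word >> 28
    if top == 0xC:
        return "float"
    if top == 0xD:
        return "int"
    if top >= 0xE:
        return "other"     # 0xE0000000..0xFFFFFFFF: smaller special ranges
    return "pointer"       # ordinary memory


print(hex(encode_int28(-1)), classify(encode_int28(42)))   # 0xdfffffff int
```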


however, in my framework, this space is assigned dynamically (much like with
a memory allocator), and so there are no fixed assigned ranges (and I use 24
bits, and not 28 bits, for integers...).
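Below is a sketch of assigning the value space dynamically, much like a memory allocator. The 16 MiB (24-bit) block granularity, the class name `ValueSpace`, and the bump-and-align policy are all assumptions for illustration:

```python
BLOCK_BITS = 24              # hypothetical granularity: 16 MiB blocks
SPACE_START = 0xC0000000     # assumed start of the reserved value space
SPACE_END = 1 << 32          # end of the 32-bit address space


class ValueSpace:
    """Hands out aligned chunks of the reserved region to value types on
    demand, like a memory allocator, and maps a word back to its type."""

    def __init__(self):
        self.next_free = SPACE_START
        self.block_owner = {}    # block index (word >> BLOCK_BITS) -> type name

    def allocate(self, type_name, payload_bits):
        size = 1 << max(payload_bits, BLOCK_BITS)
        base = (self.next_free + size - 1) & ~(size - 1)   # align to chunk size
        if base + size > SPACE_END:
            raise MemoryError("value space exhausted")
        for block in range(base >> BLOCK_BITS, (base + size) >> BLOCK_BITS):
            self.block_owner[block] = type_name
        self.next_free = base + size
        return base

    def type_of(self, word):
        if word < SPACE_START:
            return "pointer"     # ordinary memory
        return self.block_owner.get(word >> BLOCK_BITS, "unassigned")


space = ValueSpace()
int_base = space.allocate("int24", 24)       # a 24-bit integer range
float_base = space.allocate("float28", 28)   # a 28-bit float range
print(hex(int_base), hex(float_base))        # 0xc0000000 0xd0000000
```

This simple bump allocator never reuses the gap that alignment leaves behind (0xC1000000..0xCFFFFFFF above). A real allocator would likely track free ranges instead.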


on x86-64, I just pick an arbitrary address range:
0x71000000'00000000..0x71FFFFFF'FFFFFFFF


and use this for a similar reason, but limit allocations to 48 bits, or:
0x00010000'00000000...
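A sketch of the x86-64 variant follows: any word whose top byte is 0x71 is a value, and anything below 2^48 is a pointer. The float48 encoding is hypothetical: a float64 with its low 16 mantissa bits truncated. Its sub-range (second byte 0x01) is also hypothetical. Both were chosen only to show how the 56 payload bits might be used:

```python
import struct

VALUE_TAG = 0x71          # range 0x71000000'00000000..0x71FFFFFF'FFFFFFFF
PTR_LIMIT = 1 << 48       # heap allocations stay below 0x00010000'00000000
FLOAT48_PREFIX = 0x7101   # hypothetical sub-range for float48 values
LOW48 = (1 << 48) - 1


def classify64(word):
    if (word >> 56) == VALUE_TAG:
        return "value"    # the low 56 bits are payload
    if word < PTR_LIMIT:
        return "pointer"
    return "invalid"


def encode_float48(x):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Keep sign, exponent and the top 36 mantissa bits (truncating the low 16).
    return (FLOAT48_PREFIX << 48) | (bits >> 16)


def decode_float48(word):
    if (word >> 48) != FLOAT48_PREFIX:
        raise ValueError("not a float48 word")
    bits = (word & LOW48) << 16
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


w = encode_float48(1.5)
print(hex(w), classify64(w), decode_float48(w))   # 0x71013ff800000000 value 1.5
```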

