Re: Implementing Closures

Barry Kelly <barry.j.kelly@gmail.com>
Sun, 26 Apr 2009 05:00:36 +0100

          From comp.compilers

Related articles
Implementing Closures andrew@tomazos.com (Andrew Tomazos) (2009-04-24)
Re: Implementing Closures armelasselin@hotmail.com (Armel) (2009-04-24)
Re: Implementing Closures dot@dotat.at (Tony Finch) (2009-04-26)
Re: Implementing Closures barry.j.kelly@gmail.com (Barry Kelly) (2009-04-26)
Re: Implementing Closures cr88192@hotmail.com (cr88192) (2009-04-26)
Re: Implementing Closures torbenm@pc-003.diku.dk (2009-04-28)
Re: Implementing Closures pertti.kellomaki@tut.fi (Pertti Kellomaki) (2009-04-29)
Re: Implementing Closures torbenm@pc-003.diku.dk (2009-04-29)
Re: Implementing Closures dot@dotat.at (Tony Finch) (2009-04-29)
Re: Implementing Closures haberg_20080406@math.su.se (Hans Aberg) (2009-04-29)
[2 later articles]
| List of all articles for this month |

From: Barry Kelly <barry.j.kelly@gmail.com>
Newsgroups: comp.compilers
Date: Sun, 26 Apr 2009 05:00:36 +0100
Organization: Compilers Central
References: 09-04-056
Keywords: storage, design
Posted-Date: 27 Apr 2009 05:55:04 EDT

Andrew Tomazos wrote:


> int2int f()
> {
> int x = 3;
>
> int g(int y) // local function
> {
> x = x + y; // modifies local variable from enclosing scope.
> return x * y;
> }
>
> return g;
> }


> What are some of the approaches a compiler author could use to
> implement such a language feature?


I recently implemented closures in Delphi, a Pascal-based language
with a near-equivalent low-level expressiveness to C, and many
high-level constructs to make programming easier and less error-prone.


Apart from fat function pointers (tuples of code and data) there is
another approach, one that uses Windows COM-style primitives to fit
the closure reference into a single pointer, at the expense of
slightly slower invocation and slightly more code. On the other hand,
normal function pointers as well as method pointers (i.e. including a
'this' reference) can also be stored in these closure types with very
similar efficiency and without the callers needing to know. And
finally, but not without importance for Delphi, these closures are
compatible with C/C++ through COM idioms.


(Delphi already has a fat function pointer, what it calls method
pointers. These aren't suitable for closure implementation because there
would be no automatic memory management for the closure's captured
state.)


Delphi supports COM interfaces natively. COM interfaces are
reference-counted, and this reference count is automatically handled by
the Delphi compiler for variable initialization and finalization,
assignments, copies etc.


The approach I took leveraged this support by modeling closure types as
COM interfaces with a single method, Invoke, with the same signature as
the closure type's declaration, and modeling closures as implementations
of those COM interfaces. COM reference counting for garbage collection
of the captured variables works well when closures don't have cycles in
their reference graph. This is usually the case since variable capture
can only operate up the stack; a closure reference would need to be
explicitly stored inside a captured variable to break this.


At compilation time, closures are discovered due to specific syntax
(anonymous methods in the case of Delphi), though e.g. detection of
assignment of local function identifier to a closure-typed location is
also possible (not yet implemented). The body of the code of the closure
is scanned for references to variables in outer scopes - these variables
will be captured. A type is created for each scope that contains
captured variables, and each captured variable is turned into a field on
this type; all references to captured variables are rewritten to field
accesses. The body of code for each closure becomes a method on the this
type. References to captured variables higher than the immediately
surrounding scope are implemented through chaining of scopes, each scope
type referring to the outer scope, until the next outer scope doesn't
transitively contain any captured variables.


This scope type is carefully implemented so that it is a conforming COM
implementation of the appropriate COM-style interface for the closure
type, one implementation per closure (now method) on the scope type.


So, your example, translated to C in the style of my Delphi
implementation, might look somewhat like this:


typedef int (*int2int_Invoke)(int2int *self, int);
typedef int (*int2int_Refcounter)(int2int *self);


typedef struct int2int_vtable_s {
    // COM also has QueryInterface, but it's not used here
    int2int_Refcounter AddRef;
    int2int_Refcounter Release;
    int2int_Invoke Invoke;
} int2int_vtable_t;


// closure types (and COM interfaces) are pointers to
// pointers to function pointers
typedef struct int2int_s {
    int2int_vtable_t *vtable;
} *int2int;


typedef struct f_scope_s {
    base_vtable_t *vtable; // In Delphi implementation, there is a base
            // class implementation of IUnknown incuding AddRef & Release
            // and refcount data. (It's called TInterfacedObject.)
    // If captured variables at higher scopes, then reference to outer
    // scope type here also.
    int2int_vtable_t *g_impl;
    // One implementation vtable reference here per closure in this scope.
    int x;
} f_scope;


static int f_scope_g_impl_AddRef(void *self)
{
    // in Delphi implementation, this is actually a jump and not a
    // function. Its purpose is to fix up the 'self' pointer, which
    // will be pointing at the 'g_impl' vtable, to point to the
    // start of the object
    return base_AddRef( ((char *)self) - offsetof(f_scope, g_impl) );
}


// Release is similar


static int f_scope_Invoke(void *self, int y)
{
    ((f_scope *) self)->x += y;
    return ((f_scope *) self)->x * y;
}


static int f_scope_g_impl_Invoke(void *self, int y)
{
    // again, actually a jump
    return f_scope_Invoke(((char *) self) - offsetof(f_scope, g_impl), y);
}


// In the Delphi implementation, these are not actually functions
// => ignore conversion error
static int2int_vtable_t f_scope_g_impl_vtable = {
    f_scope_g_impl_AddRef,
    f_scope_g_impl_Release,
    f_scope_g_impl_Invoke,
};


int2int f(void)
{
    f_scope *scope = malloc(sizeof(f_scope));
    // base init: assume base initializes f_scope to have refcount of 1
    scope->g_impl = &f_scope_g_impl_vtable;
    scope->x = 3;
    return (int2int) &scope->g_impl;
}


Invocation later looks like this:


    int2int foo = f();
    int result = foo->vtable->Invoke(foo, 2); // #1
    printf("result = %d\n", result); // prints 10
    foo->vtable->Release(); // auto-inserted by compiler at end of closure
                                                    // variable scope; decrements refcount and
                                                    // frees if 0.


The call to the Invoke function pointer at #1 will follow through to
f_scope_g_impl_Invoke, which will adjust self, which of course points at
the g_impl member, not at the start of the f_scope type.


Execution then flows to the closure body at f_scope_Invoke, which
calculates and returns the result.


Delphi COM interface mechanics take care of the lifetime of the closure
instance. Syntax sugar exists to make closure type invocation look like
normal function invocation, even though it is actually an interface
method call.


-- Barry


--
http://barrkel.blogspot.com/



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.