Contrarily to all versions of OrchIDS ≤1.1, OrchIDS 2.0 has garbage-collected memory.
This means that you still allocate memory, but do not deallocate it explicitly. This imposes a certain number of constraints on the way you program.
The garbage collector is implemented in file src/util/gc.c. This is a mark-during-sweep garbage collector, an algorithm described by Queinnec, Beaudoing and Queille in 1989. This little-known algorithm is an incremental garbage collector that improves upon Dijkstra’s three-color incremental garbage collector.
Let us say that, at any point in time, every garbage-collectable piece of data can be white, black, or grey. The black objects are those that have been marked as used by the garbage collector. The white objects are those that have not yet been marked by the garbage collector (the objects that eventually remain white after the end of the marking phase are those that will be deallocated). Finally, the grey objects are those that are in the process of being marked.
One should remember that no black object should ever point directly to a white object. To store a pointer to a garbage-collectable object p into another one, one must always shade p first, namely, make it grey if it was white. This is the purpose of the gc_touch()
function.
The main functions are:
- The allocator itself:
void *gc_alloc (gc_t *gc_ctx, size_t n, gc_class_t *class);
Allocates a garbage-collectable,
n
byte memory zone.
The first argument,gc_ctx
, is the garbage collector context. This is a struct containing all the information that the garbage collector needs to function propertly. You are not meant to touch it in any way.
Theclass
argument describes all that the garbage-collector needs to know about your structure, in particular, how to mark its subfields. It should not beNULL
. Seegc_class_t
below for more explanations.gc_alloc()
returns an object of typevoid *
. Internally, however,gc_alloc()
allocates an object of sizen
that starts with agc_header_t
header. In particular,n
should be ≥sizeof(gc_header_t)
.
The basic strategy to define a garbage-collectable type is to define it as a struct with a first field of typegc_header_t
. For example, the Orchids native integer typeovm_int_t
is defined by:struct ovm_int_s { gc_header_t gc; long val; };
and you would allocate an integer by calling:
struct ovm_int_s *s = gc_alloc(gc_ctx, sizeof(struct ovm_int_s), &int_class); s->gc.type = T_INT;
The
int_class
gc_class
we are using for that structure, and we shall explain this bit later.
Note that you should follow the call togc_alloc()
by code that sets the type (here,T_INT
).
If you create a newgc_class
, you must also enforce the following relation between the type (here,T_INT
) and thegc_class
: there is an arraygc_classes[256]
inlang.c
, and its entry at index given by the type should be the correspondinggc_class
; in our case, you should check thatgc_classes[T_INT]
indeed contains&int_class
. - By the way, the purpose of declaring the return type of
gc_alloc()
to bevoid *
instead ofgc_header_t *
is to allow you to write code similar to the above. Otherwise, you would need a cast, and would have to write:struct ovm_int_s *s = (struct ovm_int_s *)gc_alloc(gc_ctx, sizeof(struct ovm_int_s), NULL);
.
Any data allocated bygc_alloc()
is created white. This means that it can be deallocated at the next call togc()
,gc_full()
, orgc_alloc()
itself! If you mean to use the object you’ve just created, you should first store it into some other garbage-collectable object… callingGC_TOUCH()
on it first. This is a macro that encapsulates a call to thegc_touch()
function, which does all the work. - The shading function:
void gc_touch (gc_t *gc_ctx, gc_header_t *p); #define GC_TOUCH(gc_ctx,p) gc_touch(gc_ctx,(gc_header_t *)p)
shades
p
. This is required before storingp
into an object. Shading means makingp
grey if it was white. This way, you can storep
into some other object without running the risk of having a black object point to a white object.
Hence, in particular, storing a garbage-collected objectp
at some locationloc
should always be written:GC_TOUCH (gc_ctx, loc=p);
and never
loc=p;
, for example. To make it clear that you have not forgotten the call toGC_TOUCH()
, you should always use the above formGC_TOUCH (gc_ctx, loc=p);
, and not equivalent forms such asloc=p; GC_TOUCH (gc_ctx, p);
.
By the way, shading works slightly differently in one case: if thegc_class
ofp
has amark_subfields
function pointer equal toNULL
, this lets the garbage-collector know thatp
has not garbage-collectable subfield, and thengc_touch()
will makep
black directly, instead of grey. This makes the garbage-collector more efficient in those cases. It is therefore relatively important to be able to leave themark_subfields
entryNULL
ingc_class
es. - The garbage-collector itself:
void gc (gc_t *gc_ctx); void gc_full (gc_t *gc_ctx);
gc()
calls the incremental garbage collector, and tries to mark and to reclaim a few objects. By default, it marks 3 objects and tries to reclaim 2.5 objects on average. You should almost never need to callgc()
yourself, asgc_alloc()
already callsgc()
to get back a bit of memory itself.gc_full()
does a complete round of garbage collection. You should call it twice in a row, in critical low-memory situations, to reclaim as much memory as you can. This is whatgc_alloc()
does anyway when memory runs low. - Object classes:
typedef struct gc_class_s gc_class_t;
A record of specific function hooks for specific kinds of garbage-collectable objects. The most useful one is the
mark_subfields
hook, which is used to mark garbage-collectable subobjects. For example, one may define a garbage-collectable type of binary trees as follows:typedef struct tree { gc_header_t h; int n; struct tree *left, *right; } tree;
If you allocated a tree object using a
NULL
class argument, the garbage collector would have no way of realizing that it should mark theleft
andright
fields, too, recursively. To tell the garbage collector it has to do so, callgc_alloc()
with the argument&tree_class
, where:gc_class_t tree_class = { GC_ID('t','r','e','e'), tree_mark_subfields, tree_finalize, tree_traverse, tree_save, tree_restore };
- The
tree_mark_subfields()
function is defined as follows. It simply shades the two garbage-collectable subfields:static void tree_mark_subfields (gc_t *ctx, gc_header_t *p) { tree *t = (tree *)p; GC_TOUCH (ctx, t->left); GC_TOUCH (ctx, t->right); }
It is also possible to leave the
mark_subfields
hook equal toNULL
, in which case the garbage-collector will consider that no subfield has to be marked. This would be wrong for binary trees, of course. However, it makes sense for types such asovm_int_t
, which have no garbage-collected subfields (only along
subfield). It is even advised to leave themark_subfields
hook equal toNULL
when there are no subfields, because it makes the garbage-collector more efficient on those types (see the earlier discussion ongc_touch()
).
There are four other function hooks that need to be defined. The third one,traverse
(here, defined astree_traverse()
) operates a more generic way of traversing the same subfields. We keep themark_subfields
hook separate for efficiency reason, although it could have been implemented in terms of thetraverse
hook. Here is the necessary (boilerplate) code for trees:static int tree_traverse (gc_traverse_ctx_t *gtc, gc_header_t *p, void *data) { tree *t = (tree *)p; int err = 0; err = (*gtc->do_subfield) (gtc, (gc_header_t *)t->left, data); if (err) return err; err = (*gtc->do_subfield) (gtc, (gc_header_t *)t->right, data); return err; }
The
finalize
hook (here,tree_finalize()
) is meant to be called to free up any other memory pointed to by the object in argument, provided such memory is not directly under the control of the garbage collector. It can also be used to close file descriptors, for example, when the object is deallocated. In our example,tree_finalize()
does nothing:static void tree_finalize (gc_t *ctx, gc_header_t *p) { return; }
In that case, it is allowed to leave the
finalize
equal toNULL
, instead of setting it to the do-nothing functiontree_finalize()
.
The fourth hook,save
, implements one half of a save-and-restore mechanism, which allows OrchIDS to dump its internal state to a file, and later recover from the saved file anew. Thesave
hook must run through all the subfields of the objectp
and call the corresponding save functions. In the case of trees, for example, we would write:static int tree_save (save_ctx_t *sctx, gc_header_t *p) { tree *t = (tree *)p; int err; err = save_int (sctx, t->n); if (err) return err; err = save_gc_struct (sctx, (gc_header_t *)t->left); if (err) return err; err = save_gc_struct (sctx, (gc_header_t *)t->right); return err; }
Note that this works slightly differently from the
mark_subfields
andtraverse
hooks: we also call save functions, recursively, on the fields that are not under the control of the garbage-collector, such asn
here.
Thesave
hook cannot be left toNULL
.
Conversely, you must write a restore hook as follows.static gc_header_t *tree_restore (restore_ctx_t *rctx) { gc_t *gc_ctx = rctx->gc_ctx; tree *t; tree *left, *right; int n; int err; t = NULL; /* pre-set result to NULL, in case an error occurs. */ GC_START (gc_ctx, 2); /* We shall need to protect the left and right fields against the GC before we can store them into the newly allocated tree. */ err = restore_int (gc_ctx, &n); /* get back the n field */ if (err) { errno = err; goto end; } /* error reporting is through the errno variable. */ left = (tree *)restore_gc_struct (rctx); if (left==NULL && errno!=0) goto end; /* propagate error if one happened in restoring left; testing left==NULL is not enough, as NULL might be a legal tree. */ GC_UPDATE (gc_ctx, 0, left); /* protect left against GC! */ right = (tree *)restore_gc_struct (rctx); if (right==NULL && errno!=0) goto end; GC_UPDATE (gc_ctx, 1, right); /* protect right against GC! */ t = gc_alloc (gc_ctx, sizeof(tree), &tree_class); t->gc.type = T_TREE; /* assuming this tag is defined in lang.h--- otherwise add it, as a number between 0 and 255, making sure it has not been taken already; then make sure the gc_classes[] array, at the end of lang.c, contains &tree_class at index T_TREE. */ t->n = n; GC_TOUCH (gc_ctx, t->left = left); /* do not forget GC_TOUCH()! */ GC_TOUCH (gc_ctx, t->right = right); /* do not forget GC_TOUCH()! */ end: GC_END (gc_ctx); return t; }
The
restore
hook cannot be left toNULL
. - Debugging:
void gc_check (gc_t *gc_ctx);
Looks for inconsistencies in memory. Works similarly to
fsck
on Unix systems: call it to check that the garbage collector is in a sane state, that is, that it satisfies all the invariants it is meant to satisfy. For example, this will check that no black object directly points to any white object, as mentioned above.To detect memory and garbage collector related bugs, sprinkle your code with calls to
gc_check()
, and launch OrchIDS under gdb (or lldb on Mavericks MacOSX and later).gc_check()
shouldabort()
and fall back to the debugger the first time it is called after something went wrong.