Writing functions that allocate memory

The purpose of this note is to explain how to write functions that allocate memory. One basic principle is that one should see the garbage collector as an adversary, which always tries to deallocate memory. You should protect your data against it. Here are a few examples.

The simplest case is that of a function that simply allocates a garbage-collectable data structure, and fills in some fields. Just use gc_alloc(), as in the ovm_int_new() function:

ovm_var_t *ovm_int_new(gc_t *gc_ctx, long val)
{
  ovm_int_t *n;

  n = gc_alloc (gc_ctx, sizeof (ovm_int_t), NULL);
  n->gc.type = T_INT;
  n->val = val;
  return OVM_VAR(n);
}

Now consider a slightly more complicated function: build_integer() builds an object of type node_expr_t, which holds a pointer in its data field to an ovm_var_t holding an integer. We need to allocate the ovm_var_t, then the node_expr_t, and store the former into the latter’s data field. The garbage collector context is in ctx->gc_ctx, so we might think of writing the following, but this is wrong.

node_expr_t *build_integer(rule_compiler_t *ctx, int i)
{
  ovm_var_t    *integer;
  node_expr_term_t  *n;

  integer = ovm_int_new(ctx->gc_ctx, i);
  n = (node_expr_term_t *) gc_alloc (ctx->gc_ctx, sizeof(node_expr_term_t),
                     &node_expr_term_class);
  n->type = NODE_CONST;
  n->data = integer;

  return (node_expr_t *)n;
}

Here is what goes wrong with this code snippet. The object integer created by ovm_int_new() is not pointed to by any data structure that the garbage collector can mark. The subsequent call to gc_alloc() will invoke the garbage collector, to make sure some memory is deallocated, and might well deallocate the integer object we have just allocated.

This problem is nasty. In particular, gc_check() has almost no way of detecting it.

Oh, you might think that this will never happen, since the garbage collector would have to be damned quick to deallocate integer this fast.

Please never, never reason that way. You are right in thinking this will almost never happen, but this is precisely the problem: when the bug occurs (and be sure it will), it will be undiagnosable, irreproducible, impossible to debug, and will make Orchids fail in horrible ways.

Back to our problem: we need to protect integer against the garbage collector. The standard solution is by using the GC_START, GC_UPDATE, and GC_END macros:

node_expr_t *build_integer(rule_compiler_t *ctx, int i)
{
  ovm_var_t    *integer;
  node_expr_term_t  *n;

  GC_START(ctx->gc_ctx, 1);
  integer = ovm_int_new(ctx->gc_ctx, i);
  GC_UPDATE(ctx->gc_ctx, 0, integer);

  n = (node_expr_term_t *) gc_alloc (ctx->gc_ctx, sizeof(node_expr_term_t),
				     &node_expr_term_class);
  n->type = NODE_CONST;
  n->data = integer;

  GC_END(ctx->gc_ctx);
  return (node_expr_t *)n;
}

The GC_START macro allocates an array of protected entries on the C stack. You need to tell it how many entries this array will contain: 1 in our example. The entries are initialized to NULL. We then use GC_UPDATE to protect integer, storing it at index 0 in the array. (But beware! This code still has problems. Read on.)

Each call to GC_START must be matched by exactly one call to GC_END. To help check this, GC_START expands to some piece of code that contains an open curly bracket ({), and GC_END contains the matching closing curly bracket (}). Mismatches should be flagged by auto-indenting text editors.

As a consequence, if your code is indented strangely, don’t dismiss that as a text formatting problem! It might be indicative of serious GC_START/GC_END balancing bug.

By the way, don’t try to be smart: call GC_START at the beginning of your function with the right number of entries, GC_END at the end, and that’s it.

And don’t use return in the middle of your code, either. For example, the following is wrong:

{
  GC_START(gc_ctx, 3);
  /* do some stuff */
  if (/* somme error condition */)
    return 1;
  GC_END(gc_ctx);
  return 0;
}

This is wrong because the return 1 instruction will fail to execute the code hidden inside the GC_END macro. Since this codes unlinks pointers stored on the stack, this will really cause havoc after return.

Our latest attempt at writing build_integer() is still wrong. This time, gc_check() is more likely to spot the problem, but you’re safer if you don’t count on it.

The problem is that the assignment n->data = integer may violate the invariant that no black object points directly to a white object. You should use GC_TOUCH to correct the problem, as follows:

node_expr_t *build_integer(rule_compiler_t *ctx, int i)
{
  ovm_var_t    *integer;
  node_expr_term_t  *n;

  GC_START(ctx->gc_ctx, 1);
  integer = ovm_int_new(ctx->gc_ctx, i);
  GC_UPDATE(ctx->gc_ctx, 0, integer); /* protect integer */

  n = (node_expr_term_t *) gc_alloc (ctx->gc_ctx, sizeof(node_expr_term_t),
				     &node_expr_term_class);
  n->type = NODE_CONST;
  GC_TOUCH (ctx->gc_ctx, n->data = integer);

  GC_END(ctx->gc_ctx);
  return (node_expr_t *)n;
}

In general, assignments of garbage-collectable objects should be enclosed in a call to GC_TOUCH. The style exemplified above is recommended. In particular, don’t use the (equivalent) idiom:

  n->data = integer; /* suspicious assignement, may violate no-black-object-pointing-directly-to-white-object invariant */
  GC_TOUCH (ctx->gc_ctx, n->data);

Code lines similar to n->data = integer, without an enclosing GC_TOUCH, should be seen as suspicious. You should always prefer the following style:

  GC_TOUCH (ctx->gc_ctx, n->data = integer);

which makes it clear that the `no black object pointing directly to a white object’ invariant is satisfied.

A final word (for today). One should protect data, not against gc_alloc(), as would seem to be the case looking at the above example, but against any function that may call the garbage collector.

These include gc_alloc(), gc(), gc_full(), but also gc_base_malloc() and gc_base_realloc() (which do ordinary C-style memory allocation and reallocation, but also call the garbage collector to be sure they will have enough memory), and all functions that call them directly or indirectly.

As an illustration, here is the actual code for build_integer(). The only difference with the above is that build_integer() also adds integer to a table of so-called static values, akin to Java’s constant pool, using statics_add(). The latter may need to allocate some memory, and in that case will do it using gc_base_realloc(). We therefore need to protect n itself. The safest way is to use GC_UPDATE again on it. Don’t forget to modify the call to GC_START so that it allocates an array of 2 elements, not 1, this time:

node_expr_t *build_integer(rule_compiler_t *ctx, int i)
{
  ovm_var_t    *integer;
  node_expr_term_t  *n;

  GC_START(ctx->gc_ctx, 2);
  integer = ovm_int_new(ctx->gc_ctx, i);
  GC_UPDATE(ctx->gc_ctx, 0, integer); /* protect integer */

  n = (node_expr_term_t *) gc_alloc (ctx->gc_ctx, sizeof(node_expr_term_t),
				     &node_expr_term_class);
  n->type = NODE_CONST;
  GC_TOUCH (ctx->gc_ctx, n->data = integer);
  n->res_id = ctx->statics_nb;
  GC_UPDATE(ctx->gc_ctx, 1, n); /* protect n */

  statics_add(ctx, integer);
  GC_END(ctx->gc_ctx);
  return (node_expr_t *)n;
}

Well, I’ve lied. This is not quite the actual code to build_integer(). The actual code reuses slot 0 in the GC_START array: integer does not need to be stored there, since it will be protected by the data field from the protected n object. However, don’t try to be that smart. There are many ways to do smart things that will cause havoc with the garbage collector.