Type-punning and the strict aliasing rule

The ANSI C standards were obviously largely influenced by compiler writers. This is a good thing: these are the guys would should know about the language. However, they sometimes enforce semantics assumptions that allow for very efficient optimizations. Ultimately, the main problem is that these assumptions are not or only partially checked by the compiler, but the compiler will still do as though these assumptions are true, and modify your code accordingly. I will concentrate on one of these assumptions here: the strict aliasing rule, and its pretty bad way of mixing with type-punning.

Type-punning is the practice of pretending an object of some type really is of some other type. This is accomplished in C by using casts or union types.

Orchids relies a lot on casts between pointer types: there is some type-punning all over the place in its code! For example:

  • In the Orchids virtual machine, all values that it handles are cast to the C type of pointers to ovm_var_t, but are actually pointers to objects of type ovm_int_t, or ovm_str_t, or ovm_vstr_t, or etc. The latter types are the member fields of the union type defining the C type ovm_var_t.
  • In the Orchids parser, the nodes of the parse tree are of type node_expr_t *, but are actually pointers to objects of type node_expr_binop_t, node_expr_sym_t, etc. The latter types are not even designed as member fields of the type node_expr_t.
  • All objects of a garbage-collectable type (and that includes all of the above types) are pointers to objects of a struct type whose first field is gc_header_t gc; this way, the garbage collector knows it can cast all of these pointers to gc_header_t *, its basic data structure, and work on the latter.

Any compiler for a langage with pointers must deal with the thorny issue of pointer aliasing. For example, if you write:

{
  int n = 10;
  int *p, *q;

  p = &n;
  q = &n;
  *p = 29;
}

then p and q are pointers which are aliases: they point to the same address, namely &n. The final assignement to *p also changes *q and n.

The dream of any compiler optimization writer would be to have a simple, efficient way of guaranteeing that a given pointer has no alias, or at least to find a short list of all pointers that may be aliases of it.

A very simple trick that improves upon any alias detection technique in optimizers consists in realizing that pointers to objects of different types cannot be aliases of each other. However, while this would certainly be true in a strongly, statically typed language, this is obviously false in a language with casts.

This is a problem: you cannot count on this trick to improve your C optimizer.

Or can you? The ANSI C normalization committees have decided on the strict aliasing rule, which is basically to say that if you ever access the same data (read access, or write access) through two different pointers to different types, then the semantics of your program is undefined.  Undefinedness is the standard trick of the ANSI C committee to say “you’ve been warned”.

Compilers usually try to detect such undefined cases, and warn you about them.  For example, gcc might tell you (in -O2 mode) “type-punning to incomplete type might break strict aliasing rules”, or “type-punned pointer will break strict aliasing rules”.  However, gcc also sometimes just remains silent and produces weird code…

What is the effect of all this?

For example, the GC_START(), GC_UPDATE() and GC_END() macros rely heavily on type-punning. I’ve once had a bug where, inside a GC_START/GC_END section, I did some computations and stored the intermediate, garbage-collectable results by using GC_UDPATE(). This code failed miserably, with the following strange behavior:

  • It core dumped, showing signs that all memory structures had been garbled (ASCII strings instead of memory addresses in pointers that the garbage collector was using), but gc_check() never found anything wrong until the bug actually fired
  • The code worked perfectly if I did not use the -O2 option, or even if I only used some moderate amount of optimization (-O1)
  • After some modifications of the code, I realized that the pointer gc_ctx->stack_data had not been restored by GC_END() to the value it had before I did GC_START(); adding pointer equality checks into the code to check when this value changed suddenly spurred gc into complaining about type-punned pointers breaking the strict aliasing rule. Note that gcc complained about the bug-checking code, not on the actual function I was checking.

I have no general solution to that kind of bug.  Although the code of Orchids uses extensively that kind of type-punning in modules (the REGISTER_EVENTS() macro is a horrible case of type-punning), this seems to work at the moment.  In the code mentioned above, I did things such as:

{
  varset_t *vs;
  GC_START(gc_ctx, 1);

  vs = <whatever>;
  GC_UPDATE(gc_ctx, 0, vs);

then I used both vs or (varset_t *)GC_LOOKUP(0) which, after all, are exactly the same object. The C compiler is entitled to consider that they are distinct pointers, since vs is a pointer to varset_t and GC_LOOKUP(0) is a pointer to gc_header_t (or to void, depending on versions). Therefore side effects to one pointed object are deemed to have no effect on the other one, and the C compiler will optimize away any instruction which, as a result, is deemed to have no effect on the final computed result.