Extending the language

Imagine you want to add a new construction to the Orchids language.  A simple way is to just add a new primitive.  However, in some cases this is not enough. Imagine Orchids did not have multiplication.  We might add it as a primitive, and name it mult for example. However you would then be forced to write, say, $x = mult($y, $z) to multiply $y with $z in Orchids signatures instead of the cosier $x = $y*$z.

To allow for the latter kind of syntax, the first thing you should do is make sure you know how to write a primitive implementing multiplication, just like for simple primitives. We shall use slightly different calling conventions (see ovm_mul later). For now, we assume that we have implemented multiplicaton through a function named issdl_mul, whose C prototype is:

ovm_var_t *issdl_mul(gc_t *gc_ctx, ovm_var_t *var1, ovm_var_t *var2)

This takes its arguments explicitly, as C parameters, not from the stack as Orchids primitives would. It also returns its result instead of pushing it onto the stack. Handling the stack will be done in the code of ovm_mul. For now, note that ovm_mul will have a slightly different calling mechanism than ordinary primitives.

Do not register issdl_mul or ovm_mul as an Orchids primitive, that is, do not try to apply register_lang_function or to insert a row in the issdl_function_g table. That would add it as a primitive (named mult, say), and this is not what we want.

Instead, we shall need to:

  • Extend the lexer
  • Extend the parser
  • Extend the virtual machine
  • Extend the rule compiler
  • Extend the type-checker

Extending the lexer

Just add the following line to the lexer (issdl.l):

"*"        { return (O_TIMES); }

Of course this line is already present in the lexer…  I am using a primitive that already exists as an example.  In particular, if you ever wonder where you have to insert similar lines, just look where the above line lies in issdl.l.

In the above line, we return a token O_TIMES to be used by the parser.  This will be defined in issdl.y, see below.


Extending the parser

We declare O_TIMES by one of the standard yacc declarations %token, %left, %right, or %nonassoc in issdl.y.  (I’ll assume some familiarity with yacc here.)

Here we’ll add it as a left-associative operator with the same precedence as / and %.  Let me give you the surrounding lines as well.

%left O_PLUS O_PLUS_EVENT O_MINUS
%left O_TIMES O_DIV O_MOD
%right PLUSPLUS MINUSMINUS

This notably says that * (O_TIMES) binds tighter than + (O_PLUS) and slacker than ++ (PLUSPLUS).  Please respect the principle of least surprise and use the same precedences than some known language, e.g., those of C.

Then we add a new production to the grammar, or enrich a pre-existing production.  Here we wish products to be expressions.  We therefore add the following clause to the expr production:

expr:
     ...
  | expr O_TIMES expr
    { RESULT($$, build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3)); }
     ...

The code between the curly brackets builds a piece of the abstract syntax tree that will later be compiled to bytecode.

    • compiler_ctx_g is the compiler’s global context.  Despite what the _g ending says, this is not a global variable; rather, it is a macro, which gets the compiler context from the global Orchids context ctx.
    • build_expr_binop builds a binary expression node, tagged with the constant OP_MUL, and with two arguments $1, $3 denoting the two expressions that we are multiplying. OP_MUL is the bytecode that the Orchids virtual machine uses to implement multiplication, and is defined in ovm.h (see below).  The function build_expr_binop is used to build abstract syntax tree nodes for constructions that take two arguments and return a value. For unary constructions, use build_expr_monop. For binary constructions that return a Boolean value, meant to be tested by if tests, say, use build_expr_cond instead (e.g, <= or &&).  Other examples can be found in the grammar file issdl.y.
    • The resulting abstract syntax node should be stored into $$. Instead of writing $$ = build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3), please use the RESULT macro. This not only takes care of calling GC_TOUCH, but will also push $$ onto a stack (ctx->protected, where ctx is the rule compiler context) that the garbage collector will go though to mark objects. This makes sure that none of these newly created nodes will be freed by the garbage collector. You may also use RESULT_DROP, instead of RESULT, to first pop that stack to a predefined level, then push the result. A typical example is the grammar production:
      rule:
        rulekw SYMNAME synchro O_BRACE state states C_BRACE
        { RESULT_DROP($$,$1, build_rule(compiler_ctx_g, &($2), $5, $6, $3)); }
      ;
      

      which first pops the ctx->protected stack to the level given by $1, then pushes the result of build_rule.  The right level is obtained by calling COMPILE_GC_DEPTH on recognizing the first token (RULE):

      rulekw : RULE { $$ = COMPILE_GC_DEPTH(compiler_ctx_g); }
      ;

      We did not do that for multiplication, because it would be impractical to obtain the value of COMPILE_GC_DEPTH before we even recognize the first expr given as multiplicand. In that case, the ctx->protected stack will probably be slightly larger than needed, but that should not cause much of a problem.


Extending the virtual machine

That mostly means adding one bytecode in our example, namely OP_MUL.  Add it at the end of the list of macros of the form OP_… in ovm.h.  Make sure that the total number of opcodes is updated.  At the time of writing, the last opcode becomes number 36, so we must write:

#define OPCODE_NUM 37

Now you need to write the code for the OP_MUL opcode:

static int ovm_mul(isn_param_t *param)
{
  orchids_t *ctx = param->ctx;
  ovm_var_t *op1;
  ovm_var_t *op2;
  ovm_var_t *res = NULL;

  DebugLog(DF_OVM, DS_DEBUG, "OP_MUL\n");
  op2 = (ovm_var_t *)STACK_ELT(ctx->ovm_stack, 1);
  op1 = (ovm_var_t *)STACK_ELT(ctx->ovm_stack, 2);
  param->ip += 1;
  if (!IS_NULL(op1) && !IS_NULL(op2))
    res = issdl_mul(ctx->gc_ctx, op1, op2);
  STACK_DROP(ctx->ovm_stack, 2);
  PUSH_VALUE(ctx, res);
  return 0;
}

As you can see, ovm_mul takes a unique parameter of type isn_param_t * as argument. This is a pointer to a struct that contains all the relevant information: the Orchids context param->ctx (from which we obtain the virtual machine’s stack, and then the two values to multiply) and the program counter param->ip, which we will have to increment.


Extending the rule compiler

We do not have to do anything here!  Since we have already registered our new construction as building an abstract syntax tree through build_expr_binop, the rule compiler already knows how to compile this with the opcode given (OP_MUL).

We would be in an equally cosy situation with build_expr_monop, or build_expr_cond.  Exploring the various other grammar rules, you will see that various other node forming constructs are provided, for various situations.


Extending the type-checker

You need to instruct Orchids about the type(s) of your new construct.  Contrarily to primitives, there is no standardized way of doing that.  The best advice I can tell you is: find an already existing primitive that looks similar to yours, and type it in a similar way.

In our example, since we have added a binary operator, we should look at the compute_stype_binop() function in rule_compiler.c. There is a switch on the binary operator, and we add a new branch:

    case OP_MUL:
      restype = stype_join (ltype, rtype);
      if (restype==&t_any)
        goto type_error;
      if (strcmp(restype->name, "int") &&
          strcmp(restype->name, "uint") &&
          strcmp(restype->name, "float"))
        goto type_error;
      set_type (ctx, myself, restype);
      return;

This reads as follows. We first call the stype_join function to compute the least common super types of the types ltype and rtype of the two arguments. (A super type is a type that contains a superset of values. For example, db[int, uint] is a super type of the type db[*], the latter containing only the empty database.) This makes sure that we know a common type that the two arguments inhabit.

Next, we compare the common type to one of the allowed types, int, uint, or float. Indeed ovm_mul calls issdl_mul, which will call a function pointer, depending on the runtime type of the arguments, and is able to multiply signed and unsigned integers, and floating-point values.

Finally, if we have passed all tests, we call set_type() to set the type of the current expression, myself, to the common type. This does not really set the type of the expression, rather it pushes the task of doing so onto a queue, but that does not really matter.


Extending the static analyzer

We should also update the static analyzer. This means essentially the monotonicity checker.

In our example, since we have added a binary operator, we should look at the compute_monotony_bin() function in rule_compiler.c. There is a switch on the binary operator, and we add a new branch:

    case OP_MUL:
      if (ma==MONO_CONST && mb==MONO_CONST)
	res = MONO_CONST;
      break;

This means that OP_MUL is not particularly monotonic. We are just saying that if both arguments are constant, then the result is constant.

That seems to be the absolute minimum, but that is not true. For example, a function that returns a new number each time it is called, or a construction that reads from a file, or writes to a file and returns an error code, should return MONO_UNKNOWN, not MONO_CONST.

On the opposite, monotonic constructions such as addition are handled as follows:

    case OP_ADD: /* + is meant to be monotonic, whatever the types
		    of the arguments */
      /* MONO+MONO=MONO, ANTI+ANTI=ANTI */
      if ((ma & MONO_MONO) && (mb & MONO_MONO))
	res |= MONO_MONO;
      if ((ma & MONO_ANTI) && (mb & MONO_ANTI))
	res |= MONO_ANTI;
      break;

knowing that res is initialized to MONO_UNKNOWN, that is, to 0. Also, it may be that the two arguments satisfy the two if tests. This happens when ma and mb are equal to MONO_CONST, which is just MONO_MONO | MONO_ANTI. In that case, res will end up containing MONO_CONST, which just says that the sum of two constant arguments is constant.

As Orchids develops, the number of static analyzers grows.  How do you know which analyzers you should modify?  The simplest recipe is as follows.  Find a common byte code that already exists, say,  OP_SUB, and take one that would be the most like the one you want to add.  Then, look for all the places in the code where that is used, by typing grep OP_SUB src/*.[ch] src/modules/*.[ch], and add your own code after each of the relevant lines that you have just found.