Imagine you want to add a new construction to the Orchids language. A simple way is to just add a new primitive. However, in some cases this is not enough. Imagine Orchids did not have multiplication. We might add it as a primitive, and name it mult
for example. However you would then be forced to write, say, $x = mult($y, $z)
to multiply $y
with $z
in Orchids signatures instead of the cosier $x = $y*$z
.
To allow for the latter kind of syntax, the first thing you should do is make sure you know how to write a primitive implementing multiplication, just like for simple primitives. We shall use slightly different calling conventions (see ovm_mul
later). For now, we assume that we have implemented multiplicaton through a function named issdl_mul
, whose C prototype is:
ovm_var_t *issdl_mul(gc_t *gc_ctx, ovm_var_t *var1, ovm_var_t *var2)
This takes its arguments explicitly, as C parameters, not from the stack as Orchids primitives would. It also returns its result instead of pushing it onto the stack. Handling the stack will be done in the code of ovm_mul
. For now, note that ovm_mul
will have a slightly different calling mechanism than ordinary primitives.
Do not register issdl_mul
or ovm_mul
as an Orchids primitive, that is, do not try to apply register_lang_function
or to insert a row in the issdl_function_g
table. That would add it as a primitive (named mult
, say), and this is not what we want.
Instead, we shall need to:
- Extend the lexer
- Extend the parser
- Extend the virtual machine
- Extend the rule compiler
- Extend the type-checker
Extending the lexer
Just add the following line to the lexer (issdl.l
):
"*" { return (O_TIMES); }
Of course this line is already present in the lexer… I am using a primitive that already exists as an example. In particular, if you ever wonder where you have to insert similar lines, just look where the above line lies in issdl.l
.
In the above line, we return a token O_TIMES
to be used by the parser. This will be defined in issdl.y
, see below.
Extending the parser
We declare O_TIMES
by one of the standard yacc declarations %token
, %left
, %right
, or %nonassoc
in issdl.y
. (I’ll assume some familiarity with yacc here.)
Here we’ll add it as a left-associative operator with the same precedence as /
and %
. Let me give you the surrounding lines as well.
%left O_PLUS O_PLUS_EVENT O_MINUS %left O_TIMES O_DIV O_MOD %right PLUSPLUS MINUSMINUS
This notably says that *
(O_TIMES
) binds tighter than +
(O_PLUS
) and slacker than ++
(PLUSPLUS
). Please respect the principle of least surprise and use the same precedences than some known language, e.g., those of C.
Then we add a new production to the grammar, or enrich a pre-existing production. Here we wish products to be expressions. We therefore add the following clause to the expr
production:
expr: ... | expr O_TIMES expr { RESULT($$, build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3)); } ...
The code between the curly brackets builds a piece of the abstract syntax tree that will later be compiled to bytecode.
compiler_ctx_g
is the compiler’s global context. Despite what the_g
ending says, this is not a global variable; rather, it is a macro, which gets the compiler context from the global Orchids contextctx
.build_expr_binop
builds a binary expression node, tagged with the constantOP_MUL
, and with two arguments$1
,$3
denoting the two expressions that we are multiplying.OP_MUL
is the bytecode that the Orchids virtual machine uses to implement multiplication, and is defined inovm.h
(see below). The functionbuild_expr_binop
is used to build abstract syntax tree nodes for constructions that take two arguments and return a value. For unary constructions, usebuild_expr_monop
. For binary constructions that return a Boolean value, meant to be tested byif
tests, say, usebuild_expr_cond
instead (e.g,<=
or&&
). Other examples can be found in the grammar fileissdl.y
.- The resulting abstract syntax node should be stored into
$$
. Instead of writing$$ = build_expr_binop(compiler_ctx_g, OP_MUL, $1, $3)
, please use theRESULT
macro. This not only takes care of callingGC_TOUCH
, but will also push$$
onto a stack (ctx->protected
, wherectx
is the rule compiler context) that the garbage collector will go though to mark objects. This makes sure that none of these newly created nodes will be freed by the garbage collector. You may also useRESULT_DROP
, instead ofRESULT
, to first pop that stack to a predefined level, then push the result. A typical example is the grammar production:rule: rulekw SYMNAME synchro O_BRACE state states C_BRACE { RESULT_DROP($$,$1, build_rule(compiler_ctx_g, &($2), $5, $6, $3)); } ;
which first pops the
ctx->protected
stack to the level given by$1
, then pushes the result of build_rule. The right level is obtained by callingCOMPILE_GC_DEPTH
on recognizing the first token (RULE
):rulekw : RULE { $$ = COMPILE_GC_DEPTH(compiler_ctx_g); } ;
We did not do that for multiplication, because it would be impractical to obtain the value of
COMPILE_GC_DEPTH
before we even recognize the firstexpr
given as multiplicand. In that case, thectx->protected
stack will probably be slightly larger than needed, but that should not cause much of a problem.
Extending the virtual machine
That mostly means adding one bytecode in our example, namely OP_MUL
. Add it at the end of the list of macros of the form OP_
… in ovm.h
. Make sure that the total number of opcodes is updated. At the time of writing, the last opcode becomes number 36, so we must write:
#define OPCODE_NUM 37
Now you need to write the code for the OP_MUL
opcode:
static int ovm_mul(isn_param_t *param) { orchids_t *ctx = param->ctx; ovm_var_t *op1; ovm_var_t *op2; ovm_var_t *res = NULL; DebugLog(DF_OVM, DS_DEBUG, "OP_MUL\n"); op2 = (ovm_var_t *)STACK_ELT(ctx->ovm_stack, 1); op1 = (ovm_var_t *)STACK_ELT(ctx->ovm_stack, 2); param->ip += 1; if (!IS_NULL(op1) && !IS_NULL(op2)) res = issdl_mul(ctx->gc_ctx, op1, op2); STACK_DROP(ctx->ovm_stack, 2); PUSH_VALUE(ctx, res); return 0; }
As you can see, ovm_mul
takes a unique parameter of type isn_param_t *
as argument. This is a pointer to a struct that contains all the relevant information: the Orchids context param->ctx
(from which we obtain the virtual machine’s stack, and then the two values to multiply) and the program counter param->ip
, which we will have to increment.
Extending the rule compiler
We do not have to do anything here! Since we have already registered our new construction as building an abstract syntax tree through build_expr_binop
, the rule compiler already knows how to compile this with the opcode given (OP_MUL
).
We would be in an equally cosy situation with build_expr_monop
, or build_expr_cond
. Exploring the various other grammar rules, you will see that various other node forming constructs are provided, for various situations.
Extending the type-checker
You need to instruct Orchids about the type(s) of your new construct. Contrarily to primitives, there is no standardized way of doing that. The best advice I can tell you is: find an already existing primitive that looks similar to yours, and type it in a similar way.
In our example, since we have added a binary operator, we should look at the compute_stype_binop()
function in rule_compiler.c
. There is a switch
on the binary operator, and we add a new branch:
case OP_MUL: restype = stype_join (ltype, rtype); if (restype==&t_any) goto type_error; if (strcmp(restype->name, "int") && strcmp(restype->name, "uint") && strcmp(restype->name, "float")) goto type_error; set_type (ctx, myself, restype); return;
This reads as follows. We first call the stype_join
function to compute the least common super types of the types ltype
and rtype
of the two arguments. (A super type is a type that contains a superset of values. For example, db[int, uint]
is a super type of the type db[*]
, the latter containing only the empty database.) This makes sure that we know a common type that the two arguments inhabit.
Next, we compare the common type to one of the allowed types, int
, uint
, or float
. Indeed ovm_mul
calls issdl_mul
, which will call a function pointer, depending on the runtime type of the arguments, and is able to multiply signed and unsigned integers, and floating-point values.
Finally, if we have passed all tests, we call set_type()
to set the type of the current expression, myself
, to the common type. This does not really set the type of the expression, rather it pushes the task of doing so onto a queue, but that does not really matter.
Extending the static analyzer
We should also update the static analyzer. This means essentially the monotonicity checker.
In our example, since we have added a binary operator, we should look at the compute_monotony_bin()
function in rule_compiler.c
. There is a switch
on the binary operator, and we add a new branch:
case OP_MUL: if (ma==MONO_CONST && mb==MONO_CONST) res = MONO_CONST; break;
This means that OP_MUL
is not particularly monotonic. We are just saying that if both arguments are constant, then the result is constant.
That seems to be the absolute minimum, but that is not true. For example, a function that returns a new number each time it is called, or a construction that reads from a file, or writes to a file and returns an error code, should return MONO_UNKNOWN
, not MONO_CONST
.
On the opposite, monotonic constructions such as addition are handled as follows:
case OP_ADD: /* + is meant to be monotonic, whatever the types of the arguments */ /* MONO+MONO=MONO, ANTI+ANTI=ANTI */ if ((ma & MONO_MONO) && (mb & MONO_MONO)) res |= MONO_MONO; if ((ma & MONO_ANTI) && (mb & MONO_ANTI)) res |= MONO_ANTI; break;
knowing that res
is initialized to MONO_UNKNOWN
, that is, to 0
. Also, it may be that the two arguments satisfy the two if
tests. This happens when ma
and mb
are equal to MONO_CONST
, which is just MONO_MONO | MONO_ANTI
. In that case, res
will end up containing MONO_CONST
, which just says that the sum of two constant arguments is constant.
As Orchids develops, the number of static analyzers grows. How do you know which analyzers you should modify? The simplest recipe is as follows. Find a common byte code that already exists, say, OP_SUB, and take one that would be the most like the one you want to add. Then, look for all the places in the code where that is used, by typing grep OP_SUB src/*.[ch] src/modules/*.[ch]
, and add your own code after each of the relevant lines that you have just found.