Converting input into events

Orchids takes input from various data sources and breaks it down into events. The purpose of this post is to explain what that means, how it works, and how one should write event-generating modules.

Building events

Consider the following example. The mod_textfile module reads data from text files (and also Unix sockets, and pipes), and produces one event per text line. An Orchids event is just a list of pairs, consisting of a field and its value. The type of Orchids events is:

typedef struct event_s event_t;
struct event_s
{
  gc_header_t gc;
  int32_t    field_id;
  ovm_var_t *value;
  event_t   *next;
};

Formally, an object of type event_t is a pair consisting of a field (equated with its field id field_id) and its value value, plus a pointer next to subsequent pairs. The NULL pointer serves as the end of the list.
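To make the list structure concrete, here is a minimal, self-contained sketch of how such a list can be walked. It does not use the real Orchids types: sketch_event_t drops the GC header and stores a plain C string instead of an ovm_var_t, and all sketch_* names are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for event_t: no GC header, and the value is a plain
   C string instead of an ovm_var_t (assumptions of this sketch). */
typedef struct sketch_event_s sketch_event_t;
struct sketch_event_s {
  int32_t         field_id;
  const char     *value;
  sketch_event_t *next;
};

/* Count the field-value pairs of an event; NULL is the empty event. */
static size_t sketch_event_length(const sketch_event_t *event)
{
  size_t n = 0;

  for (; event != NULL; event = event->next)
    n++;
  return n;
}

/* Return the value bound to field_id, or NULL if the field is absent. */
static const char *sketch_event_lookup(const sketch_event_t *event,
                                       int32_t field_id)
{
  for (; event != NULL; event = event->next)
    if (event->field_id == field_id)
      return event->value;
  return NULL;
}

/* Usage: chain three pairs by hand, query them, return the pair count. */
static size_t sketch_demo(void)
{
  const int32_t base = 4;  /* illustrative first field id; varies in practice */
  sketch_event_t line = { base + 2, "a log line\n", NULL };
  sketch_event_t file = { base + 1, "authd.log", &line };
  sketch_event_t num  = { base + 0, "1", &file };

  assert(sketch_event_lookup(&num, base + 1) == file.value);
  assert(sketch_event_lookup(&num, base + 7) == NULL);
  return sketch_event_length(&num);
}
```

Looking up a field is a linear scan, which is acceptable because events typically carry only a handful of pairs.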

Imagine the mod_textfile module reads the following line of text from file
/Users/goubault/Desktop/Code/ORCHIDS/orchids_new_version/tests/macosx/authd.log, as its first line:

May  9 10:37:07 MacBook-Pro-de-Jean.local com.apple.authd[36]: Succeeded authorizing right 'system.login.console' by client '/System/Library/CoreServices/loginwindow.app' [55916] for authorization created by '/System/Library/CoreServices/loginwindow.app' [55916] (3,0)\n

(Although this is anecdotal, note the final newline \n.) The mod_textfile module will then package that as an event which, in Orchids syntax, would be written:

.{.textfile.file = "/Users/goubault/Desktop/Code/ORCHIDS/orchids_new_version/tests/macosx/authd.log",
  .textfile.line_num = 1,
  .textfile.line = "May  9 10:37:07 MacBook-Pro-de-Jean.local com.apple.authd[36]: Succeeded authorizing right 'system.login.console' by client '/System/Library/CoreServices/loginwindow.app' [55916] for authorization created by '/System/Library/CoreServices/loginwindow.app' [55916] (3,0)\n"
 }

This is an event with three fields. Inside Orchids, this is an object of type event_t. Let base be the first field id attributed to the mod_textfile module by the Orchids module configuration mechanism. (While I’m writing this, I am running a copy of Orchids, and in my case, base equals 4, but that may, and will, vary.) In writing mod_textfile, we have decided to number our fields as follows:

#define TF_FIELDS  3   /* number of fields known to mod_textfile */
#define F_LINE_NUM 0        /* the .textfile.line_num field */
#define F_FILE     1        /* the .textfile.file field */
#define F_LINE     2        /* the .textfile.line field */

I’ll tell you later how the connection with field names is established.  For now, my point is that the mod_textfile module will create an event_t object:

  • with field_id equal to base+F_LINE_NUM, value equal to 1 (packaged as an ovm_int_t), and next equal to a pointer to…
  • another event_t object, with field_id equal to base+F_FILE, value equal to
    "/Users/goubault/Desktop/Code/ORCHIDS/orchids_new_version/tests/macosx/authd.log" (packaged as an ovm_str_t), and next equal to a pointer to…
  • another event_t object, with field_id equal to base+F_LINE, value equal to
    "May  9 10:37:07 MacBook-Pro-de-Jean.local com.apple.authd[36]: Succeeded authorizing right 'system.login.console' by client '/System/Library/CoreServices/loginwindow.app' [55916] for authorization created by '/System/Library/CoreServices/loginwindow.app' [55916] (3,0)\n" (packaged as an ovm_str_t), and next equal to NULL.

The preferred way of doing so is as follows.  First, we allocate space for the maximum number of fields, plus one slot to hold the final value of the event.

 GC_START(gc_ctx, TF_FIELDS+1);

Here gc_ctx is a pointer to our GC context. GC_START() is usually meant to allocate space on the stack that is known to the garbage-collector, and we’ll use it also to store the values we are interested in.

We now fill in this array with values.  For example, assuming the current line number is in tf->line, we write:

  val = ovm_int_new (gc_ctx, tf->line);
  GC_UPDATE(gc_ctx, F_LINE_NUM, val);

This packages the line number as an Orchids object val, of type ovm_int_t (or rather, its super type ovm_var_t), and then stores that object inside the array.  Note that F_LINE_NUM is used as an index into that array, and will also be used to compute the field id (by adding base to it).

We do the same thing for all other fields, or only for some of them: if you don’t store anything at index F_FILE, for example, the procedure we are describing will simply build an event from which the .textfile.file field is absent, that is all.

When all the fields we are interested in have been set this way, it only remains to call:

 REGISTER_EVENTS(ctx, mod, TF_FIELDS, 0);
 GC_END(gc_ctx);

The REGISTER_EVENTS() macro takes the Orchids context ctx, the current module mod, the number of fields, and the dissector level (here, 0; there is a beginning of an explanation on the latter here).  We use GC_END() to free the array allocated by GC_START(), see this post.
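The following self-contained sketch mimics the fill-the-table-then-register pattern with plain C arrays, to show how an empty slot turns into an absent field. It is not the Orchids implementation: there is no garbage collector, values are plain strings, and every sketch_* name is hypothetical. The order in which pairs end up in the list is also a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TF_FIELDS  3
#define F_LINE_NUM 0
#define F_FILE     1
#define F_LINE     2

typedef struct sketch_event_s sketch_event_t;
struct sketch_event_s {
  int32_t         field_id;
  const char     *value;   /* stand-in for ovm_var_t * */
  sketch_event_t *next;
};

/* Turn a table of nfields optional values into an event.  Empty (NULL)
   slots produce no pair.  Sweeping backwards while prepending keeps the
   pairs in field order; that ordering is a choice of this sketch. */
static sketch_event_t *sketch_build_event(int32_t base,
                                          const char *const *slots,
                                          size_t nfields)
{
  sketch_event_t *event = NULL;
  size_t i;

  for (i = nfields; i-- > 0; ) {
    sketch_event_t *pair;

    if (slots[i] == NULL)
      continue;
    pair = malloc(sizeof *pair);
    assert(pair != NULL);
    pair->field_id = base + (int32_t)i;
    pair->value = slots[i];
    pair->next = event;
    event = pair;
  }
  return event;
}

static size_t sketch_event_length(const sketch_event_t *event)
{
  size_t n = 0;

  for (; event != NULL; event = event->next)
    n++;
  return n;
}

static void sketch_free_event(sketch_event_t *event)
{
  while (event != NULL) {
    sketch_event_t *next = event->next;
    free(event);
    event = next;
  }
}

/* Usage: leave the F_FILE slot empty, so the event has only two pairs. */
static size_t sketch_demo_pairs(void)
{
  const char *slots[TF_FIELDS] = { NULL, NULL, NULL };
  sketch_event_t *event;
  size_t n;

  slots[F_LINE_NUM] = "1";
  slots[F_LINE] = "a log line\n";
  event = sketch_build_event(4, slots, TF_FIELDS);
  n = sketch_event_length(event);
  sketch_free_event(event);
  return n;
}
```

In real module code the explicit malloc()/free() calls would not appear, since the values and the event live in the garbage-collected slots set up by GC_START().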

Registering events

The REGISTER_EVENTS() macro expands to a call to two functions in the Orchids API: add_fields_to_event(), and post_event().

The first of these functions:

void add_fields_to_event(orchids_t *ctx, mod_entry_t *mod,
                         event_t **event, ovm_var_t **tbl_event, size_t sz);

takes an array tbl_event of sz Orchids values and adds them to the front of the event (that is, the list of pairs) *event. This is used by the REGISTER_EVENTS(ctx, mod, nevents, dissection_level) macro, which calls add_fields_to_event (ctx, mod, (event_t **)&GC_LOOKUP(nevents), (ovm_var_t **)GC_DATA(), nevents). The array is the one allocated by GC_START(), namely GC_DATA(), and holds the values of the nevents fields. The event itself is stored in the remaining slot of the array, namely &GC_LOOKUP(nevents); remember that the array has nevents+1 slots, and that nevents=TF_FIELDS in our example.

One can also use add_fields_to_event() directly, and some modules do it.  A refined function is:

void add_fields_to_event_stride(orchids_t *ctx, mod_entry_t *mod,
                                event_t **event, ovm_var_t **tbl_event,
                                size_t from, size_t to);

Instead of sweeping through the whole tbl_event table, it only looks at the entries numbered from, from+1, …, to-1. The purpose is efficiency. Some modules have a high number of field ids (604 for mod_openbsm), but most events will only use a much smaller number of fields. Imagine you create an event containing field ids 230, 231, and 232. Calling add_fields_to_event() will produce an event with three field-value pairs, but to do so, it will sweep through the 604 possible entries in the tbl_event table. Instead, call add_fields_to_event_stride(ctx, mod, event, &tbl_event[230], 230, 233): this will only look at three entries, which is much faster. (Pay attention that the table argument should be &tbl_event[230], not tbl_event. You will realize that it is more natural in the long run.  Also, to is equal to 233, not 232.)
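The index arithmetic is easy to get wrong, so here is a small self-contained sketch of it. It is only a model under stated assumptions: sketch_stride() is a hypothetical helper that records field ids in an output array instead of building an event, values are plain strings, and the base value 4 is illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NFIELDS 604  /* field count quoted above for mod_openbsm */

/* Visit field numbers from..to-1.  tbl points at the entry for field
   number `from`, so it is indexed by i - from.  Field ids of non-empty
   entries are written to ids_out; the number written is returned. */
static size_t sketch_stride(int32_t base, const char *const *tbl,
                            size_t from, size_t to, int32_t *ids_out)
{
  size_t i, n = 0;

  assert(from <= to);
  for (i = from; i < to; i++)
    if (tbl[i - from] != NULL)
      ids_out[n++] = base + (int32_t)i;
  return n;
}

/* Usage: only entries 230, 231 and 232 of a 604-entry table are set. */
static int32_t sketch_demo_stride(void)
{
  static const char *tbl[SKETCH_NFIELDS];  /* zero-initialized: all NULL */
  int32_t ids[3];
  size_t n;

  tbl[230] = "a";
  tbl[231] = "b";
  tbl[232] = "c";
  /* Pass &tbl[230], not tbl; `to` is one past the last field number. */
  n = sketch_stride(4, &tbl[230], 230, 233, ids);
  assert(n == 3);
  return ids[2];  /* base + 232 */
}
```

The loop inspects exactly to - from entries, here three rather than 604.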

The second of these functions:

void post_event(orchids_t *ctx, mod_entry_t *sender, event_t *event,
                     int dissection_level);

posts the event we have just built to the Orchids engine. This function will do one thing among the following two:

  • If the current module (namely, the sender argument to post_event()) has a dissector, then it will call that dissector.  This will be the subject of another post; suffice it to say that the dissector will take the second field-value pair of the event we have just generated (here, the .textfile.line entry) and break it into further fields, which it will add to the current event (the event argument) before it calls post_event() again.  This is done just as above.
  • If the current module does not have a dissector, it will inject the event (the event argument) into the Orchids engine, by calling inject_event():
    void inject_event(orchids_t *ctx, event_t *event);

    In turn, this will launch new Orchids threads, try to advance old ones, executing Orchids rules… the whole Orchids engine will start up for real. This will be explained in another post.
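The two cases above can be sketched as a small dispatch routine. This is a toy model, not the Orchids code: an event is reduced to its number of pairs, the dissection level is left out, injection merely records what it received, and the way a dissector is attached to a module (the sketch_mod_t layout) is a hypothetical wiring chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_mod_s sketch_mod_t;

struct sketch_mod_s {
  /* Dissector callback, or NULL if none is attached to this module. */
  void (*dissect)(sketch_mod_t *dissector, int npairs);
  /* Module that owns the callback (hypothetical wiring for this sketch). */
  sketch_mod_t *dissector;
};

/* Number of pairs in the last injected event, or -1 if none yet. */
static int sketch_injected = -1;

/* Stand-in for inject_event(): just record what reached the engine. */
static void sketch_inject(int npairs)
{
  sketch_injected = npairs;
}

/* Stand-in for post_event(): hand over to the dissector if there is one,
   otherwise inject the event into the engine. */
static void sketch_post(sketch_mod_t *sender, int npairs)
{
  assert(sender != NULL);
  if (sender->dissect != NULL)
    sender->dissect(sender->dissector, npairs);
  else
    sketch_inject(npairs);
}

/* A dissector that pretends to split one field into two extra fields,
   then posts the enlarged event again, this time as the sender. */
static void sketch_line_dissector(sketch_mod_t *self, int npairs)
{
  sketch_post(self, npairs + 2);
}

/* Usage: a three-pair event goes through one dissector, then the engine. */
static int sketch_demo_post(void)
{
  sketch_mod_t parser   = { NULL, NULL };                   /* no dissector */
  sketch_mod_t textfile = { sketch_line_dissector, &parser };

  sketch_post(&textfile, 3);
  return sketch_injected;
}
```

Because the dissector posts the event again under its own name, dissectors can be chained: the event is injected only once it reaches a module with no dissector attached.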

Field names and field ids

We haven’t yet explained how Orchids connects field names with field ids.  Still taking the example of the mod_textfile module, we define the following table:

static field_t tf_fields[] = {
  { "textfile.line_num", &t_uint, MONO_MONO,  "line number"                },
  { "textfile.file",     &t_str, MONO_UNKNOWN, "source file name"            },
  { "textfile.line",     &t_str, MONO_UNKNOWN,  "current line" }
};

This declares all the mod_textfile fields that we would like Orchids to know about.  The numbering of fields is implicit: for example, .textfile.line_num will necessarily be field number 0 in this module.  (Hence its field id will be base+0, where base is the first field id attributed to the mod_textfile module by the Orchids module configuration mechanism.)  This is why we defined F_LINE_NUM as 0, F_FILE as 1, and F_LINE as 2.  The field numbers and the field table should always be modified together!  If you add a field, you must add it to the field table (here, tf_fields[]), and add its field number (to mod_textfile.h) at the same time.

Each field comes with additional typing information (e.g., &t_uint declares the field as an unsigned integer), monotonicity information (MONO_MONO declares the field as increasing through time, MONO_ANTI declares it as decreasing through time, MONO_CONST declares it as constant, and MONO_UNKNOWN in all other cases), and a short description string.

We use the field table to inform Orchids of the connection by calling register_fields() in the module’s pre-configuration function textfile_preconfig(), e.g.:

 register_fields(ctx, mod, tf_fields, TF_FIELDS);
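As an illustration of the implicit numbering, the sketch below resolves a field name to a field id from a table laid out like tf_fields[]. It is a simplified model, not what register_fields() actually does: sketch_field_t keeps only the name and description, the lookup is a linear search, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down field descriptor: name and description only (the real field_t
   also carries type and monotonicity information). */
typedef struct {
  const char *name;
  const char *desc;
} sketch_field_t;

static const sketch_field_t sketch_tf_fields[] = {
  { "textfile.line_num", "line number"      },  /* field number 0 */
  { "textfile.file",     "source file name" },  /* field number 1 */
  { "textfile.line",     "current line"     }   /* field number 2 */
};

/* Deriving the count from the table keeps the two from drifting apart. */
#define SKETCH_TF_FIELDS \
  (sizeof sketch_tf_fields / sizeof sketch_tf_fields[0])

/* Field id = base + position in the table; -1 if the name is unknown. */
static int32_t sketch_field_id(int32_t base, const sketch_field_t *fields,
                               size_t nfields, const char *name)
{
  size_t i;

  assert(name != NULL);
  for (i = 0; i < nfields; i++)
    if (strcmp(fields[i].name, name) == 0)
      return base + (int32_t)i;
  return -1;
}
```

With a base of 4, sketch_field_id(4, sketch_tf_fields, SKETCH_TF_FIELDS, "textfile.line") yields 6, which is base+F_LINE.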