Dissection modules

The purpose of dissection modules is to parse data into Orchids event fields.

Henceforth, dissecting will mean the same thing as parsing.

Let us take an example.  Imagine you want to give the contents of file /var/log/messages as (polled) input to Orchids. This is a file in syslog format, and you would like Orchids to be able to parse this format. We write:

INPUT		textfile	"/var/log/messages"
DISSECT syslog	textfile	"/var/log/messages"

in the orchids-inputs.conf configuration file. The first line tells Orchids to use the textfile input module to read data from the file /var/log/messages. Suppose it reads the following line from /var/log/messages (line 1065, say):

Apr 25 23:22:40 laramie sendmail[102]: NOQUEUE: SYSERR(root): /etc/sendmail.cf: line 0: cannot open: No such file or directory

the textfile module alone would produce the event:

Field               Contents
.textfile.line_num  1065
.textfile.file      "/var/log/messages"
.textfile.line      "Apr 25 23:22:40 laramie sendmail[102]: NOQUEUE: SYSERR(root): /etc/sendmail.cf: line 0: cannot open: No such file or directory"

The second line of our configuration file, DISSECT syslog textfile "/var/log/messages", tells Orchids that every input line coming from that source (/var/log/messages) and produced by that module (textfile) should be handed to the syslog dissection module, which parses the .textfile.line string.

With the aforementioned DISSECT directive, the syslog module adds further fields, resulting in the following event, which is then fed to the Orchids engine:

Field               Contents
.textfile.line_num  1065
.textfile.file      "/var/log/messages"
.textfile.line      "Apr 25 23:22:40 laramie sendmail[102]: NOQUEUE: SYSERR(root): /etc/sendmail.cf: line 0: cannot open: No such file or directory"
.syslog.time        April 25, 23:22:40            (as a value of type ctime, not str)
.syslog.host        "laramie"
.syslog.pid         102
.syslog.prog        "sendmail"
.syslog.msg         "NOQUEUE: SYSERR(root): /etc/sendmail.cf: line 0: cannot open: No such file or directory"
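
To make the field extraction concrete, here is a minimal Python sketch of how a syslog-style line could be split into the fields listed above. It is only an illustration: the regular expression, the function name dissect_syslog, and the representation of an event as an ordered list of (field, value) pairs are assumptions, not the actual Orchids implementation. The timestamp is kept as text here, whereas Orchids stores it as a ctime value.

```python
import re

# Hypothetical pattern for the "Mon DD HH:MM:SS host prog[pid]: msg" layout.
# This is an assumption for illustration, not taken from the Orchids sources.
SYSLOG_LINE = re.compile(
    r"^(?P<time>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<prog>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<msg>.*)$"
)

def dissect_syslog(event):
    """Return a new event: the input fields followed by the .syslog.* fields."""
    line = event[-1][1]  # the last field holds the text to parse
    m = SYSLOG_LINE.match(line)
    if m is None:
        return event  # unparsable line: leave the event unchanged
    pid = m.group("pid")
    return event + [
        (".syslog.time", m.group("time")),  # kept as text in this sketch
        (".syslog.host", m.group("host")),
        (".syslog.pid", int(pid) if pid is not None else None),
        (".syslog.prog", m.group("prog")),
        (".syslog.msg", m.group("msg")),
    ]

line = ("Apr 25 23:22:40 laramie sendmail[102]: NOQUEUE: SYSERR(root): "
        "/etc/sendmail.cf: line 0: cannot open: No such file or directory")
event = [
    (".textfile.line_num", 1065),
    (".textfile.file", "/var/log/messages"),
    (".textfile.line", line),
]
for name, value in dissect_syslog(event):
    print(name, repr(value))
```

Note that the sketch appends to the input event rather than replacing it, so the .textfile.* fields stay available after dissection.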

How it works

At configuration time, the DISSECT directive uses its third argument (here, the string "/var/log/messages") as a tag. There may be several DISSECT directives associated with the same pair of modules (here, textfile and syslog), provided they have different tags.  This is so that the same dissection module can dissect several different sources of events.

Remember from the input modules page that the last and next-to-last fields in Orchids events play a special role:

  • The next-to-last field is used as an index.
  • The last field is the contents to be parsed.

In our example above, the next-to-last field produced by the textfile module is the .textfile.file field.  When its value (here, "/var/log/messages") matches the given tag, the corresponding dissection module is called, and will parse the last field; here, the .textfile.line field.
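
The matching rule just described can be sketched in a few lines of Python. This is a simplified model, not Orchids code: the names register, dispatch, and toy_syslog are hypothetical, events are modeled as ordered lists of (field, value) pairs, and the sketch assumes at most one dissector per (source module, tag) pair.

```python
# Registry filled at configuration time: (source module, tag) -> dissector.
# All names here are hypothetical stand-ins, not Orchids internals.
dissectors = {}

def register(dissector, source_module, tag):
    """Model of the directive: DISSECT <dissector> <source_module> <tag>."""
    dissectors[(source_module, tag)] = dissector

def module_of(field_name):
    """Extract the producing module: '.textfile.file' -> 'textfile'."""
    return field_name.split(".")[1]

def dispatch(event):
    """Use the next-to-last field as the tag and parse the last field."""
    tag_field, tag_value = event[-2]
    _, payload = event[-1]
    dissector = dissectors.get((module_of(tag_field), tag_value))
    if dissector is None:
        return event  # no DISSECT directive matches this event
    return event + dissector(payload)

def toy_syslog(payload):
    """Very rough stand-in for the syslog module: program name and message only."""
    header, _, msg = payload.partition(": ")
    prog = header.split()[-1].split("[")[0]
    return [(".syslog.prog", prog), (".syslog.msg", msg)]

# Equivalent of: DISSECT syslog textfile "/var/log/messages"
register(toy_syslog, "textfile", "/var/log/messages")

line = ("Apr 25 23:22:40 laramie sendmail[102]: NOQUEUE: SYSERR(root): "
        "/etc/sendmail.cf: line 0: cannot open: No such file or directory")
matched = dispatch([
    (".textfile.line_num", 1065),
    (".textfile.file", "/var/log/messages"),
    (".textfile.line", line),
])
unmatched = dispatch([
    (".textfile.line_num", 1),
    (".textfile.file", "/var/log/other.log"),  # hypothetical file with no directive
    (".textfile.line", line),
])
print(len(matched), len(unmatched))
```

Only the event whose .textfile.file value equals the registered tag gains the extra fields; the other one passes through unchanged.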

Here are a few noteworthy features of dissection modules:

  • Dissection modules concatenate their own list of fields to the input event.  Therefore, the fields of the original input event (such as .textfile.file above) remain available, if needed.
  • Dissection modules can dissect Orchids events output by any module, not just input modules, provided that it makes sense.
    For example, Orchids events produced by the syslog module can themselves be (re)dissected, say by the generic module. Events produced by the syslog module, such as the one in the example above, are deliberately laid out so that they have both a next-to-last field serving as a dissection tag (.syslog.prog, here of value "sendmail") and a last field to be parsed further (.syslog.msg).
    This allows one to cascade dissection modules.  This is described in more detail in the next section of this page.
  • As we have said earlier, the same dissection module can be used to dissect data, with the same format, coming from different sources.  Therefore, one can for example write the following in the orchids-inputs.conf configuration file:
    # Syslog events
    INPUT		textfile	"/var/log/messages"
    DISSECT syslog	textfile	"/var/log/messages"
    INPUT		textfile	"/var/log/auth.log"
    DISSECT syslog	textfile	"/var/log/auth.log"
    ## (standard syslog udp)
    INPUT			        udp	514
    DISSECT		bintotext	udp	514
    DISSECT		syslog	bintotext	514
    

    This example also displays cascading (last three lines), for syslog data transmitted over a udp connection.

Advanced cascading

We have already seen cascading.  For example:

INPUT			        udp	514
DISSECT		bintotext	udp	514
DISSECT		syslog	bintotext	514

tells Orchids to receive data through the udp module with tag 514 (the port number).  Whatever is received this way is fed to the bintotext module, which cuts up raw packets into text lines, and then to the syslog module for parsing.

We can continue this way: the last field of an event parsed by the syslog module is .syslog.msg, which can be parsed further, for example by the generic module or the json module.  However, it would be a mistake to write the following:

INPUT			        udp	514
DISSECT		bintotext	udp	514
DISSECT		syslog	bintotext	514
DISSECT         generic syslog 514

The reason is that 514 will no longer be the correct dissection tag. Recall that syslog will add its own fields to the event. Hence the new tag, provided by syslog, is the contents of its next-to-last field, .syslog.prog.

This is done on purpose.  It allows you to connect a new dissector based on the value of the program reporting an event through syslog.  For example, we may write:

INPUT			        udp	514
DISSECT		bintotext	udp	514
DISSECT		syslog	bintotext	514
DISSECT         my_sendmail_dissection_module syslog "sendmail"

for some hypothetical my_sendmail_dissection_module module, meant to parse sendmail messages further.
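
The difference between the two configurations can be checked with a small Python sketch of the tag lookup. As before, this is a hedged model rather than Orchids code: find_dissector and the two lookup tables are hypothetical, and only the tail of the event (the fields added by syslog) is shown.

```python
def find_dissector(table, event):
    """Return the dissector registered for this event, or None."""
    tag_field, tag_value = event[-2]      # next-to-last field is the tag
    producer = tag_field.split(".")[1]    # '.syslog.prog' -> 'syslog'
    return table.get((producer, tag_value))

# Tail of an event as it leaves the syslog module (earlier fields omitted).
event = [
    (".syslog.time", "Apr 25 23:22:40"),
    (".syslog.host", "laramie"),
    (".syslog.pid", 102),
    (".syslog.prog", "sendmail"),
    (".syslog.msg", "NOQUEUE: SYSERR(root): /etc/sendmail.cf: line 0: "
                    "cannot open: No such file or directory"),
]

# Mistaken directive: DISSECT generic syslog 514
wrong = {("syslog", 514): "generic"}
# Working directive: DISSECT my_sendmail_dissection_module syslog "sendmail"
right = {("syslog", "sendmail"): "my_sendmail_dissection_module"}

print(find_dissector(wrong, event))
print(find_dissector(right, event))
```

The first lookup finds nothing because the tag is now the value of .syslog.prog, not the port number; the second finds the sendmail-specific module.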

The generic module does it differently, though: you just need to write

INPUT			        udp	514
DISSECT		bintotext	udp	514
DISSECT		syslog	bintotext	514

and the generic module will plug itself onto all events produced by syslog, automatically.  What it does with them is described in the generic module’s configuration file.

(That is not set in stone. In principle, all plumbing should be visible in the orchids-inputs.conf file, and the way the generic module does it contradicts this principle.  Hence you may expect the behavior of the generic module to change.)