Features

The features are the main focus of this format, and also the most complex part of it.

Base Features

Base features are those that are not obtained by combining other features. They are represented in this format by JSON strings.

Where possible, we use the names of the IPFIX information elements defined by IANA. In addition to these names, the format defines a set of operations that derive new features from existing ones. For features that cannot be obtained by combining IANA features with this limited set of operations, there are two naming options:

  • if the feature is expected to be used many times (e.g. some KDD ‘99 features cannot be represented using IANA features and operations, but are used in many papers), use a single _ as a prefix to a descriptive feature name
  • if the feature is very specific to this paper, use __ (double _) as a prefix to a descriptive feature name

In both of these cases, try to give descriptive feature names, similar to the ones used by IANA.

This means that all base features that do not start with _ have to be IPFIX information elements defined by IANA.

There is one more case: features that recur often and are a combination of IANA features. For these, use a descriptive feature name starting with _ as an alias. The complete list of aliases is in ../dict.json; please add new aliases there.
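The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the format itself: `IANA_NAMES` is a hypothetical, hand-picked subset of the IANA registry, and the category labels returned by `classify_base_feature` are made up for illustration.

```python
# Hypothetical subset of the IANA IPFIX information element names (illustration only).
IANA_NAMES = {"protocolIdentifier", "octetTotalCount", "packetTotalCount"}


def classify_base_feature(name: str) -> str:
    """Classify a base-feature string according to the naming rules above."""
    if name.startswith("__"):
        return "paper-specific"       # double underscore: specific to one paper
    if name.startswith("_"):
        return "common-or-alias"      # single underscore: widely used feature or alias
    if name in IANA_NAMES:
        return "iana"                 # no prefix: must be an IANA information element
    raise ValueError(f"{name!r} is neither an IANA name nor prefixed with '_'")


print(classify_base_feature("octetTotalCount"))
print(classify_base_feature("_activeForSeconds"))
print(classify_base_feature("__maximumConsecutiveSeconds"))
```

The double-underscore test has to come first, because every name that starts with __ also starts with _.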

Operations

Below is a complete list of possible operations:


# <value> always outputs a single number (a <value>)
<value> -> {"mean": [<values>]}
<value> -> {"stdev": [<values>]}
<value> -> {"variance": [<values>]}
<value> -> {"median": [<values>]}
<value> -> {"quantile": [<values>, <value>]} # second argument is a number from 0 to 1, where 0 is the minimum and 1 the maximum
<value> -> {"minimum": [<values>]} | {"minimum": [<value>+]}
<value> -> {"maximum": [<values>]} | {"maximum": [<value>+]}
<value> -> {"argmin": [<values>]} | {"argmin": [<value>+]}
<value> -> {"argmax": [<values>]} | {"argmax": [<value>+]}
<value> -> {"floor": [<value>]}
<value> -> {"ceil": [<value>]}
<value> -> {"mode": [<values>]} # returns the most frequent element in <values>
<value> -> {"count": [<selection>]} | {"count": [<values>]}  # returns number of selected objects
<value> -> {"distinct": [<values>]}  # returns the number of distinct values in <values>
<value> -> {"apply": [<feature>, <selection>]}  # returns a single feature value for the selection of objects
<value> -> {"add": [<value>+]} | {"add": [<values>]}
<value> -> {"subtract": [<value>, <value>]}
<value> -> {"multiply": [<value>+]} | {"multiply": [<values>]}
<value> -> {"divide": [<value>, <value>]}
<value> -> {"log": [<value>]}
<value> -> {"exp": [<value>]}
<value> -> {"entropy": [<value>]}
<value> -> {"get": [<value>, <values>]} | {"get": [<value>, <value>]}  # gets the <value>-th element of the second argument (if the second argument is also <value>, the elements are bits)
<value> -> {"ifelse": [<logic>, <value>, <value>]}  # if the condition is true, return the first <value>, else the second
<value> -> {"get_previous": [<aggregation-feature>]}  # gets feature at time = t-1
<value> -> {"left_shift": [<value>, <value>]}  # shift the bits in the first value left by the second value
<value> -> {"right_shift": [<value>, <value>]}  # shift the bits in the first value right by the second value
<value> -> <free-integer> | <base-feature> | <free-float>
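To illustrate how nested operations reduce to a single number, here is a minimal evaluator sketch covering only a handful of the operations listed above. It assumes that a flow is given as a Python dict (`record`) from base-feature names to data, and that a per-packet feature (a <values>) is stored there as a list; both are assumptions of this example, not requirements of the format.

```python
import statistics


def evaluate(feature, record):
    """Evaluate a <value> against `record`, a dict of base-feature data."""
    if isinstance(feature, (int, float)):
        return feature              # <free-integer> | <free-float>
    if isinstance(feature, str):
        return record[feature]      # <base-feature>
    (op, args), = feature.items()   # an operation is an object with exactly one key
    vals = [evaluate(arg, record) for arg in args]
    # Operations accepting [<values>] or [<value>+]: unwrap a single list argument.
    seq = vals[0] if len(vals) == 1 and isinstance(vals[0], list) else vals
    if op == "mean":
        return statistics.mean(seq)
    if op == "minimum":
        return min(seq)
    if op == "maximum":
        return max(seq)
    if op == "add":
        return sum(seq)
    if op == "subtract":
        return vals[0] - vals[1]
    if op == "divide":
        return vals[0] / vals[1]
    raise NotImplementedError(op)   # remaining operations are left out of this sketch


# Hypothetical flow record, for illustration only.
flow = {
    "octetTotalCount": 3000,
    "_activeForSeconds": 2,
    "_interPacketTimeMicroseconds": [100, 300, 200],
}
print(evaluate({"divide": ["octetTotalCount", "_activeForSeconds"]}, flow))
print(evaluate({"maximum": ["_interPacketTimeMicroseconds"]}, flow))
```

Because every operation returns a <value>, the same function handles arbitrarily deep nesting by recursion.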

Value & Values

The value directive represents a single value, while the values directive represents a list of values. This is necessary to distinguish the arguments to the operations.

Selection & Logic

The selection directive filters out packets, or any other objects, that are not relevant for a particular feature. Intuitively, a selection applied to a flow selects packets (the result is the packets that fulfill the conditions in the selection), and a selection applied to a flow_aggregation selects flows.

Its syntax is the following:


# <selection> outputs a list of objects (packets, flows or aggregations, depending on what kind of feature is used)
<selection> -> {"select": [<logic>]}
<selection> -> {"select_slice": [<value>, <value>]} | {"select_slice": [<value>, <value>, <selection>]}  # selects a slice from the first value to the second value, with Python-like indexing (if a <selection> is not provided, default to selecting everything)
<selection> -> "forward" | "backward"  # special cases for selection; select objects in the forward (or backward) direction
<selection> -> {"select_flows": [<logic>]}  # same as "select", but outputs flows; only valid when used in flow aggregations
<selection> -> {"select_slice_flows": [<value>, <value>]} | {"select_slice_flows": [<value>, <value>, <selection>]}  # same as "select_slice", but outputs flows; only valid when used in flow aggregations
<selection> -> "forward_flows" | "backward_flows"  # same as "forward"/"backward", but outputs flows; only valid when used in flow aggregations
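As a rough sketch of the slicing and direction selections, the snippet below treats the objects as a Python list of dicts. It assumes that "Python-like indexing" means a half-open slice (start included, stop excluded), that the optional inner <selection> is applied before slicing, and that each packet carries a hypothetical `direction` field; none of these details are fixed by the grammar above.

```python
def apply_selection(selection, objects):
    """Return the sub-list of `objects` chosen by a (partial) <selection>."""
    if selection in ("forward", "backward"):
        # Assumption: each object has a "direction" field with one of these values.
        return [o for o in objects if o["direction"] == selection]
    (op, args), = selection.items()
    if op == "select_slice":
        start, stop = args[0], args[1]
        # Without a third argument, slice over everything.
        inner = apply_selection(args[2], objects) if len(args) == 3 else objects
        return inner[start:stop]
    raise NotImplementedError(op)   # "select" and the *_flows variants are omitted here


# Six hypothetical packets, alternating direction.
packets = [
    {"n": i, "direction": "forward" if i % 2 == 0 else "backward"}
    for i in range(6)
]
first_two = apply_selection({"select_slice": [0, 2]}, packets)
first_two_forward = apply_selection({"select_slice": [0, 2, "forward"]}, packets)
print([p["n"] for p in first_two])
print([p["n"] for p in first_two_forward])
```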

The logic directive contains the test that decides which objects are kept and which are filtered out. It is defined as follows:


# <logic> is used for selection, should be evaluated for each object
<logic> -> {"and": [<logic>+]} 
<logic> -> {"or": [<logic>+]}
<logic> -> {"geq": [<feature>, <value>]}
<logic> -> {"leq": [<feature>, <value>]}
<logic> -> {"less": [<feature>, <value>]}
<logic> -> {"greater": [<feature>, <value>]}
<logic> -> {"equal": [<feature>, <value>]}
<logic> -> true | false
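The logic grammar maps directly onto a small recursive predicate. The sketch below is one possible reading, assuming that each object is a dict, that the <feature> in a comparison is a base-feature string, and that the <value> is a literal number; `holds` and `select` are hypothetical helper names.

```python
import operator

# Comparison operators named in the grammar, mapped to Python equivalents.
COMPARISONS = {
    "geq": operator.ge,
    "leq": operator.le,
    "less": operator.lt,
    "greater": operator.gt,
    "equal": operator.eq,
}


def holds(logic, obj):
    """Evaluate a <logic> expression for a single object."""
    if isinstance(logic, bool):
        return logic                                  # true | false
    (op, args), = logic.items()
    if op == "and":
        return all(holds(a, obj) for a in args)
    if op == "or":
        return any(holds(a, obj) for a in args)
    feature, value = args                             # [<feature>, <value>]
    return COMPARISONS[op](obj[feature], value)


def select(logic, objects):
    """Keep the objects for which the <logic> test is true."""
    return [o for o in objects if holds(logic, o)]


packets = [{"ipTotalLength": n} for n in (64, 128, 200, 256, 1400)]
in_range = {"and": [{"geq": ["ipTotalLength", 128]}, {"less": ["ipTotalLength", 256]}]}
print(len(select(in_range, packets)))
```

Wrapping the result in "count" then amounts to taking the length of the selected list.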

Feature Specification

The following is the specification for the features and feature directives:


<features> -> [<feature>+] | null
<feature> -> <value> | <base-feature>

The packet-feature, flow-feature and aggregation-feature are packet-, flow- and aggregation-level features (respectively) that are not compositions of other features or operations. That is, they should be strings from the IANA IPFIX information elements list, or strings that start with _ or __.
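One practical use of this structure is listing every base feature that a features directive refers to, for example to check the names against the IANA list. The following sketch walks the parsed JSON recursively; `base_features` and `SELECTION_KEYWORDS` are hypothetical names, and the function assumes that any string other than the direction keywords is a base feature.

```python
# Strings that the grammar reserves for selections rather than base features.
SELECTION_KEYWORDS = {"forward", "backward", "forward_flows", "backward_flows"}


def base_features(node):
    """Return the set of base-feature strings referenced anywhere in `node`."""
    if isinstance(node, str):
        return set() if node in SELECTION_KEYWORDS else {node}
    if isinstance(node, dict):
        children = node.values()      # operation arguments
    elif isinstance(node, list):
        children = node
    else:
        return set()                  # numbers, booleans, null
    found = set()
    for child in children:
        found |= base_features(child)
    return found


features = [
    "protocolIdentifier",
    {"divide": ["octetTotalCount", "_activeForSeconds"]},
    {"count": [{"select": [{"geq": ["_interPacketTimeMicroseconds", 1000000]}]}]},
    {"count": ["forward"]},
]
print(sorted(base_features(features)))
```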

Example Features

The following are examples of the features directive.

"features": [
  "protocolIdentifier",
  "sourceTransportPort",
  "destinationTransportPort",
  "octetTotalCount",
  "packetTotalCount",
  "_activeForSeconds",
  {"divide": ["octetTotalCount", "_activeForSeconds"]},
  {"divide": ["packetTotalCount", "_activeForSeconds"]},
  "__maximumConsecutiveSeconds",
  "__minimumConsecutiveSeconds",
  {"maximum": ["_interPacketTimeMicroseconds"]},
  {"minimum": ["_interPacketTimeMicroseconds"]},
  {"count": [{"select": [{"geq": ["_interPacketTimeMicroseconds", 1000000]}]}]}
]

"features": [
  {"entropy": ["sourceIPv4Address"]},
  {"entropy": ["destinationIPv4Address"]},
  {"entropy": ["destinationTransportPort"]},
  {"entropy": ["_flowDurationSeconds"]},
  {"multiply": [
    {"argmax": [
      {"count": [{"select": [{"less": ["ipTotalLength", 128]}]}]},
      {"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 128]}]}, {"select": [{"less": ["ipTotalLength", 256]}]}]}]},
      {"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 256]}]}, {"select": [{"less": ["ipTotalLength", 512]}]}]}]},
      {"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 512]}]}, {"select": [{"less": ["ipTotalLength", 1024]}]}]}]},
      {"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 1024]}]}, {"select": [{"less": ["ipTotalLength", 1500]}]}]}]}
    ]},
    {"add": [
      {"entropy": [{"count": [{"select": [{"less": ["ipTotalLength", 128]}]}]}]},
      {"entropy": [{"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 128]}]}, {"select": [{"less": ["ipTotalLength", 256]}]}]}]}]},
      {"entropy": [{"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 256]}]}, {"select": [{"less": ["ipTotalLength", 512]}]}]}]}]},
      {"entropy": [{"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 512]}]}, {"select": [{"less": ["ipTotalLength", 1024]}]}]}]}]},
      {"entropy": [{"count": [{"and": [{"select": [{"geq": ["ipTotalLength", 1024]}]}, {"select": [{"less": ["ipTotalLength", 1500]}]}]}]}]}
    ]}
  ]},
  {"get": [14, "tcpControlBits"]}
]

"features": ["_KDD5", "_KDD23", "_KDD3", "_KDD6", "_KDD35", "_KDD1"]