Aggregations, Flows & Packets

  • The packets directive represents feature vectors extracted from individual packets.
  • The flows directive represents aggregations of packets that share a specific key.
  • The flow-aggregations directive represents aggregations of flows that share a specific key.

In short, packets are to flows as flows are to aggregations.

Each of these directives holds a list of packet/flow/flow-aggregation entries, respectively. Papers frequently try different feature vectors for different goals, or to compare them against each other. Keeping the entries in a list preserves the information about every feature vector without requiring multiple specifications for the same paper.

Each packet/flow/flow-aggregation contains a list of features (directive features), a list of free-text goals (directive goals) describing the problem the authors were addressing, and a free-text tool (directive tool) naming the tool that was used to extract the features (e.g. tshark, yaf). The flow and flow-aggregation directives additionally contain a specification of the key used (directive key) and a time window in seconds (directive window).
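As a rough illustration of the layout described above, the following Python sketch builds a single flow entry and serializes it to JSON. It assumes that features are written as plain strings. The goal text, the window value, the traffic type and the key feature name are illustrative assumptions, not values prescribed by this specification. Only yaf and octetTotalCount are taken from the text.

```python
import json

# Hypothetical flow entry: every concrete value below is illustrative only.
flow = {
    "features": ["octetTotalCount"],        # assumed: features are plain strings
    "goals": ["anomaly detection"],         # free text, made up for this sketch
    "key": {
        "bidirectional": False,
        "key_features": ["sourceIPv4Address"],  # assumed IPFIX-style name
    },
    "tool": "yaf",
    "window": 60,                           # seconds
    "traffic_type": "tcp",
}

# The flows directive is a list, so several feature vectors can coexist.
spec_fragment = {"flows": [flow]}
print(json.dumps(spec_fragment, indent=2))
```

Python's False and None are written out as JSON false and null, which matches the values used in the definitions.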


The key directive itself contains further fields:

<key> -> {
  "bidirectional": <bidirectional>, 
  "key_features": <features>
} | null

The key_features directive lists the features used as the key of a flow/flow-aggregation. If you do not know which features are used as the key, use null or leave it empty. If there is no key (that is, everything is aggregated together), use an empty list ([]).
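The difference between an unknown key and no key is easy to lose when reading a specification programmatically. The following Python sketch shows one way to tell the cases apart. The function name and the returned strings are hypothetical, and it assumes that key_features, when known, is a list of strings.

```python
def describe_key_features(key_features):
    """Classify a key_features value (hypothetical helper, not part of the spec)."""
    if key_features is None:
        # null: the key used by the authors is not known
        return "unknown key"
    if len(key_features) == 0:
        # []: there is no key, everything is aggregated together
        return "no key"
    # Assumed: a known key is a list of feature names given as strings.
    return "keyed by " + ", ".join(key_features)
```

Checking for None before checking the length matters here, because both null and [] are falsy in Python and a plain truthiness test would merge the two cases.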

The bidirectional directive indicates the directionality of a flow:

  • unidirectional (false): the flow only has packets with the exact same key as in key_features.
  • bidirectional (true): the flow has packets with the same key as in key_features, and packets in the opposite direction.
  • "separate_directions": the flow has packets as if it were bidirectional, but the features in the features directive are evaluated twice, once for each direction. For example, if octetTotalCount is in the features list and the key has "separate_directions", you get two features: one with the octetTotalCount of the packets in one direction, and another with that of the packets in the opposite direction.
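The doubling of features under "separate_directions" can be sketched in Python as follows. The function name and the _forward/_backward suffixes are assumptions made for this illustration. The specification does not say how the two per-direction features are named.

```python
def expand_features(features, bidirectional):
    """Return the effective feature names for a given bidirectional value.

    Hypothetical helper: the suffixes are invented for illustration.
    """
    if bidirectional == "separate_directions":
        # Each feature is evaluated twice, once for each direction.
        return [
            f"{name}_{direction}"
            for name in features
            for direction in ("forward", "backward")
        ]
    # true, false and null leave the feature list as it is.
    return list(features)
```

With this sketch, a features list holding only octetTotalCount yields two entries under "separate_directions" and one entry otherwise.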

Definition of bidirectional:

<bidirectional> -> true | false | null | "separate_directions"  # "separate_directions" in the case where the key is bidirectional and each feature appears twice, one for each direction

The following is an example of the very common unidirectional 5-tuple key:

"key": {
  "bidirectional": false,
  "key_features": [

Traffic Type

Use the traffic_type directive when only traffic of a certain type is considered. Its definition follows:

<traffic_type> -> [<traffic_types>+] | <traffic_types>
<traffic_types> -> "ip" | "tcp" | "udp" | "icmp" | "dns" | "http" | null
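A value can be checked against the grammar above with a few lines of Python. This is a minimal sketch: the names ALLOWED_TRAFFIC_TYPES and is_valid_traffic_type are hypothetical, and the check follows the grammar literally, so null is also accepted as a list element.

```python
# Allowed terminals of <traffic_types>; None stands for JSON null.
ALLOWED_TRAFFIC_TYPES = ("ip", "tcp", "udp", "icmp", "dns", "http", None)


def is_valid_traffic_type(value):
    """Check a parsed JSON value against the traffic_type grammar (sketch)."""
    if isinstance(value, list):
        # [<traffic_types>+] requires at least one element.
        return len(value) > 0 and all(
            item in ALLOWED_TRAFFIC_TYPES for item in value
        )
    # Otherwise the value must be a single <traffic_types> terminal.
    return value in ALLOWED_TRAFFIC_TYPES
```

A tuple is used instead of a set so that unhashable inputs, such as a nested list, are rejected rather than raising an error.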


Definition of packet:

<packets> -> [<packet>+] | null
<packet> -> {
  "features": <features>, 
  "goals": <goals>, 
  "tool": <tool>, 
  "traffic_type": <traffic_type>
}

Definition of flow:

<flows> -> [<flow>+] | null
<flow> -> {
  "features": <features>, 
  "goals": <goals>, 
  "key": <key>, 
  "tool": <tool>, 
  "window": <window>,
  "traffic_type": <traffic_type>
}

Definition of flow-aggregation:

# flow-aggregations -- features are extracted from sets of flows
<flow-aggregations> -> [<flow-aggregation>+] | null
<flow-aggregation> -> {
  "flow": <flow>,
  "features": <features>, 
  "goals": <goals>, 
  "key": <key>, 
  "tool": <tool>, 
  "window": <window>,
  "traffic_type": <traffic_type>