Packet classification is a function required by networking devices in a wide range of contexts, such as Quality of Service (QoS), load balancing, security, monitoring, and network traffic analysis. However, the average link speed is constantly increasing, while the performance of classification schemes tends to increase at a slower pace than that of the physical links. Packet classification therefore remains an active research topic.
Packet classification aims at matching incoming packets with one or multiple rules contained in a rule set. Packet classification techniques known in the art mainly target the classical 5-field context. Nonetheless, due to the significance of data centers and resource management, a global view of the system is required, from network equipment to servers. For that purpose, a solution called Software Defined Networking (SDN) has been proposed.
SDN is the next evolution in the networking field: it uses a small processing granularity and makes it possible, for instance, to optimize the link utilization rate, obtain a unified view of the network fabric, and improve failure handling. Such improvements are drastically changing the shape of networking. Packet classification is deeply affected as a result, and has to handle much more complex rules over a large number of fields. Moreover, the 5-tuple context no longer matches the trends and evolution of the networking field. Software Defined Networking and, more specifically, the OpenFlow protocol are gaining importance in the literature, mainly due to their high degree of flexibility. In contrast to the classical 5-tuple context, SDN rule sets, with large flow entries, are much more complex, due to the higher number of fields that can be used to classify a packet and the ability to use masks on many fields. For instance, in version 1.0.0 of OpenFlow, up to 12 fields of a packet header can be used to classify a packet (see Table I).
TABLE I
OpenFlow fields used in v1.0.0

Field                                   Bits  Mask
Ingress Port                            6     No
Ethernet destination MAC address        48    Yes
Ethernet source MAC address             48    Yes
Ethernet type                           16    No
VLAN-ID from 802.1Q header              12    No
VLAN-PCP from 802.1Q header             3     No
IP source address                       32    Yes
IP destination address                  32    Yes
IP ToS bits                             6     No
IP protocol                             8     No
Transport source port/ICMP Type         16    No
Transport destination port/ICMP code    16    No
Furthermore, when using Internet Protocol version 6 (IPv6) or Media Access Control (MAC) addresses as done in the latest evolution of SDN protocols, the rule size increases substantially. SDN evolution considers, on one hand, bigger fields and, on the other hand, a larger number of fields. Therefore, rules tend to be much more complex than was considered in the classical 5-tuple context. Such a context evolution has an impact on packet classification performance.
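To make the growth in rule size concrete, the following sketch totals the field widths listed in Table I and compares them with the classical IPv4 5-tuple. The dictionary keys are illustrative names, not identifiers from the OpenFlow specification, and `with_ipv6` is a hypothetical variant that simply widens the two IP address fields to 128 bits.

```python
# Field widths in bits, copied from Table I (OpenFlow v1.0.0).
# Key names are illustrative only.
OPENFLOW_V1_FIELDS = {
    "ingress_port": 6,
    "eth_dst": 48,
    "eth_src": 48,
    "eth_type": 16,
    "vlan_id": 12,
    "vlan_pcp": 3,
    "ip_src": 32,
    "ip_dst": 32,
    "ip_tos": 6,
    "ip_proto": 8,
    "tp_src": 16,
    "tp_dst": 16,
}

# Classical IPv4 5-tuple: two addresses, two transport ports, protocol.
FIVE_TUPLE_FIELDS = {
    "ip_src": 32, "ip_dst": 32, "tp_src": 16, "tp_dst": 16, "ip_proto": 8,
}


def key_bits(fields):
    """Total number of header bits a rule has to cover."""
    return sum(fields.values())


def with_ipv6(fields):
    """Hypothetical variant: replace 32-bit IPv4 addresses by 128-bit ones."""
    widened = dict(fields)
    widened["ip_src"] = 128
    widened["ip_dst"] = 128
    return widened


print(key_bits(FIVE_TUPLE_FIELDS))               # 104
print(key_bits(OPENFLOW_V1_FIELDS))              # 243
print(key_bits(with_ipv6(OPENFLOW_V1_FIELDS)))   # 435
```

Under these assumptions, the 12-field key is more than twice as wide as the 5-tuple key, and widening only the two IP addresses adds another 192 bits.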
Although the number of functionalities offered to end users increases, limited progress has been achieved at the algorithm level. From an industrial point of view, Ternary Content Addressable Memory (TCAM) based solutions are widely used, while having many drawbacks, such as lack of flexibility and high power consumption. According to the Open Networking Foundation, the latest versions of protocols such as OpenFlow, used in the case of SDN, require support from powerful TCAM-like tables, but with more capability than available and announced hardware implementations. We are clearly facing a bottleneck by offering the end user a really high degree of flexibility without any optimized hardware available.
Many approaches have been considered in the literature to tackle the problem of packet classification, but many algorithms appear to under-perform or are not tailored for handling complex rules. Packet classification techniques can be categorized into three main types: decomposition based, decision tree based, and pure hardware solutions, known as TCAM.
TCAM is a powerful hardware memory that offers O(1)-time packet classification. To achieve such high performance, a TCAM matches each rule against the incoming packet header in parallel. TCAMs offer high performance but suffer from several drawbacks: parallel matching consumes a great deal of power, and TCAM chips are very costly. Further, supporting range-based rules remains an open issue. Such bottlenecks tend to limit the use of TCAMs in current and future networking contexts.
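Ternary matching and the range problem can be illustrated with a minimal software sketch. Each hypothetical entry is a (value, mask) pair in which mask bits set to 0 mean "don't care"; a real TCAM compares all entries at once, whereas the loop below checks them one at a time. The helper `range_to_prefixes` applies the standard range-to-prefix expansion to show why range-based rules are expensive: a single range may need several entries.

```python
def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None.

    Software stand-in for a TCAM: hardware evaluates every entry in
    parallel and a priority encoder selects the first hit.
    """
    for index, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return index
    return None


def range_to_prefixes(lo, hi, width):
    """Expand the integer range [lo, hi] into (value, mask) prefix entries."""
    entries = []
    while lo <= hi:
        # Largest power-of-two block that is aligned on lo ...
        size = lo & -lo if lo else 1 << width
        # ... and that still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        mask = ((1 << width) - 1) & ~(size - 1)
        entries.append((lo, mask))
        lo += size
    return entries


entries = range_to_prefixes(1, 14, 4)
print(len(entries))              # 6
print(tcam_lookup(entries, 7))   # 2, the entry covering 4..7
```

On a 4-bit field, the range [1, 14] expands to six entries, while the full range [0, 15] needs only one. The expansion multiplies across fields, which is one reason range support is costly in this kind of memory.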
One approach adopted in the literature to classify packets, named Decomposition Based, aims at separating the lookup process into multiple parallel reduced lookups, and then combines the results together. Some algorithms known in the art use this technique. Those algorithms can achieve good performance but suffer from a large memory requirement. Decomposition Based algorithms are not scalable, due mainly to the mentioned memory drawback and, consequently, are inappropriate to handle large classification tables.
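The decomposition idea can be sketched as follows: each field is looked up independently and returns a bitmap of the rules that match on that field, and the per-field bitmaps are then combined with a bitwise AND. This is a minimal illustration under stated assumptions, not a specific published algorithm; the rule set, the function names, and the convention that a lower rule index means a higher priority are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical two-field rule set: one (lo, hi) range per field.
# Assumption: a lower index means a higher priority.
RULES = [
    ((0, 3), (4, 7)),
    ((2, 9), (0, 15)),
    ((10, 15), (0, 3)),
]


def build_field_table(rules, field):
    """Precompute one bitmap of matching rules per elementary interval."""
    # The set of matching rules can only change at a range start or
    # just after a range end.
    starts = sorted({0}
                    | {r[field][0] for r in rules}
                    | {r[field][1] + 1 for r in rules})
    bitmaps = []
    for start in starts:
        bitmap = 0
        for i, rule in enumerate(rules):
            lo, hi = rule[field]
            if lo <= start <= hi:
                bitmap |= 1 << i
        bitmaps.append(bitmap)
    return starts, bitmaps


def lookup_field(table, value):
    """Reduced lookup on a single field (independent of the other fields)."""
    starts, bitmaps = table
    return bitmaps[bisect_right(starts, value) - 1]


def classify(tables, packet, num_rules):
    """Combine the per-field results; return the best rule index or None."""
    candidates = (1 << num_rules) - 1
    for table, value in zip(tables, packet):
        candidates &= lookup_field(table, value)
    if candidates == 0:
        return None
    return (candidates & -candidates).bit_length() - 1  # lowest set bit


TABLES = [build_field_table(RULES, f) for f in range(2)]
print(classify(TABLES, (3, 5), len(RULES)))    # 0
print(classify(TABLES, (12, 9), len(RULES)))   # None
```

In this sketch each field table stores one N-bit bitmap per elementary interval, and N rules can create up to 2N + 1 intervals, so storage grows roughly quadratically with the number of rules. That growth is the kind of memory cost that limits scalability.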
Decision tree based algorithms are another avenue explored in the literature to address the packet classification issue. Many known algorithms use this approach, such as HiCuts, HyperCuts, and EffiCuts, the latter being a state-of-the-art algorithm. Each of those algorithms divides the rule space (i.e., the rule set) into subsets in an iterative fashion, until each subset contains fewer rules than a given threshold. An example of tree building using the HiCuts algorithm is shown in FIG. 1. The first step is to cut the rule space along the dimension that maximizes the differentiation between rules. A first cutting sequence, represented with the vertical lines, is therefore done along the Field 1 direction, which generates four nodes. Three of those nodes contain fewer rules than the threshold value, set here to 2; those nodes correspond to leaves 1 to 3. Node 1 stores three rules, so another cutting sequence has to be completed, represented with a horizontal line. This process creates two more leaves, 4 and 5. The decision tree is then finished.
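The iterative cutting described above can be sketched in a few lines. This is a simplified illustration in the spirit of HiCuts, not the published algorithm and not a reproduction of FIG. 1: the fan-out `NUM_CUTS` is fixed (an assumption; HiCuts chooses the number of cuts per node), the cut dimension is chosen as the one with the most distinct rule projections, and the rule set is hypothetical.

```python
BINTH = 2      # leaf threshold, as in the walk-through above
NUM_CUTS = 4   # assumption: fixed fan-out for every internal node

# Hypothetical rules over two 4-bit fields: one (lo, hi) range per field.
RULES = [
    ((0, 3), (0, 3)),     # 0
    ((0, 3), (4, 7)),     # 1
    ((0, 3), (8, 15)),    # 2
    ((4, 7), (0, 15)),    # 3
    ((8, 11), (0, 15)),   # 4
    ((12, 15), (0, 15)),  # 5
    ((4, 15), (0, 15)),   # 6: wide rule, copied into several children
]


def overlaps(rule, region):
    return all(rlo <= hi and lo <= rhi
               for (rlo, rhi), (lo, hi) in zip(rule, region))


def distinct_projections(rule_ids, region, dim):
    """Number of different rule ranges along dim, clipped to the region."""
    lo, hi = region[dim]
    return len({(max(RULES[r][dim][0], lo), min(RULES[r][dim][1], hi))
                for r in rule_ids})


def build(rule_ids, region):
    if len(rule_ids) <= BINTH:
        return {"leaf": rule_ids}
    dim = max(range(len(region)),
              key=lambda d: distinct_projections(rule_ids, region, d))
    lo, hi = region[dim]
    width = (hi - lo + 1) // NUM_CUTS
    if width == 0 or distinct_projections(rule_ids, region, dim) == 1:
        return {"leaf": rule_ids}  # cutting cannot separate these rules
    children = []
    for i in range(NUM_CUTS):
        child = list(region)
        child_hi = hi if i == NUM_CUTS - 1 else lo + (i + 1) * width - 1
        child[dim] = (lo + i * width, child_hi)
        kept = [r for r in rule_ids if overlaps(RULES[r], child)]
        children.append(build(kept, child))
    return {"dim": dim, "lo": lo, "width": width, "children": children}


def stored_rules(node):
    """Total number of rule references held in the leaves."""
    if "leaf" in node:
        return len(node["leaf"])
    return sum(stored_rules(c) for c in node["children"])


TREE = build(list(range(len(RULES))), [(0, 15), (0, 15)])
print(TREE["dim"], stored_rules(TREE))   # 0 10
```

With this rule set the root is cut along the first field, the first child still holds three rules and is cut again along the second field, and the leaves end up holding 10 rule references for 7 rules because wide rules are copied into every child they overlap.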
The classification process is simply a tree traversal, from root node to leaves. The incoming packet header is compared with the rule space covered by each node, and then the position of the next child node to visit is computed based on information contained in each node. When a leaf is reached, each rule is matched against the packet header, and the matching rules are then selected. The process of packet classification is completed, and a new packet can be processed.
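The traversal can be sketched as follows, using a small hand-built tree. The node layout (a cut dimension, the lower bound of the covered region, and the width of each child) and the rule set are assumptions made for illustration; they stand in for the "information contained in each node" mentioned above.

```python
# Hypothetical rules over two 4-bit fields: one (lo, hi) range per field.
RULES = [
    ((0, 3), (0, 7)),     # rule 0
    ((0, 3), (8, 15)),    # rule 1
    ((4, 15), (0, 15)),   # rule 2
]

# Hand-built tree: the root cuts field 0 in two halves, and its first
# child cuts field 1 in two halves. Leaves list candidate rule indices.
TREE = {
    "dim": 0, "lo": 0, "width": 8,
    "children": [
        {"dim": 1, "lo": 0, "width": 8,
         "children": [{"leaf": [0, 2]}, {"leaf": [1, 2]}]},
        {"leaf": [2]},
    ],
}


def classify(node, packet):
    """Walk from the root to a leaf, then match the leaf's rules linearly."""
    while "leaf" not in node:
        # Equal-size cuts make the child position a simple division.
        index = (packet[node["dim"]] - node["lo"]) // node["width"]
        index = min(index, len(node["children"]) - 1)
        node = node["children"][index]
    return [r for r in node["leaf"]
            if all(lo <= v <= hi for (lo, hi), v in zip(RULES[r], packet))]


print(classify(TREE, (2, 5)))    # [0]
print(classify(TREE, (5, 9)))    # [2]
```

Note that reaching a leaf is not sufficient: rule 2 is stored in the leaf visited for packet (2, 5) but does not match it, which is why every rule in the leaf is still compared with the header.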
The first proposed tree-based packet-classification algorithm, HiCuts (illustrated in FIG. 1), generates a lot of rule replication because it creates a single decision tree, thus mixing together rules with very significant differences in size (which causes substantial overlap).
HyperCuts was proposed as an evolution of HiCuts, with the aim of improving the convergence rate of the classification (thus minimizing tree depth) while limiting the data structure size. To achieve this, the algorithm is based on multidimensional cuts and includes techniques to minimize replication. These techniques produce better performance in terms of number of memory accesses, but scalability is poor.
EffiCuts aims mainly at striking the best compromise between the average number of memory accesses and the data set size, for the 5-tuple context. EffiCuts targets the overlap between rules in a classification table, and in particular the variation in size of overlapping rules, which leads to a high degree of rule replication caused by thin cuts. EffiCuts addresses this issue by partitioning the rule set and binning rules with different size patterns in different subsets. Each of these subsets is then associated with a dedicated decision tree. This method is called separable trees. However, the introduction of multiple trees adds extra memory accesses, which decrease throughput. This problem is addressed in EffiCuts with selective tree merging. This method aims at selectively merging separable trees, mixing rules that are either small or large in at most one dimension.
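The binning and merging steps can be sketched as follows. This is an illustration under stated assumptions rather than the EffiCuts implementation: the largeness threshold of 0.5, the greedy pairing policy in `merge_selectively`, and the rule set are all hypothetical.

```python
LARGENESS = 0.5   # assumption: a range covering >= 50% of a field is "large"

# Hypothetical rules over two 4-bit fields: one (lo, hi) range per field.
RULES = [
    ((0, 0), (3, 3)),     # 0: small, small
    ((5, 6), (0, 15)),    # 1: small, large
    ((0, 15), (2, 2)),    # 2: large, small
    ((0, 15), (0, 15)),   # 3: large, large
    ((1, 1), (0, 7)),     # 4: small, large
]
FIELD_SIZES = (16, 16)


def size_pattern(rule, field_sizes):
    """Tuple of booleans: True where the rule is large in that field."""
    return tuple((hi - lo + 1) / size >= LARGENESS
                 for (lo, hi), size in zip(rule, field_sizes))


def separate(rules, field_sizes):
    """Bin rule indices by size pattern; each bin would get its own tree."""
    bins = {}
    for i, rule in enumerate(rules):
        bins.setdefault(size_pattern(rule, field_sizes), []).append(i)
    return bins


def merge_selectively(bins):
    """Greedily pair bins whose patterns differ in exactly one field."""
    merged, used = [], set()
    patterns = list(bins)
    for i, p in enumerate(patterns):
        if p in used:
            continue
        used.add(p)
        group = list(bins[p])
        for q in patterns[i + 1:]:
            if q not in used and sum(a != b for a, b in zip(p, q)) == 1:
                group += bins[q]
                used.add(q)
                break
        merged.append(sorted(group))
    return merged


bins = separate(RULES, FIELD_SIZES)
print(len(bins))                  # 4
print(merge_selectively(bins))    # [[0, 1, 4], [2, 3]]
```

In this example the five rules fall into four size patterns, and pairing patterns that differ in a single field halves the number of trees that a lookup has to traverse.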
Whereas HiCuts and HyperCuts cut the space covered by a node equally between each child node, EffiCuts introduces equi-dense cuts, in order to tackle the problem of ineffective nodes containing only a few rules occurring when separating dense parts and empty zones.
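The difference between the two kinds of cuts can be sketched on a single field. The version below starts from equal-size cuts and fuses adjacent children as long as the fused child holds no more than `max_rules` rules; this fusing policy, the parameter names, and the one-dimensional rule set are assumptions made for illustration, and the field size is assumed to be divisible by the number of cuts.

```python
# Hypothetical one-dimensional rules on a 6-bit field (0..63): dense at
# the low end, empty in the middle, one wide rule at the high end.
RULES_1D = [(0, 3), (4, 7), (8, 11), (12, 15), (40, 63)]


def equal_cuts(lo, hi, num_cuts):
    """Equal-size intervals, as used by HiCuts and HyperCuts."""
    width = (hi - lo + 1) // num_cuts
    return [(lo + i * width, lo + (i + 1) * width - 1)
            for i in range(num_cuts)]


def rules_in(interval, rules):
    lo, hi = interval
    return {i for i, (rlo, rhi) in enumerate(rules)
            if rlo <= hi and lo <= rhi}


def equi_dense_cuts(lo, hi, num_cuts, rules, max_rules):
    """Fuse adjacent equal cuts while the fused child stays small enough."""
    fused = []
    for interval in equal_cuts(lo, hi, num_cuts):
        if fused:
            candidate = (fused[-1][0], interval[1])
            if len(rules_in(candidate, rules)) <= max_rules:
                fused[-1] = candidate
                continue
        fused.append(interval)
    return fused


print(len(equal_cuts(0, 63, 8)))                  # 8
print(equi_dense_cuts(0, 63, 8, RULES_1D, 2))     # [(0, 7), (8, 39), (40, 63)]
```

Here eight equal-size children, three of them empty and three holding the same single rule, collapse into three children of unequal size, each holding at most two rules.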
Additionally, EffiCuts introduces other optimization techniques such as node collocation, which was proposed in order to reduce the overall number of memory accesses. Overall, EffiCuts considerably reduces memory usage compared to HyperCuts while keeping a low replication factor. On the other hand, these optimizations tend to increase the average number of memory accesses.
When the decision tree based algorithm creates more than one tree, a packet is classified by traversing each decision tree. In each tree, the traversal begins at the root node and proceeds from child node to child node until a leaf node is reached. Then, the packet is compared with every rule held in the leaf node.
The HyperCuts algorithm has been successfully implemented on a Field-Programmable Gate Array (FPGA). While HyperCuts suffers from a high replication factor, optimizations can be included to tackle this issue and to address hardware tree traversal issues. One implementation known in the art can process up to 80 Gbps of bandwidth, for a minimal packet size of 40 bytes, while using 5-tuple classification tables. However, a study evaluating its scaling properties on OpenFlow-like rules (v1.0.0) concluded that it does not scale well with such rules.
Other FPGA implementations use algorithms such as HyperSplit, or ParaSplit on top of HyperSplit, with performance that can reach 123 Gbps for a minimal packet size of 64 bytes.
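A multi-tree lookup can be sketched as follows: every tree is searched, and the results are combined by keeping the highest-priority match. The two hand-built trees, the node layout, and the convention that a lower rule index means a higher priority are assumptions made for illustration.

```python
# Hypothetical rules over two 4-bit fields; lower index = higher priority.
RULES = [
    ((2, 3), (4, 5)),     # 0: small in both fields
    ((0, 15), (0, 15)),   # 1: large in both fields
    ((8, 9), (0, 1)),     # 2: small in both fields
]

# One tree per subset of rules (hand-built for the example).
SMALL_TREE = {"dim": 0, "lo": 0, "width": 8,
              "children": [{"leaf": [0]}, {"leaf": [2]}]}
LARGE_TREE = {"leaf": [1]}


def search_tree(node, packet):
    """Return the best matching rule index in one tree, or None."""
    while "leaf" not in node:
        index = (packet[node["dim"]] - node["lo"]) // node["width"]
        index = min(index, len(node["children"]) - 1)
        node = node["children"][index]
    hits = [r for r in node["leaf"]
            if all(lo <= v <= hi for (lo, hi), v in zip(RULES[r], packet))]
    return min(hits) if hits else None


def classify(trees, packet):
    """Traverse every tree and keep the highest-priority match overall."""
    hits = [search_tree(tree, packet) for tree in trees]
    hits = [h for h in hits if h is not None]
    return min(hits) if hits else None


TREES = [SMALL_TREE, LARGE_TREE]
print(classify(TREES, (2, 4)))   # 0
print(classify(TREES, (9, 0)))   # 1
```

Packet (9, 0) matches rule 2 in the first tree and rule 1 in the second, so both trees have to be visited before the final answer is known; each additional tree adds its own memory accesses to every lookup.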
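The bandwidth figures quoted above translate into a lookup rate once the minimal packet size is fixed. The following sketch performs that conversion; it assumes back-to-back minimal-size packets and ignores any framing overhead such as preamble or inter-packet gap, so the results are upper bounds rather than measured rates.

```python
def packets_per_second(gbps, packet_bytes):
    """Lookup rate needed to sustain a bandwidth with fixed-size packets."""
    return gbps * 1_000_000_000 / (packet_bytes * 8)


for gbps, size in [(80, 40), (123, 64)]:
    pps = packets_per_second(gbps, size)
    print(f"{gbps} Gbps at {size} bytes: {pps / 1e6:.1f} Mpps, "
          f"{1e9 / pps:.2f} ns per packet")
```

Under these assumptions, 80 Gbps at 40 bytes and 123 Gbps at 64 bytes both correspond to roughly 240 to 250 million classifications per second, or about 4 ns per packet.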
An alternative to FPGA implementation is to use an array of microprocessors. One implementation known in the art implemented EffiCuts on the Pipelined Look Up Grid (PLUG) platform. PLUG is a flexible lookup module platform designed to easily deploy new protocols in high-speed routers. Multiple modifications to implement EffiCuts have been added to the PLUG platform on both the hardware and the software side. Even then, this implementation can only support 33 Gbps of data bandwidth for minimal packet size.
The decision tree based algorithms described here mainly focus on decreasing the replication factor and accelerating the convergence to leaves. Optimizations are proposed on at least two fronts: before tree building and when generating tree nodes.
Decision tree based algorithms are implementable in hardware and offer decent performance in the classical 5-tuple context. In some cases, as shown above, some exploration was conducted with OpenFlow-like rules, but no optimizations were proposed, no recommendations were made, and no deep analysis was performed.