In data networks, routers classify packets to determine which micro-flows the packets belong to and then process the packets according to that classification. Flow identification is the essential first step in providing any flow-dependent service. A number of network services require packet classification, including access control, firewalls, policy-based routing, provision of integrated/differentiated qualities of service, traffic billing, and secure tunnelling. In each application, the classifier determines which micro-flow an arriving packet belongs to so as to determine whether to forward or filter it, where to forward it, what class of service it should receive, which scheduling tag/state/parameter it is associated with, or how much should be charged for transporting it. The classifier maintains a set of rules about packet headers for flow classification.
To clarify, a router is a multi-port network device that can receive and transmit data packets from/to each port simultaneously. Data packets typically have a regular format with a uniform header structure. The header structure usually contains data fields such as address and packet type. When a packet is received from a port, the router uses the header information to determine whether the packet is discarded, logged, or forwarded. If the packet is forwarded, the router also calculates which output port the packet will go to. The router also keeps count of each type of packet passing through. The forwarding decision (where to send the packet) is typically made based on the destination address carried in the packet. In an Internet Protocol router, forwarding involves a lookup process called Longest Prefix Match (LPM), which is a special case of the general mask-matching process.
The LPM uses a route table that maps a prefix rule (a mask-matching rule with all the wildcard bits located at the contiguous least significant bits) to an output port ID. An example of an LPM route table is given below:
#    32-bit Prefix                              Output PortID
1    xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx    port 0
2    1111 0010 1100 xxxx xxxx xxxx xxxx xxxx    port 1
3    1101 0011 0001 xxxx xxxx xxxx xxxx xxxx    port 3
4    1111 0010 1100 1100 0011 xxxx xxxx xxxx    port 2
5    0010 0000 0001 1111 1111 0000 1101 0000    port 4
An input packet with destination address=“1111 0010 1100 1100 0011 1111 1111 1111” should be forwarded to port 2. It matches entries #1, #2, and #4, and #4 has priority over the others because its prefix length (number of non-wildcard bits) is the longest.
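The lookup in this example can be sketched in a few lines of Python. This is only an illustrative linear scan under simple assumptions (prefixes held as strings, names such as `ROUTES` and `lpm_lookup` are hypothetical), not how a router would implement LPM in hardware.

```python
# Route table from the example: (ternary prefix, output port).
ROUTES = [
    ("xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx", 0),
    ("1111 0010 1100 xxxx xxxx xxxx xxxx xxxx", 1),
    ("1101 0011 0001 xxxx xxxx xxxx xxxx xxxx", 3),
    ("1111 0010 1100 1100 0011 xxxx xxxx xxxx", 2),
    ("0010 0000 0001 1111 1111 0000 1101 0000", 4),
]


def prefix_of(rule):
    """Drop spaces and trailing wildcards, leaving only the fixed prefix bits."""
    return rule.replace(" ", "").rstrip("x")


def lpm_lookup(address, routes=ROUTES):
    """Return the port of the matching entry with the longest prefix, or None."""
    bits = address.replace(" ", "")
    best_len, best_port = -1, None
    for rule, port in routes:
        prefix = prefix_of(rule)
        # A prefix rule matches when the address starts with its fixed bits.
        if bits.startswith(prefix) and len(prefix) > best_len:
            best_len, best_port = len(prefix), port
    return best_port


print(lpm_lookup("1111 0010 1100 1100 0011 1111 1111 1111"))  # 2
```

The scan visits every entry, so its cost grows with the table size; it is meant only to make the longest-prefix rule concrete.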
Routers and firewalls examine the input packets to determine whether they should be discarded and logged. This is usually done with an Access Control List (ACL) lookup. An ACL can be a user-configurable mask-matching rule set (based on packet header fields) that categorizes certain types of traffic that may be hazardous to the network. Hence, when a packet matches an ACL entry, the router/firewall should take action according to the ACL to discard or log the packet or to alert the network administrator.
Devices such as those described above use general multi-layer classification methods to carry out their functions. General multi-layer classification requires the examination of arbitrary data/control fields in multiple protocol layers. A grammatical/lexical parser provides a flexible solution to this problem, but the cost of supporting a large rule set is high.
A multiple-field classifier is a simple form of classifier that relies on a number of fixed fields in a packet header. A classic example is 7-dimensional classification, which examines the SA/DA/TOS/Protocol fields in the IP header and the SPORT/DPORT/PROTOCOL_FLAG fields in the TCP/UDP header. Because a multi-field classifier deals with fixed fields, parsing is not required. Instead of dealing with variable-length packets, the multi-field classifier performs classification on fixed-size search keys. The search key is a data structure holding the extracted packet data fields. The multi-field classifier assumes that the search keys are extracted from the packet before being presented to the classifier.
The problem of multiple field classification can be transformed into the problem of condition matching in multi-dimensional search key space, where each dimension represents one of the data fields the classifier needs to examine. A classification rule specifies conditions to be matched in all dimensions.
The classification rules specify value requirements on several fixed common data fields. Previous studies show that a majority of existing applications require up to 8 fields to be specified: source/destination network-layer addresses (32 bits each for IPv4), source/destination transport-layer port numbers (16 bits each for TCP and UDP), the Type-of-Service (TOS) field (8 bits), the Protocol field (8 bits), and the transport-layer protocol flags (8 bits), for a total of 120 bits. The number of fields and the total width of the fields may increase for future applications.
Rules can be represented in a number of ways, including exact number match, prefix match, range match, and wildcard match. Wildcard match was chosen as the sole rule representation because it does not sacrifice generality: any other form of matching can be translated into one or more wildcard match rules. A wildcard match rule is defined as a ternary string, where each bit can take one of three possible values: ‘1’, ‘0’, or ‘x’. A ‘1’ or ‘0’ bit in the rule requires the search key bit in the corresponding position to have exactly the same value, whereas an ‘x’ bit in the rule matches either ‘0’ or ‘1’ in the search key.
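As a rough illustration of how the listed fields add up to a 120-bit search key, the following Python sketch concatenates them into one integer. The field names, their order within the key, and the sample values are assumptions made for this example only.

```python
# Field widths taken from the text; names and ordering are assumed.
FIELDS = [
    ("src_addr", 32), ("dst_addr", 32),
    ("src_port", 16), ("dst_port", 16),
    ("tos", 8), ("protocol", 8), ("flags", 8),
]
KEY_WIDTH = sum(width for _, width in FIELDS)


def build_search_key(**values):
    """Concatenate the header fields, most significant first, into one integer."""
    key = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        key = (key << width) | value
    return key


# Hypothetical packet: TCP (protocol 6) from port 1234 to port 80.
key = build_search_key(
    src_addr=0xC0A80001, dst_addr=0x0A000002,
    src_port=1234, dst_port=80,
    tos=0, protocol=6, flags=0x02,
)
print(KEY_WIDTH)  # 120
```

Because the key is a plain integer, a classifier can treat it as one wide field regardless of how many header fields it was built from.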
An example of a rule specification on a 16-bit field is given below:
The classifier wants to match:   1111  0000  xx1x  0xx1
The mask is:                     1111  1111  0010  1001
The target value is:             1111  0000  0010  0001
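The relationship between the ternary rule, the mask, and the target value can be checked with a short Python sketch. The helper names `compile_rule` and `matches` are hypothetical; the only assumption is the usual convention that a key matches when its masked bits equal the target.

```python
def compile_rule(ternary):
    """Convert a ternary string into (mask, target) integers.

    Mask bits are 1 where the rule fixes a value and 0 at wildcards;
    target bits carry the required value (0 at wildcard positions).
    """
    mask = target = 0
    for ch in ternary.replace(" ", ""):
        if ch not in "01xX":
            raise ValueError(f"invalid ternary digit: {ch!r}")
        mask = (mask << 1) | (1 if ch in "01" else 0)
        target = (target << 1) | (1 if ch == "1" else 0)
    return mask, target


def matches(key, mask, target):
    """A key matches when all non-wildcard bits equal the target bits."""
    return (key & mask) == target


mask, target = compile_rule("1111 0000 xx1x 0xx1")
print(f"{mask:016b}")    # 1111111100101001
print(f"{target:016b}")  # 1111000000100001
print(matches(0b1111000011110111, mask, target))  # True
```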
Prefix match rules can be represented naturally as wildcard rules with contiguous ‘x’ bits. However, the don't-care bits in a general wildcard rule do not have to be contiguous. Ranges or multiple disjoint point values may be defined by using multiple masked matching rules. For example, an 8-bit range must be broken into two masked matching rules ‘00010xxx’ and ‘000110xx’. Even with this limitation, the masked matching form is still considered to be an efficient representation, because most of the ranges in use can be broken down into a small number of mask rules. A compiler can handle the task of breaking down a user rule specification written in a convenient syntax, so the complexity can be hidden from the user.
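The kind of range breakdown a rule compiler would perform can be sketched as follows. This is a common greedy range-to-prefix split, offered as an assumption about how such a compiler might work rather than as the method of any particular tool; the range 16 to 27 is chosen purely for illustration.

```python
def range_to_ternary(lo, hi, width):
    """Split the inclusive range [lo, hi] into prefix-style ternary rules."""
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range does not fit in the field width")
    rules = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1      # number of trailing wildcard bits
        fixed = width - wild              # number of leading fixed bits
        prefix = format(lo >> wild, f"0{fixed}b") if fixed else ""
        rules.append(prefix + "x" * wild)
        lo += size
    return rules


print(range_to_ternary(16, 27, 8))  # ['00010xxx', '000110xx']
```

Each emitted rule covers an aligned power-of-two block, so the number of rules stays small for most ranges in practical use.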
Each rule represents a region in the multi-dimensional space. Each search key (representing a packet to be classified) defines a point in this space. Points that fall into a region are classified as members of the associated class. Ambiguity arises when multiple regions overlap each other. A single priority order is defined among the rules to resolve the ambiguity. The rules are numbered from 0 to N−1, and the rule indices define the priority among the rules: a lower index means a higher priority. The region with higher priority covers the region with lower priority. In other words, if a packet satisfies both rule[i] and rule[j], it is classified into class[i] if i<j and into class[j] otherwise.
One advantage of mask matching is its dimension independence. Multiple concatenated fields can be classified with the same method as if they were one wide field. This is accomplished by concatenating the masks and the target strings of the individual fields.
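The priority rule can be made concrete with a small Python sketch that scans the rules in index order and returns the first match. The rule set shown is hypothetical, and the linear scan is used only because it is the simplest way to express the ordering.

```python
def ternary_match(rule, key):
    """True if every rule digit is 'x' or equals the corresponding key bit."""
    if len(rule) != len(key):
        raise ValueError("rule and key must have the same width")
    return all(r in ("x", k) for r, k in zip(rule, key))


def classify(rules, key):
    """Return the index of the first (highest-priority) matching rule, or None."""
    for index, rule in enumerate(rules):
        if ternary_match(rule, key):
            return index
    return None


# Hypothetical 4-bit rule set; rule 0 has the highest priority.
RULES = ["100x", "1x00", "xxxx"]

print(classify(RULES, "1000"))  # 0  (also satisfies rules 1 and 2)
print(classify(RULES, "1100"))  # 1
print(classify(RULES, "0111"))  # 2
```

The key "1000" lies in the overlap of all three regions, and the lowest index wins.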
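Dimension independence can be checked with a brief Python sketch: matching each field separately gives the same answer as matching the concatenated rule against the concatenated key. The field widths and values here are assumptions chosen for illustration.

```python
def ternary_match(rule, key):
    """True if every rule digit is 'x' or equals the corresponding key bit."""
    return len(rule) == len(key) and all(
        r in ("x", k) for r, k in zip(rule, key)
    )


def match_per_field(field_rules, field_keys):
    """Match each field on its own and require all fields to match."""
    return all(ternary_match(r, k) for r, k in zip(field_rules, field_keys))


def match_concatenated(field_rules, field_keys):
    """Treat the fields as one wide field by concatenating rules and keys."""
    return ternary_match("".join(field_rules), "".join(field_keys))


# Hypothetical rule over a 4-bit field and a 3-bit field.
field_rules = ["10xx", "0x1"]

print(match_per_field(field_rules, ["1011", "011"]))     # True
print(match_concatenated(field_rules, ["1011", "011"]))  # True
print(match_concatenated(field_rules, ["1011", "010"]))  # False
```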
The prior solutions can be grouped into the following categories:
Sequential Match
For each arriving packet, this approach evaluates each rule sequentially until a rule is found that matches all the fields of the search key. While this approach is simple and efficient in its use of memory (memory size grows linearly as the size of the rule set increases), it is unsuitable for high-speed implementation because the time required to perform a lookup also grows linearly with rule set size.
Grid of Tries
The ‘Grid of Tries’ (or Tuple Space Search) uses an extension of the trie data structure to support two fields per search key. This is a good solution for two-dimensional rule sets, but it is not easy to extend the concept to more fields. The cross-producting scheme is an extension of the ‘Grid of Tries’ that requires a linear search of the database to find the best matching filter; hence the effectiveness of cross-producting is not clear. The grid of tries approach also requires intensive precomputation time, and its rule update speed is slow.
A scheme based on tries is presented by Douceur et al. in U.S. Pat. No. 5,995,971 and U.S. Pat. No. 5,956,721. This method utilizes a trie-indexed hierarchy forest (“Rhizome”) that accommodates wildcards for retrieving, given a specific input key, a pattern stored in the forest that is identical to or subsumes the key. This approach has the weakness of not supporting “conflict” between patterns (as stated in lines 21-26, column 22 of U.S. Pat. No. 5,995,971). Patterns that partially overlap but do not subsume one another (e.g., patterns “100x” and “1x00”) are in “conflict” and may not be stored in the rhizome defined by that invention, since no defined hierarchical relationship holds for these patterns. In networking applications, such conflicts are widespread in router access lists and firewall policies. This weakness limits the use of the classification scheme.
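The notion of “conflict” used here can be expressed as a short Python check on ternary patterns. The function names are hypothetical and the definitions follow the wording of this paragraph (partial overlap without subsumption); they are not taken from the cited patents.

```python
def overlaps(a, b):
    """True if at least one key matches both ternary patterns."""
    return all(x == "x" or y == "x" or x == y for x, y in zip(a, b))


def subsumes(a, b):
    """True if every key matching pattern b also matches pattern a."""
    return all(x == "x" or x == y for x, y in zip(a, b))


def in_conflict(a, b):
    """Patterns conflict when they overlap but neither subsumes the other."""
    return overlaps(a, b) and not subsumes(a, b) and not subsumes(b, a)


print(in_conflict("100x", "1x00"))  # True: both match 1000, neither contains the other
print(in_conflict("10xx", "100x"))  # False: 10xx subsumes 100x
print(in_conflict("0xxx", "1xxx"))  # False: the patterns are disjoint
```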
Concurrent Cross Producting
T. V. Lakshman, in “High Speed Policy-Based Packet Forwarding Using Efficient Multi-Dimensional Range Matching”, Proceedings of ACM SIGCOMM '98 Conference, September 1998, presented a hardware mechanism for concurrent matching of multiple fields. For each dimension, this scheme performs a binary search on the projections of the regions onto that dimension to find the best matching region. A bit-level parallelism scheme is used to solve the cross-producting problem among dimensions. The memory size required by this scheme grows quadratically and the memory bandwidth grows linearly with the size of the rule set. Because of the computational complexity of the cross-producting operation, this scheme scales poorly. It also requires a time-consuming data structure generation process; hence the rule update speed is slow.
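The general idea of per-dimension binary search followed by a bit-parallel intersection can be sketched in Python as below. This is a simplified software illustration based only on the description in this paragraph, not a reproduction of the cited hardware mechanism; the class name, the rule ranges, and the use of Python integers as bit vectors are all assumptions.

```python
from bisect import bisect_right


class Dimension:
    """Per-dimension table: interval start points plus one rule bitmap each."""

    def __init__(self, ranges):
        # ranges[i] is the inclusive (lo, hi) projection of rule i.
        self.starts = sorted({lo for lo, _ in ranges} | {hi + 1 for _, hi in ranges})
        self.bitmaps = []
        for start in self.starts:
            bitmap = 0
            for i, (lo, hi) in enumerate(ranges):
                if lo <= start <= hi:
                    bitmap |= 1 << i      # rule i covers this interval
            self.bitmaps.append(bitmap)

    def lookup(self, value):
        """Binary search for the interval containing value."""
        idx = bisect_right(self.starts, value) - 1
        return self.bitmaps[idx] if idx >= 0 else 0


def classify(dimensions, key, n_rules):
    """AND the per-dimension bitmaps; the lowest set bit is the best rule."""
    result = (1 << n_rules) - 1
    for dim, value in zip(dimensions, key):
        result &= dim.lookup(value)
    if result == 0:
        return None
    return (result & -result).bit_length() - 1


# Hypothetical two-dimensional rule set; rule 0 has the highest priority.
x_dim = Dimension([(0, 3), (2, 9), (0, 15)])
y_dim = Dimension([(4, 7), (0, 5), (0, 15)])

print(classify([x_dim, y_dim], (2, 5), 3))   # 0
print(classify([x_dim, y_dim], (8, 1), 3))   # 1
print(classify([x_dim, y_dim], (20, 1), 3))  # None
```

Each dimension stores up to 2N+1 intervals with an N-bit bitmap apiece, which is consistent with the quadratic memory growth noted above.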
Ternary CAM
Hardware ternary CAMs (Content Addressable Memories) can be used for classification. Ternary CAMs store three-valued digits: ‘0’, ‘1’, or ‘X’ (wildcard). CAMs have good lookup performance and fast rule update times, but their hardware cost (silicon area) and power consumption are high. Moreover, CAMs require full-custom physical design, which prevents easy migration between different IC technologies. For these reasons, currently available CAMs are typically small.
Recursive Flow Classification
The recursive flow classifier (RFC), as discussed in Pankaj Gupta and Nick McKeown, “Packet Classification on Multiple Fields”, Sigcomm, September 1999, Harvard University, and Pankaj Gupta and Nick McKeown, “Packet Classification using Hierarchical Intelligent Cuttings”, Proc. Hot Interconnects VII, August 99, Stanford, exploits the heuristics in typical router policy database structures (router microflow classifiers, access lists, firewall rules). RFC uses multiple reduction phases, each consisting of a set of parallel memory lookups. Each lookup is a reduction in the sense that the value returned by the memory lookup is shorter (is expressed in fewer bits) than the index of the memory access. The algorithm can support very high lookup speeds at a relatively low memory bandwidth requirement. Because it relies on the policy database structure, however, in the worst case little reduction can be achieved at each step, and the performance becomes non-deterministic. In the normal case, the lookup performance gain is achieved at the cost of large memory size and very long precomputation time. For a large rule set (16K), the RFC precomputation time exceeds the practical limit of a few seconds. In general, RFC is suitable for small classifiers with static rules.
Based on the above, any solution developed must have a number of features. The first is a fast lookup speed. Since lookup speed is determined by the number of steps (or hardwired clock cycles) required to perform each lookup, any solution must either take a small constant number of steps or have its number of steps bounded by a small constant. The second requirement is that any solution must be capable of supporting a large mask-matching rule set. The size of the rule set must be capable of expansion at linear hardware/memory cost without sacrificing throughput.
A third requirement is an expandable field width. A solution must allow for a variable search key width or must support a varied number of dimensions (number of fields) at linear cost.
A fourth requirement is that no external memory should be required to implement a solution. Many classification methods consume a huge amount of memory storage and/or memory bandwidth. Both external memory storage and memory bandwidth are expensive factors that must be considered in chip architecture. These factors directly affect chip feasibility if either is too large. A solution should confine the rule set data structure to on-chip memory to prevent off-chip memory accesses.
A solution should also allow for fast rule updates. It should have a linear pre-compute time for rule set data structures and random (non-intrusive) access to the rule set image to allow for fast incremental insertion, deletion, and modification of individual rules. Some classification methods require a very long pre-processing time and/or an intrusive reload of the rule image data structure for a small change in the rule set. The long update/pre-processing time prevents such methods from being effective for microflow classification for provision of differentiated/integrated QOS (Quality of Service), where the rule sets are often session dependent or require frequent updates with low latency tolerance.
Ideally, a solution should not rely on heuristics about how rules are structured. It should give deterministic performance and cost for an arbitrary rule set of a given size. This allows such a solution to be used to support existing and future applications where rule structures are not well understood.