The invention presented is motivated by the twin goals of increasing the capacity and the flexibility of the Internet. The Internet is comprised of packet-processing nodes, called routers, that route packets towards their destinations, and physical links that transport packets from one router to another. Owing to advances in optical technologies, such as Wavelength Division Multiplexing, the data rates of links have increased rapidly over the years. However, routers have failed to keep up with this pace because they must perform expensive per-packet processing operations. Every router is required to perform a forwarding decision on an incoming packet to determine the packet's next-hop router. This is achieved by looking up the destination address of the incoming packet in a forwarding table. Besides increased packet arrival rates because of higher speed links, the complexity of the forwarding lookup mechanism and the large size of forwarding tables have made routing lookups a bottleneck in the routers. The invention attempts to overcome this bottleneck.
The invention is also concerned with increasing the flexibility and functionality of the Internet. Traditionally, the Internet provides only a “best-effort” service, treating all packets going to the same destination identically, and servicing them in a first-come-first-served manner. However, in differentiated service models, Internet Service Providers are seeking ways to provide differentiated or value-added services (on the same network infrastructure) to different users and/or user applications based on their different requirements and expectations of quality from the Internet, i.e., Service Level Agreements (SLAs). For this, routers need the capability to distinguish and isolate traffic belonging to different users and user flows, where a user can be a single individual or a group of individuals with a common denominator (e.g., all people in a company), and a user flow can be the data associated with one application, or a group of applications with a common denominator (e.g., voice, web browsing, e-mail), of a user. The ability to classify each incoming packet to determine the flow it belongs to is called packet classification, and could be based on an arbitrary number of fields, i.e., classification fields, in the packet header.
As mentioned, routers may optionally classify packets into flows for special processing. The following describes why some routers are flow-aware and how they use packet classification to recognize flows. A brief overview of the architecture of flow-aware routers is also provided. Then, the background leading to the formal definition of the packet classification problem is discussed.
One main reason for the existence of flow-aware routers stems from the desire of an Internet Service Provider (ISP) to be able to provide value-added services to its users. As mentioned, the Internet provides only a “best-effort” service, treating all packets at every forwarding point in the network identically, and servicing them in a first-come-first-served manner. However, the rapid growth of the Internet has caused increasing congestion and packet loss at intermediate routers. As a result, some users are willing to pay a premium price in return for better service from the network for all or a group of their applications. To maximize their revenue, ISPs also wish to provide different levels of service at different prices to users based on their requirements, while still deploying one common network infrastructure. In order to provide differentiated services, routers require additional mechanisms. These optional mechanisms include admission control, conditioning (metering, marking, shaping, and policing), resource reservation, queue management and fair scheduling (such as weighted fair queueing), or any other suitable mechanism, in any combination. All of them require, first of all, the capability to distinguish and isolate traffic belonging to different users, user groups and/or applications based on service agreements negotiated between the ISP and its customer. This has led to demand for flow-aware routers that negotiate these service agreements, express them in terms of rules or policies configured on incoming packets, and isolate incoming traffic according to these rules. The functionality that determines the policy that applies to a packet (e.g., to which flow a packet belongs) is a packet classifier (flow classifier) or simply classifier. The collection of policies is the network policy. Once a packet is classified, the policy action (the action to be taken) is executed.
A policy thus consists of a definition part (the policy definition, implemented in the classifier) and an action (the policy action). Each policy specifies a flow that a packet may belong to based on some criteria on the contents of the packet. These criteria do not have to be limited to the header. For example, for firewall functionality the system administrator may also want to look into the user data to check for viruses. All packets belonging to the same flow are treated in a similar manner. The identified flow of an incoming packet specifies an action to be applied to the packet. For example, a firewall router may carry out the action of either denying or allowing access to a protected network. The determination of this action is called packet classification: the capability of routers to identify the action associated with the “best” policy an incoming packet matches. Packet classification allows ISPs to differentiate themselves from their competition and gain additional revenue by providing different value-added services to different customers.
A flow-aware router is able to check for every incoming packet if it belongs to a flow for which the action is already determined. This is done by checking a bit pattern of a predetermined number (Nfld) of classification fields (in IPv6: flow label or any field or combination thereof). The router checks if the bit pattern is present in a so-called flow table (this can be done via e.g. hashing), and if so, executes the actions specified for that flow. If the bit pattern is not found in the flow table, normal classification occurs, and optionally the packet bit pattern may be put in the flow table, together with the policy action to be applied for this flow.
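The flow-table check described above can be illustrated with a minimal Python sketch. It assumes that the flow key is simply a tuple of the Nfld classification field values and that the flow table is a hash map (a Python dict); the names FlowTable, action_for and toy_classifier are hypothetical and serve only as illustration, not as the structure of any particular router.

```python
from typing import Callable, Dict, Tuple

# Assumption: a flow key is the tuple of the Nfld classification field values.
FlowKey = Tuple[int, ...]


class FlowTable:
    """Hash-based flow table placed in front of a (slower) full classifier."""

    def __init__(self, classify: Callable[[FlowKey], str]) -> None:
        self._classify = classify               # normal classification
        self._actions: Dict[FlowKey, str] = {}  # flow key -> policy action
        self.slow_path_calls = 0                # counts full classifications

    def action_for(self, key: FlowKey) -> str:
        action = self._actions.get(key)         # hashed lookup of the bit pattern
        if action is None:
            # Flow not yet known: classify normally, then remember the result.
            self.slow_path_calls += 1
            action = self._classify(key)
            self._actions[key] = action
        return action


def toy_classifier(key: FlowKey) -> str:
    # Hypothetical policy: key = (source, destination, destination port);
    # traffic to port 23 is denied, everything else is permitted.
    return "deny" if key[2] == 23 else "permit"


table = FlowTable(toy_classifier)
first = table.action_for((1, 2, 23))   # classified normally, then stored
second = table.action_for((1, 2, 23))  # served from the flow table
```

In this sketch every classified flow is stored; as the text notes, storing the bit pattern is optional, and a real flow table would also need a policy for removing stale entries.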
Packet classification enables a number of additional, non-best-effort network services other than the provisioning of value-added services. One of the well-known applications of packet classification is a firewall. Other network services that require packet classification include policy-based routing, traffic rate-limiting and policing, traffic shaping, and billing. In each case, it is necessary to determine which flow an arriving packet belongs to so as to determine—for example—whether to forward or filter it, where to forward it to, what type of service it should receive, or how much should be charged for transporting it. With the introduction of QoS (Quality of Service) in networks, classification of IP-packets in access routers has become more important than ever. In the differentiated service model, the value for the differentiated service code point (DSCP) in IP packets is based on the classification of the flow in the access points. Similarly, in a Multi-Protocol Label-Switched (MPLS) domain the packet flows have to be assigned to a specific Label-Switched Path (LSP) at the access point. Hence efficient classification methods are of high importance to router vendors.
With the above background, the problem of packet classification can be described:
In practice, a policy may have several components, wherein a policy component is not a general regular expression but is often limited by syntax to a simple address/mask or operator/number(s) specification. In an address/mask specification, a “0” at bit position x in the mask denotes that the corresponding bit in the address is a “don't care” bit. Similarly, a “1” at bit position x in the mask denotes that the corresponding bit in the address is a significant bit. For instance, the first and third most significant bytes in a packet field matching the specification 171.4.3.4/255.0.255.0 must be equal to 171 and 3, respectively, while the second and fourth bytes can have any value, because the mask bits of the first and third mask bytes are all set to “1” (i.e., “255” corresponds to “11111111”) and the mask bits of the second and fourth mask bytes are all set to “0”. Examples of operator/number(s) specifications are eq 1232 and range 34-9339, which specify that the matching field value of an incoming packet must be equal to 1232 in the former specification and can have any value between 34 and 9339 (both inclusive) in the latter specification. Note that a route-prefix of length l can be specified as an address/mask pair where the mask is contiguous, i.e., all bits with value “1” appear to the left of (i.e., are more significant than) bits with value “0” in the mask. For instance, the mask for an 8-bit prefix is 255.0.0.0. A route-prefix of length l can also be specified as a range of width equal to 2^t, where t=32−l. In fact, most of the commonly occurring specifications in practice can be viewed as range specifications.
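The address/mask and range specifications above can be checked with a few lines of Python. This is only a sketch under the assumption of 32-bit IPv4 fields; the helper names ip, matches_mask, matches_range and prefix_to_range are hypothetical and introduced here for illustration.

```python
import ipaddress


def ip(text: str) -> int:
    """Convert dotted-decimal notation to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def matches_mask(value: int, address: int, mask: int) -> bool:
    """A '1' mask bit is significant, a '0' mask bit is a don't-care bit."""
    return (value & mask) == (address & mask)


def matches_range(value: int, low: int, high: int) -> bool:
    """Range specification; both bounds are inclusive."""
    return low <= value <= high


def prefix_to_range(address: int, length: int, width: int = 32):
    """View a route-prefix of the given length as a range of 2^t values,
    where t = width - length."""
    t = width - length
    low = (address >> t) << t          # clear the t don't-care bits
    return low, low + (1 << t) - 1


spec_addr, spec_mask = ip("171.4.3.4"), ip("255.0.255.0")
hit = matches_mask(ip("171.9.3.200"), spec_addr, spec_mask)   # bytes 1 and 3 agree
miss = matches_mask(ip("172.4.3.4"), spec_addr, spec_mask)    # byte 1 differs
low, high = prefix_to_range(ip("10.1.2.3"), 8)                # 8-bit prefix 10/8
```

Here the 8-bit prefix maps to the range 10.0.0.0 to 10.255.255.255, whose width is 2^24, in line with t=32−8=24.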
In the following the background of search trees shall be briefly described:
A radix trie, or simply a trie (the name trie comes from retrieval, but is pronounced “try”) is a binary tree that has labeled branches, and that is traversed during a search operation using individual bits of the search key. The left branch of a node is labeled “0” and the right-branch is labeled “1”. A node, v, represents a bit-string formed by concatenating the labels of all branches in the path from the root node to v. A prefix, p, is stored in the node that represents the bit-string p. For example, the prefix 0* is stored in the left child of the root node. A trie for W-bit prefixes has a maximum depth of W nodes. The longest prefix search operation on a given destination address proceeds bitwise starting from the root node of the trie. The left (right) branch of the root node is taken if the first bit of the address is “0” (“1”). The remaining bits of the address determine the path of traversal in a similar manner. The search algorithm keeps track of the prefix encountered most recently on the path. When the search ends at a null pointer, this most recently encountered prefix is the longest prefix matching the key.
Therefore, finding the longest matching prefix using a trie takes W memory accesses in the worst case, i.e., has time complexity O(W). The insertion operation proceeds by using the same bit-by-bit traversal algorithm as above. Branches and internal nodes that do not already exist in the trie are created as the trie is traversed from the root node to the node representing the new prefix. Hence, insertion of a new prefix can lead to the addition of at most W other trie nodes. The storage complexity of a W-bit trie with N prefixes is thus O(NW). A significant amount of storage space is wasted in such a trie in the form of pointers that are null, and that are on chains, i.e., paths with 1-degree nodes (nodes that have only one child).
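The insertion and longest prefix search operations on a binary trie, as described above, can be sketched in Python as follows. The sketch assumes that prefixes and addresses are given as strings of the characters “0” and “1”; the class names TrieNode and BinaryTrie are hypothetical.

```python
from typing import List, Optional


class TrieNode:
    def __init__(self) -> None:
        # children[0] is the branch labeled "0", children[1] the branch "1".
        self.children: List[Optional["TrieNode"]] = [None, None]
        self.prefix: Optional[str] = None   # set if a prefix is stored here


class BinaryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str) -> None:
        """Walk bit by bit, creating missing nodes (at most W per prefix)."""
        node = self.root
        for bit in prefix:
            index = int(bit)
            if node.children[index] is None:
                node.children[index] = TrieNode()
            node = node.children[index]
        node.prefix = prefix

    def longest_prefix_match(self, address: str) -> Optional[str]:
        """Return the most recently seen prefix on the path, or None."""
        node = self.root
        best = node.prefix
        for bit in address:
            node = node.children[int(bit)]
            if node is None:                # search ends at a null pointer
                break
            if node.prefix is not None:     # remember the longest so far
                best = node.prefix
        return best


trie = BinaryTrie()
for p in ("0", "01", "0110", "1"):
    trie.insert(p)
```

With these four prefixes stored, a search for the address 01101111 passes the nodes for 0, 01 and 0110 in turn and reports the last of them, using at most one node visit per address bit.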
A Patricia tree (Patricia is an abbreviation for “Practical Algorithm To Retrieve Information Coded In Alphanumeric”, in the following referred to as “Patricia”) is a variation of a trie data structure, with the difference that it has no 1-degree nodes. Each chain is compressed to a single node in a Patricia tree. Hence, the traversal algorithm may not necessarily inspect all bits of the address consecutively, skipping over bits that formed part of the label of some previous trie chain. Each node now stores an additional field denoting the bit-position in the address that determines the next branch to be taken at this node. The original Patricia tree (see D. R. Morrison, “PATRICIA—practical algorithm to retrieve information coded in alphanumeric,” Journal of the ACM, Vol. 15, No. 4, pages 514-534, October 1968) did not have support for prefixes.
However, prefixes can be concatenated with trailing zeros and added to a Patricia tree. Since a Patricia tree is a complete binary tree (i.e., has nodes of degree either 0 or 2), it has exactly N external nodes (leaves) and N−1 internal nodes. The space complexity of a Patricia tree is thus O(N). Prefixes are stored in the leaves of a Patricia tree. A leaf node may have to keep a linear list of prefixes, because prefixes are concatenated with trailing zeros. The lookup algorithm descends the tree from the root node to a leaf node, similar to the descent in a trie. At each node, it probes the address for the bit indicated by the bit-position field in the node. The value of this bit determines the branch to be taken out of the node. When the algorithm reaches a leaf, it attempts to match the address with the prefix stored at the leaf. This prefix is the desired answer if a match is found. Otherwise, the algorithm has to recursively backtrack and continue the search in the other branch of this leaf's parent node. Hence, the lookup complexity in a Patricia tree is quite high, and can reach O(W^2) in the worst case.
A bit pattern of d fields has a total length of W(1)+W(2)+ . . . +W(d)=W. There are now d tries, one per field. Prefix w(j) is defined by bits W(1)+W(2)+ . . . +W(j−1)+1 to W(1)+W(2)+ . . . +W(j−1)+W(j), and the depth of trie j is W(j). The algorithm searches for a match in each of the trees in increasing order of prefix-lengths. A Longest Prefix Match (LPM) of a string of w bits, where w=W(1)+W(2)+ . . . +W(k−1)+x with x<W(k), requires an exact match for the first k−1 bit strings over their full lengths W(j) (j=1 . . . k−1) and an LPM in trie k. The first match found for all k tries yields the longest prefix matching the given address. An Exact Match (EM) of a string of w=W bits requires an exact match in all d tries. Since one exact match operation on a Patricia tree of length W(j) takes O(W(j)) time, the complete matching operation has complexity O(W(1)^2)+O(W(2)^2)+ . . . +O(W(d)^2)<=O(W^2) (sequential execution) or O(max(W(j)^2)) (parallel execution).
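The bookkeeping for a bit pattern of d fields can be illustrated with a short Python sketch. It rests on one reading of the text, namely that a prefix of w bits covers the first k−1 fields completely and the k-th field only partially; the function names field_bounds and split_prefix and the example field widths are assumptions made for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def field_bounds(widths: List[int]) -> List[Tuple[int, int]]:
    """1-based inclusive (first bit, last bit) of each field j, i.e.
    W(1)+...+W(j-1)+1 to W(1)+...+W(j)."""
    ends = list(accumulate(widths))
    return [(end - width + 1, end) for width, end in zip(widths, ends)]


def split_prefix(w: int, widths: List[int]) -> Tuple[int, int]:
    """For a prefix of w bits (w <= sum of widths), return the number of
    fields needing an exact match and the bits left for an LPM in the next
    field. A remainder of 0 means exact matches only."""
    full = 0
    for width in widths:
        if w < width:
            break
        w -= width
        full += 1
    return full, w


# Assumed example: five fields of 32, 32, 16, 16 and 8 bits (W = 104).
widths = [32, 32, 16, 16, 8]
bounds = field_bounds(widths)
full_fields, lpm_bits = split_prefix(70, widths)
```

For the assumed widths, a 70-bit prefix requires an exact match in the first two tries and a 6-bit LPM in the third, while a 104-bit pattern requires an exact match in all five.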
It should be pointed out that the method is not limited to the Patricia tree data structure; other forms of binary trie structures such as the level-compressed trie can be used as well (Andersson and Nilsson, Information Processing Letters, 46:295-300, 1993; Nilsson and Tikkanen, 2nd Workshop on Algorithm Engineering (WAE '98), 1998). The main requirement is that the trie structure allows for backtracking if an LPM is required (this is not required for RM and EM). Moreover, wherever an exact match is required, algorithms other than trie searches (linear search, hashing) can be used as well.