1. Field of the Invention
The invention relates to a technique, specifically apparatus and accompanying methods, which utilizes a trie-indexed hierarchy forest that accommodates wildcards for, inter alia, retrieving, given a specific input key, a pattern stored in the forest that is identical to or subsumes the key. This technique finds particular, though not exclusive, use in a computer-based network, for efficiently classifying packets in order to apply any network related processing applicable to the packet, e.g., packet scheduling. Thus, this invention also relates to apparatus for a packet classifier and accompanying methods for use therein that embody this technique.
2. Description of the Prior Art
Packets are routinely used within a computer network to carry digital communication between multiple nodes on the network. Any such network possesses a finite number of different connections among the nodes therein. Generally speaking, whenever a computer stationed at one such node is executing an application wherein the results of that application will be sent to a computer at another network node, the former computer will establish a connection, through the network, to the latter computer and then send these results, in packetized form, over that connection.
A network interface for a personal computer contains both specialized hardware, as required to physically and electrically interface that computer to the network itself, and associated software which controls the interface and governs packetized communication therethrough. Inasmuch as the interface hardware is irrelevant to the present invention, we will not discuss it in any detail. The software is often part of the operating system, such as exists in, e.g., the Windows NT operating system currently available from the Microsoft Corporation of Redmond, Washington (which also owns the registered trademark "Windows NT"). In particular, the software implements, inter alia, various processes that rely on classifying each packet and processing the packet accordingly. One such process is packet scheduling. While several other such packet-related processes, such as routing, security and encryption, are also employed in the software, for purposes of brevity and illustration, we will confine the ensuing discussion to scheduling.
In particular, while a physical network interface to a computer operates at a single speed, e.g., 10 or 100 Mb/second, several different streams of packetized traffic, depending on their data content, can be interleaved together by the computer for simultaneous transmission at different rates through that interface. To accommodate this, a packet scheduler directs individual packets in each such stream to a corresponding software transmission queue (also referred to hereinafter as simply a "queue") from which packets in each such stream will be dispatched, in proper order, for network transmission at a corresponding rate--regardless of the network destinations of these streams. Given the substantially increased data rate required for, e.g., video data over textual data, the scheduler encountering both video and textual packetized data will ensure that each type of data packet is directed to the proper queue such that a significantly greater number of video-carrying packets than text-carrying packets will subsequently be transmitted per unit time through the interface. Other software implemented processes successively pull individual packets from the queues, at the corresponding data rates associated therewith, for multiplexing and subsequent transmission through the interface to the network. Inasmuch as these other processes are not relevant to the present invention, they will not be discussed in any further detail.
Packet classification, for purposes of packet scheduling, is performed by constructing a so-called "key" from select fields contained in a packet in order to associate the packet with a corresponding queue. These fields are illustratively source and destination addresses (typically IP addresses) and corresponding port designations (all of which are collectively referred to herein as "classification fields"). This association typically occurs by consistently concatenating the classification fields, in the packet being classified, into a key which, in turn, is used to access a data structure in order to retrieve therefrom an identification of a corresponding queue for that packet. Since a group of packets that have differing values for their classification fields can nevertheless be transmitted at the same rate and hence should be directed to the same queue, a mask field containing one or more so-called "wildcard" (oftentimes referred to as "don't care") values is often used, through logical combination with the classification fields, to yield an identification associated with a single queue. Generally speaking, this identification is viewed as a "classification pattern", i.e., a bit field having a length equal to the total length of the concatenated classification fields wherein any bit in the pattern can have a value of "1", "0" or "X", where "X" is a wildcard. As a result, a single pattern having a wildcard(s) therein can serve to classify an entire group of such packets. If a match occurs between the non-wildcard bits of the pattern (i.e., taking the wildcard value(s) into account) and corresponding bits of the classification fields for the packet being classified, then an associated queue designation for that pattern is accessed from the data structure.
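By way of illustration only, the key construction and wildcard matching just described can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular classifier: the field order, the 96-bit key width (two 32-bit IPv4 addresses and two 16-bit ports), and the names `make_key`, `matches`, `PATTERN_VALUE` and `PATTERN_MASK` are all hypothetical. A pattern is represented as a value together with a mask in which a "1" bit marks a position that must match and a "0" bit marks a wildcard ("X") position.

```python
import ipaddress


def make_key(src_ip, dst_ip, src_port, dst_port):
    """Concatenate the classification fields, always in the same order,
    into a single 96-bit integer key (field layout is an assumption)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src << 64) | (dst << 32) | (src_port << 16) | dst_port


def matches(key, value, mask):
    """True if the key agrees with the pattern on every non-wildcard bit.

    mask bit = 1: the bit must equal the corresponding bit of value.
    mask bit = 0: wildcard, the bit of the key is ignored.
    """
    return (key ^ value) & mask == 0


# Hypothetical pattern: any source address, destination 192.168.1.20,
# any source port, destination port 80. The wildcards fall at the start
# and in the middle of the pattern, not only at its end.
PATTERN_VALUE = make_key("0.0.0.0", "192.168.1.20", 0, 80)
PATTERN_MASK = (0xFFFFFFFF << 32) | 0xFFFF

if __name__ == "__main__":
    key = make_key("10.0.0.5", "192.168.1.20", 4321, 80)
    print(matches(key, PATTERN_VALUE, PATTERN_MASK))
```

A single pattern of this kind covers every packet sent to that destination address and port, whatever its source address and source port, so that all such packets would be associated with one queue.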
By virtue of permitting wildcards within packet classifications (i.e., patterns), the complexity associated with searching the data structure, for a pattern given the classification fields for a packet, as well as that of the structure itself, increases considerably. Furthermore, the process of classifying packets lies directly within the data flow and adds another layer of processing thereto, i.e., each outgoing packet must be classified before it is launched into the network. Consequently, any added complexity necessitated by accommodating wildcards will require additional processing. Since only a finite amount of processing time can be allocated to classify each packet and packet classification tends to be processor intensive, such classification needs to be rather efficient--particularly for use in a computer that is to experience significant packet traffic.
A principal way to increase classification efficiency is to utilize a process that retrieves stored classification information as fast as possible from a data structure.
While the art teaches various approaches for classifying packets, such as that typified in M. L. Bailey et al, "Pathfinder: A Pattern-Based Packet Classifier", Proceedings of First Symposium on Operating Systems Design and Implementation (OSDI), USENIX Assoc., 14-17 November 1994, pages 115-123 (hereinafter the "Bailey et al" paper), and W. Doeringer et al "Routing on Longest-matching Prefixes", IEEE/ACM Transactions on Networking, Vol. 4, No. 1, February 1996, pages 86-97 (hereinafter the "Doeringer et al" paper), these approaches, while efficient and effective in their target environments, are limited. In that regard, the techniques described therein exhibit retrieval times that are generally linearly related to the number of elements (n) in a classification database. A large network will have a substantial number of different patterns. Hence, the size of a classification data structure for such a network can be considerable which, in turn, will engender linearly increasing and hence relatively long retrieval times as the database expands. For packetized payload data that requires a high data rate, such retrieval times may be inordinately long and cause excessive delay to a recipient user or process.
Furthermore, packet classifiers fall into two distinct types: declarative and imperative. Basically, a declarative classifier stores a pattern as a filter, while the filter used in an imperative classifier contains a small segment of executable code. In particular, a declarative classifier relies on embedding a description, such as a key, in each packet for which a classification is sought and then matching the key to a pattern in a stored classification and retrieving an associated value, such as a queue designation, therefrom. An imperative classifier, on the other hand, executes, possibly on an interpretive basis, the code segment in each and every stored classification, seriatim, against the header of an incoming packet to compute a result. The result specifies a corresponding pattern for that packet. Where the number of patterns is rather small, an imperative classifier executes a relatively small number of code segments against each packet. Inasmuch as each code segment is implemented through a highly simplified instruction set, the classification delay for small networks, with relatively few patterns, tends to be tolerable. However, both processing complexity and associated delay become intolerable for packet classification in a large network that has an extensive number of different patterns. Declarative classifiers eliminate the need to process each incoming packet through a separate code segment for each and every classification, and hence provide significantly shorter response times. However, prior art declarative classifiers, such as that typified by the methodology described in the Bailey et al paper, exhibit classification delay that is linear in the number of stored patterns and hence can be intolerably long in a large network which is expected to carry packetized payload data, such as video, that requires a high data rate.
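To make the linear-delay concern concrete, the following is a minimal sketch of a declarative classifier that simply scans its stored patterns in order. It is offered purely as an illustration of a lookup cost that grows with the number of stored patterns; it is not the algorithm of the Bailey et al paper, and the 8-bit patterns, queue names and the functions `parse_pattern` and `classify_linear` are hypothetical.

```python
def parse_pattern(text):
    """Convert a pattern written with '1', '0' and 'X' into (value, mask).

    A mask bit of 1 means the position must match; 0 means wildcard.
    """
    value = mask = 0
    for ch in text:
        value <<= 1
        mask <<= 1
        if ch == "1":
            value |= 1
            mask |= 1
        elif ch == "0":
            mask |= 1
        elif ch != "X":
            raise ValueError(f"invalid pattern character: {ch!r}")
    return value, mask


def classify_linear(key, table):
    """Scan the table in order; return (queue, number of patterns examined).

    In the worst case every stored pattern is examined, so the work per
    packet is proportional to the size of the table.
    """
    for examined, (value, mask, queue) in enumerate(table, start=1):
        if (key ^ value) & mask == 0:
            return queue, examined
    return None, len(table)


# Hypothetical classification table of (value, mask, queue) entries.
TABLE = [
    (*parse_pattern("1100XXXX"), "video"),
    (*parse_pattern("10X0X0X1"), "text"),
    (*parse_pattern("XXXXXXXX"), "default"),
]

if __name__ == "__main__":
    for key in (0b11001010, 0b10100011, 0b00000000):
        print(f"{key:08b}", classify_linear(key, TABLE))
```

With three stored patterns the third key above is matched only after all three have been examined; with n stored patterns the same scan may examine all n.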
While the Doeringer et al paper describes a declarative classification methodology that can handle a pattern containing wildcards, this methodology requires that all wildcards be located in a contiguous group at the end of the pattern. Inasmuch as this methodology is directed to IP (Internet Protocol) routing where arbitrarily placed wildcards do not occur, this methodology is acceptable in that application. However, for packet classification, where classification can occur for various different purposes, such as scheduling, and a wildcard can arbitrarily occur anywhere in a pattern, the methodology taught by the Doeringer et al paper is simply unsuited for such broad-based classification.
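The restriction noted above, namely that all wildcards form one contiguous group at the end of the pattern, can be stated as a simple test on a pattern written with "1", "0" and "X". The sketch below is illustrative only; the function name `wildcards_are_trailing` and the sample patterns are hypothetical and are not drawn from the Doeringer et al paper.

```python
def wildcards_are_trailing(pattern):
    """True if every 'X' in the pattern lies in one contiguous run at the end.

    Such a pattern is a prefix pattern of the kind used in longest-prefix
    routing. A pattern with no wildcards at all also qualifies.
    """
    without_tail = pattern.rstrip("X")
    return "X" not in without_tail


if __name__ == "__main__":
    for pattern in ("1100XXXX", "10X0X0X1", "XXXX0101", "11001010"):
        print(pattern, wildcards_are_trailing(pattern))
```

A pattern such as "XXXX0101", which leaves the leading field unconstrained while fixing a later one, fails this test and so falls outside what a prefix-only methodology can represent.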
Therefore, a need exists in the art for a fast and versatile packet classifier that incorporates a search technique capable of rapidly retrieving stored information, from a data structure, given a specific value in a group of classification fields in the packet. Furthermore, this technique should accommodate a wildcard(s) located in any arbitrary position within a stored pattern and, to reduce delay, should preferably exhibit retrieval times that increase more slowly than linearly with the number of stored patterns.
Moreover, and apart from use in packet classification, a broad need continues to exist in the art for a generalized technique that, given specific input data (in the form of, e.g., a key), can rapidly retrieve a stored pattern, containing a wildcard(s) at any location therein, from a data structure--regardless of what the input data and patterns specifically represent. Such a technique would likely find widespread use including, but not limited to, packet classification. While the opposite problem of how to rapidly find a specific datum stored in a database given a generalized input pattern has been extensively investigated in the art, the converse problem, inherent in, e.g., a packet classifier, of how to swiftly retrieve a generalized stored pattern from a data structure given specific input data, e.g., classification fields, has apparently received scant attention in the art, which thus far, as discussed above, has yielded rather limited and disappointing results.