1. Field of the Invention
The invention relates to methods for processing data packets according to a set of rules, and especially to methods for preparation of decision trees for selecting the correct rule for processing a received data packet.
2. Description of Related Art
Data packets are processed according to a set of rules in certain applications in network devices, such as in firewalls and IPsec devices. Internet Protocol data packets have a number of parameters, such as source IP address, source port, destination IP address, and destination port. The IP version 4 packet format is described in RFC 791, “Internet Protocol”, J. Postel, September 1981. The IP version 6 packet format is described in RFC 2460, “Internet Protocol, Version 6 (IPv6) Specification”, S. Deering and R. Hinden, December 1998.
Packet processing rules are typically set by network administrators to control packet processing at a network node. A firewall processing rule may, for example, direct the node to process only packets originating from a certain IP address or address range, or to reject packets from a certain IP address or address range. In the general case, any fields of the data packets can be used as rule parameters. In the specific case of firewalls and IPsec nodes, the most commonly used parameters are the source and destination IP addresses and the TCP or UDP port numbers. However, a packet processing unit can also examine the payload of the IP packet, such as other TCP or UDP header values, or header or payload field values of any other protocol packets carried within the IP, TCP, or UDP packet. The logic for selecting a matching rule may be quite complicated. It may involve an arbitrary combination of checks on the protocol, network interface, source and destination IP addresses, source and destination port numbers, and possibly other conditions. Some of these parameters are not restricted to single values, as a rule may specify a range of allowed values.
In operation, a packet processing unit (whether a firewall, an IPsec node, or any other network device that processes packets on the basis of processing rules) receives a packet, examines the rules to find out which rule matches the packet, and then processes the packet according to the instructions given in that rule. Here, matching means that particular parameter values in the packet are equal to, or within the range of, the parameter values recited in the rule. More than one rule may match the packet, in which case the rule with the highest preference is applied. The highest preference may simply mean the first rule to match the packet. Default processing may be applied if no rule matches the packet.
Rule lookup must be very efficient, especially in cases where very short-lived connections are frequent. A slow rule lookup mechanism could also be vulnerable to denial-of-service attacks. It is also important to be able to update rules relatively efficiently, particularly if the rules change frequently.
Performance problems arise in the process of finding the correct rule for a given packet when the number of rules is large. A simple approach to finding the correct rule is to check each rule in turn for each packet. This is quite feasible for some tens or hundreds of rules on an average personal computer at the time of writing of this patent application. However, when the number of rules is in the tens of thousands or higher, such a simple method is not adequate.
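The simple approach described above can be sketched as a linear scan. The following Python fragment is only an illustration under stated assumptions: the names Rule, matches, and lookup_linear, the two-field packet layout (source address, destination port), and the default action "drop" are all hypothetical and do not come from any particular product. List order is taken to express rule preference, so the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Range = Tuple[int, int]  # inclusive (low, high); a single value v is (v, v)


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: one inclusive range per packet field, plus an action."""
    name: str
    ranges: Tuple[Range, ...]
    action: str


def matches(rule: Rule, packet: Sequence[int]) -> bool:
    """A packet matches when every field value lies within the rule's range."""
    return all(lo <= value <= hi for (lo, hi), value in zip(rule.ranges, packet))


def lookup_linear(rules: Sequence[Rule], packet: Sequence[int],
                  default: str = "drop") -> str:
    """Check each rule in turn; cost grows linearly with the number of rules."""
    for rule in rules:  # list order is assumed to express preference
        if matches(rule, packet):
            return rule.action
    return default  # default processing when no rule matches


# Assumed packet layout for this example: (source address, destination port).
RULES = [
    Rule("block-host", ((10, 10), (0, 65535)), "reject"),
    Rule("allow-web", ((0, 255), (80, 80)), "accept"),
]
```

Every lookup may touch every rule, which is why this approach stops being adequate once the rule set grows to tens of thousands of entries.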
A more efficient method employs decision trees, which, when properly constructed, can enable a fast lookup of the correct rule from a large set of rules.
Since rules can hold any number of attributes which must match the given packet, and since some of these attributes may have a possibly infinite range of allowed values, the problem of finding the rule for a given packet is equivalent to finding the highest precedence N-dimensional rectangle containing a given point. This problem is studied in the field of computer science known as spatial access methods. Spatial access methods have been studied widely in the literature, and dozens of solutions have been presented. See for example the article “Multidimensional Access Methods” by V. Gaede and O. Gunther, ACM Computing Surveys, 1997, which is incorporated herein by reference. Most of these solutions have the drawback that a search operation does not have a guaranteed logarithmic upper bound, but may have to follow several search paths before finding the surrounding rectangle. Practically all of these spatial access methods are designed for geographic information systems or cartography, where there is relatively little overlapping and a more modest number of dimensions. Some solutions are designed for storing the data on disk and are suboptimal for a purely main memory setting. In general, a method is needed for building a decision tree that can cope efficiently with ranges specified by rules in many different dimensions.
A simplistic solution would be to search in one dimension at a time (i.e., to study one parameter at a time). In other words, the search tree would be devised so that a one-dimensional search structure, such as a binary tree, is constructed for the possible values in the first dimension until the set of possibly matching rules in each leaf can no longer be reduced. Then, for each of these leaves, the next dimension is used to construct the next level of binary trees, and so forth.
Although searching in such a cascaded binary tree would be relatively efficient, the structure would have the adverse effect of consuming unnecessarily much memory for some inputs. Because of the rigid order in which subtree divisions are made, it is impossible to exploit a high selectivity in a dimension that is used for narrowing the search only in the later subtrees. Therefore, a rule may appear in many more leaves of the tree than would be necessary.
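The memory effect can be illustrated with a small counting sketch. It is a simplification made for illustration only: instead of building actual binary trees, it treats the leaves of each one-dimensional tree as the elementary intervals obtained by cutting the value domain at every rule boundary, and it assumes every rule range lies inside the stated domain. The function names and the example rule set are hypothetical.

```python
from collections import Counter


def elementary_intervals(rules, dim, lo, hi):
    """Cut the domain [lo, hi] of dimension `dim` at every rule boundary."""
    cuts = {lo, hi + 1}
    for ranges in rules.values():
        a, b = ranges[dim]
        cuts.update((a, b + 1))
    points = sorted(cuts)
    return [(points[i], points[i + 1] - 1) for i in range(len(points) - 1)]


def count_leaf_copies(rules, domains, dim=0):
    """Count in how many final leaves each rule is stored when the
    dimensions are always processed in the fixed order 0, 1, 2, ..."""
    if dim == len(domains) or not rules:
        return Counter(list(rules))  # one leaf holding the remaining rules
    counts = Counter()
    lo, hi = domains[dim]
    for a, b in elementary_intervals(rules, dim, lo, hi):
        # Intervals are elementary, so a rule covers one fully or not at all.
        covering = {name: ranges for name, ranges in rules.items()
                    if ranges[dim][0] <= a and b <= ranges[dim][1]}
        counts += count_leaf_copies(covering, domains, dim + 1)
    return counts


# Two dimensions, each with values 0..99 (assumed for the example).
DOMAINS = [(0, 99), (0, 99)]
EXAMPLE_RULES = {
    "wide": ((0, 99), (5, 5)),   # unselective in dim 0, one value in dim 1
    "a": ((10, 19), (0, 99)),
    "b": ((30, 39), (0, 99)),
    "c": ((50, 59), (0, 99)),
}
COPIES = count_leaf_copies(EXAMPLE_RULES, DOMAINS)
```

In this example the first dimension is cut into seven intervals by the boundaries of rules a, b, and c, and the rule named wide, although it is a single rule that is highly selective in the second dimension, ends up stored once under each of those seven intervals.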
Building globally optimal decision trees is an NP-hard problem, but the typical greedy algorithms described in many AI textbooks almost always build sufficiently good decision trees for sets of data points. The greedy algorithms generally work as follows: given a set of data points (such as example data in case-based reasoning problems), pick a splitting point that divides the set of data points as well as possible according to some measure, let that splitting point be the key of the root of the decision tree, and repeat the procedure recursively on each of the two subsets to build the subtrees. In the case of packet processing, the most significant difference from the standard textbook case is that the decision tree is prepared not for a set of point data, but for a set of rules that span possibly overlapping ranges. This has two effects. Firstly, the rule sets applicable to the subtrees are not disjoint, and therefore a given rule can be encountered in many subtrees of a given node. Secondly, and more significantly, because rules apply to ranges of values instead of single points, a better splitting point selection algorithm is needed than the classic algorithm of splitting according to the median of the data points' values in some dimension.
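The textbook greedy procedure, naively carried over from points to ranges, can be sketched as follows. This is a minimal illustration and not the splitting point selection method the text calls for: the cost measure (minimizing the larger of the two resulting rule sets), the choice of range upper ends as candidate splitting points, the tuple-based tree layout, and all names are assumptions made for the example. Rules are (name, ranges, action) tuples listed in preference order, and packets are tuples of field values.

```python
def build(rules):
    """Greedily build a tree; a node is ("node", dim, cut, left, right) and
    a leaf is ("leaf", rules). Values <= cut go left, values > cut go right."""
    if len(rules) <= 1:
        return ("leaf", rules)
    best = None
    for dim in range(len(rules[0][1])):
        for _, ranges, _ in rules:
            cut = ranges[dim][1]  # candidate: upper end of some rule's range
            left = [r for r in rules if r[1][dim][0] <= cut]
            right = [r for r in rules if r[1][dim][1] > cut]
            if len(left) == len(rules) or len(right) == len(rules):
                continue  # this cut does not shrink both sides
            cost = max(len(left), len(right))  # assumed measure
            if best is None or cost < best[0]:
                best = (cost, dim, cut, left, right)
    if best is None:
        return ("leaf", rules)  # cannot be reduced further
    _, dim, cut, left, right = best
    return ("node", dim, cut, build(left), build(right))


def lookup(tree, packet, default="drop"):
    """Descend to a leaf, then scan its few rules in preference order."""
    while tree[0] == "node":
        _, dim, cut, left, right = tree
        tree = left if packet[dim] <= cut else right
    for _, ranges, action in tree[1]:
        if all(lo <= v <= hi for (lo, hi), v in zip(ranges, packet)):
            return action
    return default


def leaf_copies(tree, name):
    """Count the leaves in which the rule called `name` is stored."""
    if tree[0] == "leaf":
        return sum(1 for r in tree[1] if r[0] == name)
    return leaf_copies(tree[3], name) + leaf_copies(tree[4], name)


# Assumed packet layout: (source address, destination port).
RULES = [
    ("r1", ((0, 99), (80, 80)), "accept"),
    ("r2", ((100, 199), (0, 1023)), "reject"),
    ("r3", ((0, 255), (22, 22)), "log"),
]
TREE = build(RULES)
```

Rule r3 spans both sides of the split chosen at the root, so it is stored in two leaves, which is the first effect described above. The second effect shows up in the candidate search: a cut is only useful if some rule ranges end on each side of it, so a median of point values has no direct counterpart here.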