In traffic classification, multiple fields of a packet header are checked against predefined rules, and the packet is processed according to the match result. The set of rules used in traffic classification is called a traffic classifier. Each rule in the traffic classifier relates to several fields in the packet header. For example, a standard Internet Protocol Version 4 (IPv4) quintuple rule includes five fields: a source IP address, a destination IP address, a protocol type, a source port number, and a destination port number. Matching modes vary by field: the IP addresses use prefix matching, the protocol type uses exact matching, and the port numbers use range matching.
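The three matching modes can be illustrated with a minimal sketch. The `Rule` class, its field names, and the `matches` function are hypothetical names chosen for this illustration; they are not taken from any particular classifier.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Rule:
    """Hypothetical IPv4 quintuple rule (field names are assumptions)."""
    src: str        # source prefix in CIDR form, e.g. "10.0.0.0/8"
    dst: str        # destination prefix in CIDR form
    proto: int      # protocol number, compared exactly
    sport: tuple    # inclusive (low, high) source port range
    dport: tuple    # inclusive (low, high) destination port range


def matches(rule, src_ip, dst_ip, proto, sport, dport):
    """Return True if the packet header fields satisfy every field of the rule."""
    return (
        ip_address(src_ip) in ip_network(rule.src)      # prefix matching
        and ip_address(dst_ip) in ip_network(rule.dst)  # prefix matching
        and proto == rule.proto                         # exact matching
        and rule.sport[0] <= sport <= rule.sport[1]     # range matching
        and rule.dport[0] <= dport <= rule.dport[1]     # range matching
    )


# Example rule: TCP (protocol 6) from 10.0.0.0/8 to 192.168.1.0/24, port 80.
rule = Rule("10.0.0.0/8", "192.168.1.0/24", 6, (0, 65535), (80, 80))
print(matches(rule, "10.1.2.3", "192.168.1.7", 6, 12345, 80))
```

A linear scan that calls such a check for every rule is the baseline that decision-tree algorithms aim to improve on.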
A traffic classification algorithm based on a decision tree is a rule set segmentation algorithm: it segments a rule set recursively according to a segmentation policy until the number of rules in each sub-rule set is less than a preset bucket size. Such segmentation may create a binary decision tree, called a binary tree for short. Each intermediate node of the binary tree stores the method used to segment the rule set at that node, and each leaf node stores the sub-rule set that may be matched. In searching, the related fields are extracted from a packet header to compose a keyword, and the keyword is used to traverse the decision tree until the corresponding leaf node is found. The keyword is then compared with the rules in the leaf node to obtain the matching rule that has the highest priority.
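The search procedure can be sketched as follows, assuming that each intermediate node tests a single bit of the keyword and that a smaller priority number means a higher priority. The `Inner` and `Leaf` classes and the `lookup` function are hypothetical names used only for this illustration.

```python
class Leaf:
    """Leaf node: holds (priority, pattern) pairs; patterns use '0', '1', '*'."""
    def __init__(self, rules):
        self.rules = rules


class Inner:
    """Intermediate node: records which keyword bit selects the child."""
    def __init__(self, bit, zero, one):
        self.bit, self.zero, self.one = bit, zero, one


def pattern_matches(pattern, key):
    """A pattern matches when every position is '*' or equals the key bit."""
    return all(p == '*' or p == k for p, k in zip(pattern, key))


def lookup(node, key):
    """Walk down to a leaf, then return the best matching rule or None."""
    while isinstance(node, Inner):
        node = node.one if key[node.bit] == '1' else node.zero
    hits = [r for r in node.rules if pattern_matches(r[1], key)]
    # Assumption: the smallest priority number is the highest priority.
    return min(hits, key=lambda r: r[0], default=None)


# A hand-built tree that splits on bit 0; the wildcard rule sits in both leaves.
tree = Inner(0,
             Leaf([(1, "0*1"), (3, "***")]),
             Leaf([(2, "1**"), (3, "***")]))
print(lookup(tree, "101"))
```

The traversal cost grows with the tree depth, while the final comparison cost is bounded by the bucket size of the leaf.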
Currently, a Modular algorithm, which is a stage-by-stage bit-selection segmentation traffic classification algorithm based on a decision tree, is provided. The Modular algorithm regards a rule as a ternary string composed of the three symbols ‘0’, ‘1’ and ‘*’, without any concept of dimension, in which ‘*’ represents a wildcard that matches either binary digit, 0 or 1. In segmentation, for each bit, the numbers of rules whose value at that bit is ‘0’, ‘1’ or ‘*’ are counted, and an optimal bit is selected for segmentation according to a priority metric formula. When a bit is selected for segmentation, rules whose value at this bit is ‘0’ are put in one sub-rule set, rules whose value at this bit is ‘1’ are put in another sub-rule set, and rules whose value at this bit is ‘*’ are put in both sub-rule sets. In this way, the original rule set is divided into two sub-rule sets. A range rule may be converted into prefixes before being segmented by using the aforementioned method. The original rule set is segmented recursively by using this method until the number of rules in each sub-rule set is less than a preset maximum number of rules allowed in a leaf node. In this way, a binary decision tree may be created. Meanwhile, in order to reduce rule replication, the Modular algorithm first divides the rule set into four sub-rule sets, which respectively correspond to four conditions: neither the source IP nor the destination IP is ‘*’, only the source IP is ‘*’, only the destination IP is ‘*’, and both the source IP and the destination IP are ‘*’. A separate binary decision tree is created for each of the four sub-rule sets. In searching, the multiple binary decision trees are searched in parallel.
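The recursive single-bit segmentation can be sketched as follows. The priority metric formula is not given here, so the `cost` function below (the size of the larger child) is only a stand-in assumption, as is the reading that a leaf may hold at most `bucket` rules; `count` and `build` are hypothetical names.

```python
def count(rules, bit):
    """Count the rules whose value at `bit` is '0', '1' or '*'."""
    n0 = sum(r[bit] == '0' for r in rules)
    n1 = sum(r[bit] == '1' for r in rules)
    return n0, n1, len(rules) - n0 - n1


def build(rules, bucket):
    """Return a leaf (list of rules) or a node (bit, zero_child, one_child)."""
    if len(rules) <= bucket:
        return rules

    def cost(bit):
        # Stand-in metric (assumption): size of the larger child,
        # where wildcard rules are copied into both children.
        n0, n1, n_star = count(rules, bit)
        return max(n0, n1) + n_star

    best = min(range(len(rules[0])), key=cost)
    if cost(best) == len(rules):
        return rules  # no bit separates these rules any further
    zero = [r for r in rules if r[best] in '0*']  # '*' rules go to both sides
    one = [r for r in rules if r[best] in '1*']
    return (best, build(zero, bucket), build(one, bucket))


rules = ["00*", "01*", "1*0", "1*1", "***"]
tree = build(rules, bucket=2)
print(tree)
```

In this small example the all-wildcard rule ends up in every leaf, which shows the rule replication that single-bit segmentation can cause.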
During the implementation of the present disclosure, the inventors find that the prior art has at least the following problems.
Because only one bit is selected at each segmentation step, the result is a binary decision tree whose depth is large, which reduces the decision efficiency. During the creation of the binary tree, if a range is expanded into prefixes, an arbitrary range over a 16-bit port field may be converted into 30 prefixes in the worst case. Taking a standard IPv4 quintuple as an example, each rule includes two ranges, a source port number and a destination port number; in the worst case, one rule is expanded into 900 (30 × 30) rules, which occupies excessive memory space. In addition, the method for reducing rule replication in creating the binary tree is coarse. When the rules in a sub-rule set contain many ‘*’ bits, a rule is still replicated many times.
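The worst-case figures can be checked with a minimal sketch of range-to-prefix expansion, assuming 16-bit port fields. The function name `range_to_prefixes` is hypothetical, and the greedy method shown is one common way to perform the conversion, not necessarily the one used by the Modular algorithm.

```python
def range_to_prefixes(lo, hi, width=16):
    """Cover the inclusive range [lo, hi] with a minimal list of prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at `lo` ...
        size = lo & -lo if lo else 1 << width
        # ... shrunk until it also fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        fixed = width - (size.bit_length() - 1)  # number of fixed leading bits
        head = format(lo >> (width - fixed), f'0{fixed}b') if fixed else ''
        prefixes.append(head + '*' * (width - fixed))
        lo += size
    return prefixes


worst = range_to_prefixes(1, 65534)   # a worst-case 16-bit range
print(len(worst))                     # prefixes needed for one port field
print(len(worst) ** 2)                # rules after expanding both port fields
```

For a field of w bits, the range from 1 to 2^w - 2 needs 2(w - 1) prefixes, which gives 30 for a 16-bit port number and 30 × 30 = 900 rules when both port fields of a rule are worst-case ranges.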