Traffic classification generally refers to defining rules based on certain characteristics of packets and using those rules to identify the packets that have those characteristics, thereby classifying the packets. The packets that match a specific rule constitute a stream. With a traffic classification mechanism, different streams can be given different quality of service (QoS). Compared with a traffic classification method based on dedicated hardware such as a ternary content addressable memory (TCAM), a decision-tree-based traffic classification method has advantages in terms of rule search speed, cost, and so on.
The principle of the decision-tree-based traffic classification method is to establish a decision tree by dividing a rule set into multiple rule subsets, and to search the rule subsets for a rule that matches a packet. A decision tree includes a root node, multiple intermediate nodes, and multiple leaf nodes. Searching for a rule by using a decision tree may proceed as follows: first, parse the packet header of a packet to obtain a keyword for searching; at each intermediate node of the decision tree, select a branch according to one or more bits of the keyword, and traverse the decision tree in this way until a leaf node is reached, where each leaf node includes a rule subset; match the packet against the rule subset included in that leaf node; and, if multiple rules in this rule subset match the packet, select the rule of the highest priority among them as the rule for classifying the packet. The traffic classifier then performs, on this packet, the actions corresponding to that rule. In practice, the decision-tree-based traffic classification method may be HiCuts, HyperCuts, or Modular.
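The search procedure described above can be sketched as follows. This is a minimal illustration only, not the method of any particular classifier: the names `Rule`, `Leaf`, `Node`, `matches`, and `classify` are hypothetical, the keyword is assumed to be a string of "0" and "1" characters, each intermediate node is assumed to test a single bit, and a larger `priority` number is assumed to mean a higher priority.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Rule:
    pattern: str   # one character per key bit: '0', '1', or '*' (wildcard)
    priority: int  # assumption: larger value = higher priority
    action: str


@dataclass
class Leaf:
    rules: List[Rule]  # the rule subset stored at this leaf


@dataclass
class Node:
    bit: int                     # index of the key bit tested at this node
    zero: Union["Node", Leaf]    # branch taken when that bit is '0'
    one: Union["Node", Leaf]     # branch taken when that bit is '1'


def matches(rule: Rule, key: str) -> bool:
    """A rule matches when every non-wildcard bit equals the key bit."""
    return all(p in ("*", k) for p, k in zip(rule.pattern, key))


def classify(root: Union[Node, Leaf], key: str) -> Optional[Rule]:
    """Walk to a leaf, then return the highest-priority matching rule."""
    node = root
    while isinstance(node, Node):
        node = node.one if key[node.bit] == "1" else node.zero
    candidates = [r for r in node.rules if matches(r, key)]
    return max(candidates, key=lambda r: r.priority, default=None)


# Hypothetical 4-bit rule set and a one-level tree that cuts on bit 0.
r1 = Rule("0*1*", 3, "drop")
r2 = Rule("0***", 1, "forward")
r3 = Rule("1*0*", 2, "mark")
tree = Node(bit=0, zero=Leaf([r1, r2]), one=Leaf([r3]))

print(classify(tree, "0010").action)  # r1 and r2 both match; r1 wins: drop
print(classify(tree, "0000").action)  # only r2 matches: forward
print(classify(tree, "1111"))         # no rule in the leaf matches: None
```

The tree only narrows the search to one rule subset; the final match inside the leaf is still a linear scan, which is why the size of each leaf's rule subset matters.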
The HiCuts method and the HyperCuts method deal with traffic classification from a geometric perspective. From this perspective, if the rules in a traffic classifier are formed by k domains that correspond respectively to the dimensions of a k-dimensional space, each rule corresponds to a “hyperrectangle” region in the k-dimensional space, and each packet corresponds to a point in the k-dimensional space. The process of searching for a rule that matches a packet is equivalent to determining the hyperrectangle into which the point corresponding to the packet falls. In the HiCuts method and the HyperCuts method, each domain in a rule is regarded as a range, and the ranges of the different domains are considered together and cut so that the rule set is divided into smaller rule subsets. The cutting stops when the number of rules in a rule subset is less than a preset threshold. Through cutting, a decision tree can be established. Intermediate nodes of the decision tree store information about the cutting method, for example, the dimension or dimensions selected for cutting and the number of cuts in each dimension; each leaf node stores a rule subset.
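The geometric view can be illustrated with a short sketch. It is a simplification under stated assumptions rather than the actual HiCuts or HyperCuts algorithm: ranges are assumed to be inclusive integer intervals, a single dimension is cut into equal-width parts, and the names `contains`, `overlaps`, and `cut` are hypothetical. How the dimension and the number of cuts are chosen is not modeled here.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive interval [lo, hi]
Box = List[Range]        # one range per dimension, i.e. a hyperrectangle


def contains(box: Box, point: Tuple[int, ...]) -> bool:
    """True when the packet's point falls inside the rule's hyperrectangle."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))


def overlaps(a: Range, b: Range) -> bool:
    """True when two inclusive intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def cut(region: Box, rules: List[Box], dim: int, parts: int):
    """Cut `region` into `parts` equal-width slices along dimension `dim`.

    Returns a list of (child_region, rule_subset) pairs. A rule goes into
    every child whose slice it overlaps, so a wide rule may appear in
    several children. Assumes `parts` does not exceed the region's width.
    """
    lo, hi = region[dim]
    width = (hi - lo + 1) // parts
    children = []
    for i in range(parts):
        c_lo = lo + i * width
        c_hi = hi if i == parts - 1 else c_lo + width - 1
        child_region = region[:dim] + [(c_lo, c_hi)] + region[dim + 1:]
        subset = [r for r in rules if overlaps(r[dim], (c_lo, c_hi))]
        children.append((child_region, subset))
    return children


# Hypothetical 2-dimensional example over a 16 x 16 space.
region = [(0, 15), (0, 15)]
rules = [
    [(0, 3), (0, 15)],
    [(4, 11), (2, 5)],
    [(12, 15), (8, 15)],
]
children = cut(region, rules, dim=0, parts=4)

print([len(subset) for _, subset in children])  # [1, 1, 1, 1]
print(contains(rules[1], (5, 3)))               # True
print(contains(rules[1], (5, 6)))               # False
```

In this example the second rule spans two of the four slices, so it is stored in two child subsets even though each subset holds only one rule.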
In the Modular method, a rule set includes multiple rules of the same length. Each rule includes multiple bits. Each bit is “0”, “1”, or a wildcard. The wildcard may be expressed with “*”. When a rule set is cut into multiple rule subsets, the number of 0s, 1s, and wildcards among the bits at each position in the rule set must be counted, and a position for cutting the rule set is selected according to a specific algorithm. After a reference position for cutting the rule set is selected, all rules in the rule set whose bit at the reference position is “0” are put into one rule subset, all rules whose bit at the reference position is “1” are put into another rule subset, and all rules whose bit at the reference position is a wildcard are put into both of these rule subsets. Putting a rule whose bit at the reference position is a wildcard into both rule subsets is called rule replication in this application document. Through the foregoing operation, a rule set is divided into two rule subsets. The foregoing operation may be repeated for each generated rule subset until the number of rules in each rule subset is less than a preset threshold. In this way, a binary decision tree can be established. Each intermediate node of the decision tree stores an identifier of the reference bit used for cutting the rules and pointers to the two child nodes of the intermediate node, and each leaf node stores a rule subset.
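The bit-based cutting and the resulting rule replication can be sketched as follows. This is an illustration under assumptions, not the Modular method's actual selection algorithm, which is not specified here: the hypothetical `pick_bit` heuristic chooses the position that minimizes the size of the larger child subset, and a node is also turned into a leaf when no position would shrink the subset. The names `pick_bit`, `build`, and `stored_rules`, and the dictionary layout of the nodes, are likewise hypothetical.

```python
from typing import List, Optional


def pick_bit(rules: List[str]) -> Optional[int]:
    """Pick a reference position from per-position counts of 0s, 1s, and wildcards.

    Hypothetical heuristic: minimize the size of the larger child subset.
    Returns None when no position would make both children smaller.
    """
    best, best_cost = None, len(rules)
    for pos in range(len(rules[0])):
        zeros = sum(r[pos] == "0" for r in rules)
        ones = sum(r[pos] == "1" for r in rules)
        stars = len(rules) - zeros - ones
        cost = max(zeros, ones) + stars  # wildcard rules land in both children
        if cost < best_cost:
            best, best_cost = pos, cost
    return best


def build(rules: List[str], threshold: int) -> dict:
    """Build a binary decision tree; assumes threshold >= 1."""
    if len(rules) < threshold:
        return {"rules": rules}
    pos = pick_bit(rules)
    if pos is None:  # assumption: stop when no cut makes progress
        return {"rules": rules}
    zero = [r for r in rules if r[pos] in "0*"]  # '*' rules are replicated
    one = [r for r in rules if r[pos] in "1*"]
    return {"bit": pos, "zero": build(zero, threshold), "one": build(one, threshold)}


def stored_rules(node: dict) -> int:
    """Total number of rule copies stored across all leaves."""
    if "rules" in node:
        return len(node["rules"])
    return stored_rules(node["zero"]) + stored_rules(node["one"])


# Hypothetical 3-bit rule set with a preset threshold of 2.
rules = ["00*", "01*", "1*0", "*11"]
tree = build(rules, threshold=2)

print(tree["bit"])         # 0
print(stored_rules(tree))  # 5
```

With these four rules, the first cut is on position 0, where the rule `*11` has a wildcard and is therefore copied into both subsets; the leaves end up storing five rule copies for a rule set of four.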
In the above decision-tree-based traffic classification method, rule replication is highly likely to occur in the process of generating the decision tree. Because each replicated rule is stored in more than one rule subset, rule replication means that a larger storage space is occupied.