With the convergence of the Internet and mobile broadband (MBB), and with the large-scale promotion and application of intelligent devices such as intelligent terminals and tablet computers, MBB data network traffic has increased greatly. This growth brings a new problem: network anomalies of various kinds occur frequently. Network anomalies include abnormal traffic, network attacks, viruses, and the like, and abnormal traffic includes heavy hitters and heavy changers. Such anomalies severely degrade network utilization, network performance, and user experience, and also create risks such as leakage of key information and damage to systems and terminals.
Among the various network anomalies, heavy hitters and heavy changers are the two most important types. A heavy hitter is a data stream that occurs frequently in a network, and is defined in this specification as a data stream having large overall traffic. A heavy changer is a data stream whose main features (including size, port number, protocol number, and the like) change greatly within a given time period. An IP data stream object (referred to as an “object” in the following) is defined by the quintuple (source IP, destination IP, source port, destination port, and protocol number) of an IP packet.
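The definitions above can be illustrated with a minimal Python sketch. The names `FlowKey`, `total_traffic`, `heavy_hitters`, and `heavy_changers` are hypothetical, and the sketch assumes that a heavy changer is detected only through a change in size (traffic) between two time periods; changes in other features are not modeled here.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical representation of an object: the quintuple of an IP packet."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def total_traffic(elements):
    """Sum the values of all elements (key, value) per object."""
    totals = Counter()
    for key, value in elements:
        totals[key] += value
    return totals


def heavy_hitters(totals, threshold):
    """Objects whose overall traffic exceeds the threshold."""
    return {key for key, traffic in totals.items() if traffic > threshold}


def heavy_changers(prev_totals, curr_totals, threshold):
    """Objects whose traffic changed by more than the threshold between
    two time periods (assumption: only the size feature is compared)."""
    keys = set(prev_totals) | set(curr_totals)
    return {
        key for key in keys
        if abs(curr_totals.get(key, 0) - prev_totals.get(key, 0)) > threshold
    }
```

Exact per-object counting of this kind needs one counter per object, which is the storage cost that bucket-based methods try to avoid.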
Currently, a method for identifying abnormal network traffic includes the following steps: 1) a data collection node randomly sends the collected elements for different objects to one or more work nodes, where the relationship between an object and an element may be represented as “element (object, value)”, that is, “element (key, value)”, and the “value” included in the element may be the traffic value of the element, or information (such as the quantity of data packets included in the element) that indicates the traffic value of the element; 2) each work node maps the received elements, according to a mapping algorithm, to a data structure table formed by multiple buckets, and when each time interval ends, reports to a control node the total traffic of the elements that were mapped to each bucket within the time interval, where elements for a same object are generally mapped to a same bucket, and, because there are a large quantity of objects, different objects may be mapped to a same bucket to save the storage space occupied by the data structure table; and 3) the control node aggregates the information reported by each work node, and when the total traffic of the elements mapped to the buckets of a category of objects is greater than a threshold, identifies the objects of this category as heavy hitters, where the objects of a category are the objects that are in a same work node and mapped to a same bucket.
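The three steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the method as actually deployed: the mapping algorithm is assumed to be a CRC32 hash of the key modulo the bucket count, all work nodes are assumed to share that mapping, and the control node is assumed to sum the reports per bucket index. The names `bucket_of`, `WorkNode`, `dispatch`, and `control_node` are hypothetical.

```python
import random
import zlib

NUM_BUCKETS = 8  # hypothetical table size


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Assumed mapping algorithm: deterministic CRC32 hash of the key."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets


class WorkNode:
    """Step 2: keeps one traffic counter per bucket."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.buckets = [0] * num_buckets

    def receive(self, key, value):
        # Different objects may land in the same bucket.
        self.buckets[bucket_of(key, self.num_buckets)] += value

    def report(self):
        """Called when a time interval ends: return and reset the counters."""
        snapshot = list(self.buckets)
        self.buckets = [0] * self.num_buckets
        return snapshot


def dispatch(elements, work_nodes, rng):
    """Step 1: the collection node sends each element to a random work node."""
    for key, value in elements:
        rng.choice(work_nodes).receive(key, value)


def control_node(reports, threshold):
    """Step 3: aggregate the reports per bucket index and return the
    indices of the buckets whose total traffic exceeds the threshold."""
    totals = [sum(column) for column in zip(*reports)]
    return {index for index, total in enumerate(totals) if total > threshold}
```

In this sketch the control node only learns which buckets are heavy, not which individual objects inside a bucket carry the traffic.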
In the foregoing method, the objects of a category are considered heavy hitters when the total traffic of the elements mapped to the buckets of that category is greater than a threshold. However, the total may exceed the threshold merely because the category consists of many small-traffic objects. In that case, the foregoing method wrongly identifies these small-traffic objects as heavy hitters; that is, the identification accuracy of the foregoing method is low.
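The misidentification described above can be reproduced with a small sketch. The hash-based mapping (`bucket_of`), the bucket count, the threshold, and the synthetic flows are all assumptions chosen for illustration; the per-bucket membership sets exist only to make the false positives visible and would not be stored by a space-saving implementation.

```python
import zlib
from collections import defaultdict


def bucket_of(key, num_buckets):
    """Assumed mapping algorithm: deterministic CRC32 hash of the key."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_buckets


def flag_by_bucket(elements, num_buckets, threshold):
    """Bucket-level rule: every object in a bucket whose total traffic
    exceeds the threshold is reported as a heavy hitter."""
    bucket_total = [0] * num_buckets
    members = defaultdict(set)  # kept only for illustration
    for key, value in elements:
        index = bucket_of(key, num_buckets)
        bucket_total[index] += value
        members[index].add(key)
    flagged = set()
    for index, total in enumerate(bucket_total):
        if total > threshold:
            flagged |= members[index]
    return flagged


def true_heavy_hitters(elements, threshold):
    """Exact rule: an object is a heavy hitter only if its own traffic
    exceeds the threshold."""
    per_object = defaultdict(int)
    for key, value in elements:
        per_object[key] += value
    return {key for key, traffic in per_object.items() if traffic > threshold}


# 400 synthetic small objects, each carrying 10 units of traffic.
small_flows = [
    (("10.0.0.1", "10.0.1.1", 1024 + i, 80, 6), 10) for i in range(400)
]
flagged = flag_by_bucket(small_flows, num_buckets=4, threshold=500)
exact = true_heavy_hitters(small_flows, threshold=500)
print(len(flagged), len(exact))
```

With 400 objects in 4 buckets, at least one bucket holds 100 or more objects and therefore at least 1000 units of traffic, so the bucket-level rule flags at least 100 objects even though no single object exceeds 10 units and the exact rule flags none.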