1. Field of the Invention
The present invention relates generally to networks, and more particularly, to analyzing traffic in a network.
2. Description of the Related Art
Network traffic analysis has become increasingly important for various network management and monitoring functions such as traffic engineering and anomaly detection and response. Due to high traffic volume in many high-speed networks, it can be useful to derive succinct summary information from such traffic volumes to facilitate the characterization of aggregate traffic behavior patterns.
Such aggregate behaviors are characterized by the host distributions of distinct communicating peers or flows. For example, port-scanning activities during a worm outbreak would cause many hosts to have an increasing number of (one-way) peers (or flows), and hence, a change in the host distributions of distinct communicating peers or flows.
One way to characterize aggregate traffic behavior patterns is by using feature distributions. In this regard, prior work has focused primarily on distributions concerning traffic volume, such as flow-size distribution (e.g., finding the total number of flows having a given flow size) and the inverse distribution of packet contents (e.g., finding the total number of strings having a given frequency). Distributional aspects, such as entropy (e.g., finding the entropy of a packet distribution over various ports) have also been subjects of interest.
Despite much work on feature distributions concerning traffic volume, little attention has been paid to traffic-feature distributions involving distinct counts, such as the number of destinations or flows corresponding to one or more given IP addresses. These distributions are very useful for characterizing communication connectivity patterns between hosts inside a network and across the Internet, which patterns might not be reflected by the volume data. Understanding such patterns helps network service providers manage their networks more efficiently. On the traffic-engineering side, if the number of peers for many hosts increases over time, this may indicate that the number of peer-to-peer (P2P) hosts is on the rise, which may in turn alert the network provider to improve its traffic-engineering solution for the P2P traffic. Statistically, the distribution of the number of peers vs. the number of hosts undergoes a mode change, i.e., a change in the value that occurs most frequently in the distribution. In other words, a new mode appears at the common number of peers (typically in a range from 64 to a few hundred, depending on the size of the P2P network) with which the P2P hosts are communicating. On the anomaly-detection side, if the number of peers for many hosts suddenly increases, this may indicate attack activities, such as port scans. In this scenario, the mode of the distribution will shift.
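By way of illustration only, the distribution of distinct peer counts and its mode can be sketched in a few lines of Python. The function names, the (source, destination) record format, and the sample addresses below are assumptions made for this example and are not taken from any particular implementation; the sketch also computes the distribution exactly, which a high-speed router would generally not be able to afford.

```python
from collections import Counter, defaultdict


def peer_count_distribution(flows):
    """Map each distinct-peer count to the number of hosts having that count.

    `flows` is an iterable of (source, destination) pairs. Repeated pairs
    are counted once, because the distribution concerns distinct peers.
    """
    peers = defaultdict(set)
    for src, dst in flows:
        peers[src].add(dst)
    return Counter(len(dsts) for dsts in peers.values())


def mode(distribution):
    """Return the peer count shared by the most hosts (ties: smallest count)."""
    return max(distribution, key=lambda k: (distribution[k], -k))


# Hypothetical "normal" traffic: hosts a and b have 2 peers each, host c has 1.
normal = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x"), ("b", "z"), ("c", "y")]

# Hypothetical scan-like traffic: every host contacts 5 distinct addresses.
scan = [(h, f"10.0.0.{i}") for h in "abc" for i in range(5)]

normal_dist = peer_count_distribution(normal)
scan_dist = peer_count_distribution(scan)
print(mode(normal_dist), mode(scan_dist))
```

In this toy data the mode moves from 2 peers to 5 peers, which is the kind of shift described above for scanning activity.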
Such distributional changes cannot be easily detected using marginal aspects such as entropy, mean, or variance. For example, a shift in the mean of a distribution with no shape change will not change the entropy. Good real-time estimates of the distributions themselves are therefore desirable, so that all such changes can be captured.
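The entropy example can be checked numerically. The following Python sketch uses a made-up histogram (peer count mapped to number of hosts) and hypothetical helper names; it shows only that translating a distribution changes its mean while leaving its Shannon entropy untouched.

```python
import math


def entropy(dist):
    """Shannon entropy (in bits) of a histogram {value: count}."""
    total = sum(dist.values())
    return -sum((n / total) * math.log2(n / total) for n in dist.values() if n > 0)


def mean(dist):
    """Mean value of a histogram {value: count}."""
    total = sum(dist.values())
    return sum(k * n for k, n in dist.items()) / total


def shift(dist, offset):
    """Translate every value by `offset`, keeping the shape unchanged."""
    return {k + offset: n for k, n in dist.items()}


# Illustrative data only: 50 hosts with 1 peer, 30 with 2 peers, 20 with 3 peers.
before = {1: 50, 2: 30, 3: 20}
after = shift(before, 100)

print(mean(before), mean(after))        # the mean moves by the offset
print(entropy(before), entropy(after))  # the entropy is unchanged
```

Because entropy depends only on the probabilities and not on the values they are attached to, a monitor that tracks entropy alone would not register this shift.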
Besides estimating the distribution for all hosts communicating through a high-speed provider router, or all hosts inside a stub network, the distribution for each group of IP addresses can also be specifically monitored. One example is the detection of “botnets,” which are collections of compromised computers (dubbed “zombies” or “bots”) running software, usually installed via worms, Trojan horses, or backdoors, under a common command and control infrastructure. In botnet detection, once the set of candidate bot controllers is identified, their behavior is monitored. Monitoring the distribution of the peers of each candidate bot controller would therefore be desirable, because this distribution can identify whether many of the peers are actively working for the candidate bot controller. New attacks will result in changes in the cardinality distribution.