Efficiently and accurately identifying the hosts that spread the largest number of flows during an interval of time is very important for managing a network and for studying host behavior at the application level, with uses ranging from detecting DDoS attacks and worm propagation to locating peer-to-peer hot spots and flash crowds. No previous work has been able to identify the top spreaders both efficiently and accurately at very high link speeds, for example, 10 to 40 Gbps.
Consider the case of finding hosts that are spreading a large number of flows. FIG. 1 shows a scenario in which hosts in a local ISP network communicate with other hosts in the global Internet through a high-speed link. As shown, host3 is communicating with many hosts; it may be a very popular web server, or it may be initiating, or be the target of, a DDoS attack. It is necessary to identify such hosts in a network quickly and efficiently, and to know how severe the situation is.
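To make the task concrete, the following is a minimal sketch of a naive exact baseline, not the method of this disclosure. It assumes, for illustration only, that a flow is identified by a (source, destination) pair and that packets arrive as such pairs; the function name `top_spreaders` and the sample host names are hypothetical.

```python
from collections import defaultdict

def top_spreaders(packets, k):
    """Return the k sources with the most distinct destinations.

    packets: iterable of (src, dst) pairs, one per observed packet.
    Assumption: a flow is identified by its (src, dst) pair.
    """
    peers = defaultdict(set)              # src -> distinct destinations seen
    for src, dst in packets:
        peers[src].add(dst)               # repeated packets of a flow count once
    counts = [(src, len(dsts)) for src, dsts in peers.items()]
    # Sort by flow count (descending), then by name for a deterministic order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:k]

# Hypothetical traffic: host3 contacts four distinct destinations.
packets = [
    ("host1", "a"), ("host1", "a"),
    ("host2", "b"), ("host2", "c"),
    ("host3", "a"), ("host3", "b"), ("host3", "c"), ("host3", "d"),
]
print(top_spreaders(packets, 2))  # [('host3', 4), ('host2', 2)]
```

Because this baseline stores one set entry per distinct flow, its memory grows with the number of flows observed in the interval, which is what makes exact counting hard to sustain at high link speeds.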
There has been much work on measuring traffic statistics for network management, security, and the study of network evolution. The size distribution and traffic matrices of flows from hosts may help a network operator provision and engineer traffic. Finding flows that have a large number of packets is useful in billing and accounting. It has also been shown that flow-level communication patterns may further reveal the application-level behavior of each host.
Typically, small flows are of greater interest in security-related problems. For example, a host scanning ports or addresses typically sends only a very small number of packets to each victim, to keep the overhead small and lower the chance of being detected. In a SYN flood DDoS attack, each attack flow typically contains only one SYN packet, and the acknowledgment packets returned in response are ignored. A recently disclosed TCP attack uses many low-rate TCP sessions to exhaust the resources of the victim, and over a small interval these TCP flows can also be viewed as small flows. P2P applications tend to contact servers or other peers periodically to exchange control information, and such control messages typically contain a small number of packets.
Currently, there remain problems in the field with detecting super hosts that generate small flows.