As computer networks such as the Internet continue to grow in size and complexity, the challenge of effectively provisioning, managing and securing networks has become linked to a deep understanding of their traffic. Indeed, recent spates of cyber-attacks and the frequent emergence of applications affecting Internet traffic dynamics demonstrate the importance of identifying and profiling significant communication patterns within network traffic data. Nevertheless, because of the vast quantities of data and the wide diversity of traffic on large networks, developing a comprehensive understanding of the collected data remains a daunting and unfulfilled task. Most of the prior work in this area has either focused on specific aspects of traffic or applied metrics deemed interesting a priori to identify significant network events. For example, several systems today focus on techniques for identifying port scans or for analyzing worm and other exploit activities on the Internet. Further, signature-based intrusion detection systems look for well-known signatures or patterns in network traffic, while several anomaly detection systems have been developed using data mining techniques.
However, there are currently insufficient techniques in the art directed towards generating general profiles of traffic in terms of behaviors, i.e., communication patterns of end-hosts and services. The need for such profiles has become increasingly urgent in light of widespread cyber-attacks and the frequent emergence of disruptive applications that can rapidly alter the dynamics of network traffic and bring down valuable Internet services. Complicating the task of profiling during these cyber-attacks is the large volume of network traffic that accompanies them. Indeed, there is a need for a robust real-time traffic behavior profiling system that is capable of continuously extracting and analyzing “interesting” and “significant” traffic patterns on high-speed network links, even in the face of a sudden surge in traffic (e.g., when the network is under a denial-of-service attack).