As computer networks such as the Internet continue to grow in size and complexity, the challenge of effectively provisioning, managing and securing networks has become linked to a deep understanding of their traffic. Indeed, recent spates of cyber-attacks and the frequent emergence of applications affecting Internet traffic dynamics demonstrate the importance of identifying and profiling significant communication patterns within network traffic data. Nevertheless, because of the vast quantities of data and the wide diversity of traffic on large networks, developing a comprehensive understanding of the collected data remains a daunting and unfulfilled task. Most of the prior work in this area has focused on specific aspects of traffic or applied metrics that are deemed interesting a priori to identify significant network events of interest. For example, several known systems focus on techniques for identifying port scans or for analyzing worm and other exploit activities on the Internet. Further, signature-based intrusion detection systems look for well-known signatures or patterns in network traffic, while several anomaly detection systems have been developed using data mining techniques.
At the enterprise, service provider, and public network scale, network management systems are used to monitor networks. These systems can exist as stand-alone, dedicated systems or be embedded in network communications devices such as routers and switches. One specific example is NetFlow technology offered by Cisco Systems. Other tools include special-purpose systems, such as firewalls and other network security devices that are typically used to manage the communications at boundaries between the networks.
However, there are currently insufficient techniques in the art directed towards generating general profiles of traffic in terms of behaviors, i.e., communication patterns of end-hosts and services. The need for such profiles has become increasingly imperative and urgent in light of wide spread cyber-attacks and the frequent emergence of disruptive applications that can rapidly alter the dynamics of network traffic and bring down valuable Internet services. Accordingly, improved systems and methods that can identify significant communication patterns from vast quantities of traffic data are desirable.