1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to profiling Internet traffic flows to identify network applications and/or security threats responsible for the traffic flows.
2. Background of the Related Art
In recent years, the number of cyber attacks has kept increasing, affecting millions of systems. Such malicious software, often termed malware (a contraction of "malicious software"), includes various worms, botnets, trojans, backdoors, spyware, etc. In addition, there is a new trend in exploiting social networks and mobile devices. The sophistication and effectiveness of cyber attacks have also steadily advanced. These attacks often take advantage of flaws in software code, use exploits that can circumvent the signature-based tools commonly used to identify and prevent known threats, and employ social engineering techniques designed to trick the unsuspecting user into divulging sensitive information or propagating attacks. These attacks are becoming increasingly automated with the use of botnets, i.e., compromised computers that can be remotely controlled by attackers to automatically launch attacks. Bots (short for robots) have become a key automation tool to speed the infection of vulnerable systems and are extremely stealthy in the way they communicate and exfiltrate personal/proprietary information from the victims' machines/servers. The integration of such sophisticated computer attacks with well-established fraud mechanisms devised by organized crime has resulted in an underground economy that trades compromised hosts, personal information, and services in a way similar to legitimate economies. This expanding underground economy makes it possible to significantly increase the scale of the frauds carried out on the Internet and allows criminals to reach millions of potential victims.
Such continuous and ever-changing challenges in protecting users have made cyber-security a very active, bleeding-edge research area. This has become an arms race between security researchers and malicious users. Today's approach to information security can be broken down into two major classes of technologies: host security and network security.
A prevalent category of host-based security is malware prevention, comprising a broad group of agent-based solutions that look for particular signatures and behavioral signs of malicious code execution at the host level. This approach, known as blacklisting, focuses on matching specific aspects of application code and particular actions being attempted by applications for detection. Signature-based/blacklisting detection has been around for many years. In that same time, viruses, worms, sniffers, trojans, bots and other forms of malware have infiltrated e-mail, instant messaging, and later, social networking sites for the purpose of criminal financial gain. With improvements in correlation and centralized management, blacklisting still works very effectively in most distributed enterprises and is able to (i) pinpoint malicious activities with a high detection rate and very low false positive/false negative rates, (ii) reverse engineer the malware executable to highlight inner properties of the malware such as message structure and message passing (strengths and weaknesses of the malware), and (iii) assess the level of risk of the threat by analyzing its effects on the end-host (such as system calls, registry entries being touched, etc.). However, because these signature-based models depend on advance knowledge of malicious code and behaviors, some instances can be missed, leading to potential malicious execution.
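By way of illustration only, the signature-matching idea described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular product: the signature names, the byte patterns, and the function `match_signatures` are hypothetical and were chosen solely for the example.

```python
from typing import Dict, List

# Hypothetical blacklist: the names and byte patterns below are illustrative
# assumptions and do not correspond to any real threat feed.
SIGNATURES: Dict[str, bytes] = {
    "example-worm": b"\xde\xad\xbe\xef",
    "example-bot-beacon": b"JOIN #cmd",
}


def match_signatures(payload: bytes,
                     signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all known signatures found in the payload."""
    return sorted(
        name for name, pattern in signatures.items() if pattern in payload
    )


# A payload carrying a known pattern is flagged by name.
known = match_signatures(b"header\xde\xad\xbe\xeftrailer")

# A slightly altered variant matches nothing, which illustrates why such
# models depend on advance knowledge of the malicious code.
variant = match_signatures(b"JOIN #cmx")
```

In this sketch the altered variant goes undetected, which corresponds to the missed instances noted above.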
On the network side, three prevalent approaches are blended together to offer network-based security: (i) firewall systems, (ii) intrusion detection/prevention systems (IDS/IPS), and (iii) network behavior anomaly detection (NBAD) systems. These three approaches complement each other and are commonly adopted/deployed by enterprises to form a holistic network security strategy. Generally, the first two approaches tackle the network security problem in a fashion similar to host security (using threat signatures specialized at the network level), and thus share the benefits and shortfalls of host security. The third approach attempts to discover threats without requiring a priori knowledge of the malicious code and behavior by using algorithms to generate one or more models that retain the properties of good traffic and raise an alarm for sessions that do not conform to the model(s). While effective in spotting threats never seen before, the third approach is still prone to high rates of false positives/false negatives, which the security analyst is forced to screen before making a decision. This shortfall is mostly due to the lack of a solid ground truth on which the statistical tools can be trained to produce precise statistical models emulating the threat activities.
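By way of illustration only, the third approach may be sketched as a simple statistical baseline. This is a minimal sketch under stated assumptions: the single feature (bytes per session), the z-score test, the threshold of 3.0, and the function names `fit_baseline` and `is_anomalous` are all hypothetical choices made for the example and are not drawn from any particular NBAD system.

```python
from statistics import mean, stdev
from typing import List, Tuple


def fit_baseline(good_sessions: List[float]) -> Tuple[float, float]:
    """Model known-good traffic by the mean and sample standard deviation
    of one per-session feature (here assumed to be bytes per session)."""
    return mean(good_sessions), stdev(good_sessions)


def is_anomalous(value: float,
                 baseline: Tuple[float, float],
                 threshold: float = 3.0) -> bool:
    """Raise an alarm when a session deviates from the baseline by more
    than `threshold` standard deviations (an assumed cutoff)."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Train on sessions assumed to be good traffic, then score new sessions.
baseline = fit_baseline([100.0, 110.0, 90.0, 105.0, 95.0])
alarm_on_outlier = is_anomalous(500.0, baseline)
alarm_on_typical = is_anomalous(102.0, baseline)
```

Because the model is trained only on a small sample of good traffic, a legitimate but unusually large session would also raise an alarm, which corresponds to the false positives noted above.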