In a typical computer network, computer systems are coupled to the network in a manner that allows the computer systems to access data from a variety of sources of information. Data accessed by such network coupled computer systems can be provided by internal, directly coupled and remote sources of information. Unfortunately, the data accessed from such sources of information can include malicious software that is designed to infiltrate and/or damage the computer system. Such malicious software is called “malware.”
Malware as used herein can include but is not limited to computer viruses, worms, trojan horses, spyware and some adware. For reasons that are apparent, malware operates without the informed consent of computer system owners. Indeed, malware can be designed to thwart computer system software (or hardware) that is installed as a defense against active malware. An example of conventional computer system software that can be installed as such a defense against the threat of malware can include antivirus programs.
The proliferation of malware is currently increasing at an accelerated rate. This is because the “barriers to entry” for designers of malware (the challenges that need to be overcome for designers of malware to produce new malware versions) are lower than ever. This process has been abetted by the availability of very high quality software development kits that can provide even neophyte malware designers with the capacity to create dangerous new malware variants. Some of these kits enable a malware designer to recompile malware source code with minor source code modifications so as to develop malware that can avoid detection. The new malware versions have a significant semantic resemblance to previous versions and thus present a similar threat.
In fact, many of the newly appearing malware files are malware variants that belong to a few very active existing malware families such as Bots. When compiled, source code associated with such malware variants that include source code level changes such as discussed above, can be compiled to the same functions as the previous malware versions even though the underlying binary code is different (corresponding to changes in the source code). These differences in binary code between a malware variant and a previous malware version can cause the detection of the malware variant to be frustrated because the data that is used to identify the previous malware version may not be effective for identifying the malware variant.
A conventional approach to identifying malware variants is the use (e.g., as a part of or in conjunction with an antivirus program) of generic malware signatures. A generic signature can be extracted by researchers from malware code and used to identify malware from malware families. These signatures can be stored and compared with incoming files to facilitate the identification.
Generic signatures only guard against general malware types and may not be very effective against particular malware variants such as discussed above. While generic signatures can partially address the problem of combating certain active malware families, their effectiveness is restricted by shortcomings such as performance and accuracy. Antivirus researchers spend a significant amount of time and effort analyzing samples of code from malware and employ a variety of techniques to identify effective signatures. However, this process can be tedious and slow and is very error prone (errors made during the creation of a signature can result in false positive malware identifications). Moreover, this process represents a largely unsatisfactory manual response to an active and dynamically evolving malware threat.