With the advent of computers and communication networks, the ability to generate, store, utilize, distribute, publish or otherwise share content and information has vastly improved. This has further led to the routine transfer of large amounts of data, content and information between devices. While much of the material transferred between devices is exactly that which is desired by the corresponding users, malicious software (or malware) can also be transferred among devices. The malware may pose privacy or security concerns, or it may be disruptive or even destructive and costly in some situations.
To minimize the impact of malware, anti-virus software, network operations centers, network security offices and other entities may attempt to determine accurately and quickly whether a received piece of unknown software includes binary code that is or contains malware. Some options for identifying malware include the use of databases of checksums, checksums derived from header information, or context-triggered hashes, as well as other signature-based methods. However, these methods are generally considered to be brittle, in that they cannot express or detect changes in the size and content of binaries well enough to support immediate recognition of related or modified binaries. As a result, these methods cannot detect slight changes to known malware, or the reuse of functions from known malware in a new binary. Other methods that rely on detecting structural aspects of the software require a level of expert intervention that makes them unsuitable for automated processing of large volumes of data.
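To illustrate why exact checksum signatures are brittle, the following is a minimal Python sketch. It assumes a hypothetical signature database that stores exact SHA-256 checksums of whole samples. The names `signature_db`, `is_known_malware` and `sha256_hex`, and the placeholder sample bytes, are hypothetical and are not drawn from any real anti-virus product. Appending a single byte to a known sample yields a completely different checksum, so the exact-match lookup no longer recognizes the variant.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a binary blob."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical "known malware" sample: placeholder bytes, not a real binary.
known_sample = b"\x4d\x5a\x90\x00" + b"example payload" * 4

# Hypothetical signature database: a set of exact checksums of known samples.
signature_db = {sha256_hex(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Exact-match lookup: flag data only if its checksum is already stored."""
    return sha256_hex(data) in signature_db


# A trivially modified variant: one padding byte appended to the known sample.
variant = known_sample + b"\x00"

print(is_known_malware(known_sample))  # True
print(is_known_malware(variant))       # False: one extra byte defeats the lookup
```

Although the variant shares every byte of the original sample, the lookup treats it as unrelated, which is the failure mode that motivates similarity-based approaches to recognizing related or modified binaries.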
Forensic analysis of binary code can be a lengthy, time-consuming operation that requires highly trained specialists. These specialists are therefore a scarce and often expensive resource that should remain focused on understanding new threats, and using them to identify known malware, or slightly modified versions of known malware, is not an optimal use of their time. Nevertheless, it is often difficult to avoid diverting some of these resources to such lower-value tasks.
Accordingly, it may be desirable to continue to develop improved and/or more efficient mechanisms for providing protection against malware. Moreover, in some cases, detecting related code variants in binaries may also be useful outside the context of malware detection.