Malicious software, or “malware,” is a term that generally describes software purposefully designed to cause disruption and problems on the computers and systems on which it runs. Malware includes, for example, viruses, worms, trojan horses, rootkits, adware, spyware, bots, and other destructive forms of software. Malware can cause significant damage in the form of disabled or corrupted firmware, software applications, or systems; lost or damaged data; data theft; and loss of security (e.g., increased vulnerability) for systems or data, among other problems. Such damage is not necessarily limited to individual computer systems, given today's high level of interconnectivity among computers through various networks, including the Internet. As a result, significant resources are expended to identify and analyze malware in order to develop solutions and address areas of vulnerability. Analysts are also interested in understanding how various malware variants may be clustered into families based on their functionality.
Malware analysts often spend significant amounts of time analyzing disassembled functions from files that may contain malware. While current tools let analysts save their chosen function names and comments in a disassembly database, such tools only allow for a one-to-one relationship between the disassembly database and the malware files. Thus, a function of interest can be compared against another function, and the current tools will determine whether the two functions are the same. However, the analyses performed by malware analysts do not persist across disassemblies of many different files that contain similar functions. In addition, current tools for clustering binary functionality to identify families of malware either use static analyses that do not scale well or rely on dynamic analyses that can return unreliable results, because many different families exhibit near-identical characteristics under such analyses.
This Background is provided to introduce a brief context for the Summary and Detailed Description that follow. This Background is not intended to be an aid in determining the scope of the claimed subject matter nor be viewed as limiting the claimed subject matter to implementations that solve any or all of the disadvantages or problems presented above.