Separating an object into its constituent components, so that its internal structure can be analyzed in terms of those components, is a long-standing problem in the reverse engineering of complex systems, particularly in computer software analysis and malware detection. Various techniques have been developed to compare software samples. Existing literature discusses comparing source code samples for software engineering reuse, but the body of literature on comparing executables is limited. Most research on executable comparison focuses on comparing entire executables for similarity.
Malware is frequently built by statically linking newly created control code with various existing libraries, producing a stripped binary module. Computer malware detection has typically been conducted with programs that monitor files and applications on individual computers. The detection methods often rely on large databases that contain signatures of previously identified computer viruses, worms, trojans, spyware, or other malicious computer programs. Malware scanning programs search individual files on individual computers for known signatures. While this pattern detection approach can be effective, it requires frequent updates to the signature database to keep abreast of the most recent malware developments, and it may not provide any indication of the source of the malware infection.
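The signature-matching approach described above can be sketched roughly as follows. This is a minimal illustration only, not any particular product's method: the function `scan_bytes`, the `Match` type, and the signature names and byte patterns are all hypothetical, and a real scanner would use a far larger database and more efficient multi-pattern matching.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    name: str    # label of the signature that matched
    offset: int  # byte offset of the first occurrence in the scanned data


def scan_bytes(data: bytes, signatures: Dict[str, bytes]) -> List[Match]:
    """Return one Match for each signature whose byte pattern occurs in data."""
    matches = []
    for name, pattern in signatures.items():
        offset = data.find(pattern)  # -1 when the pattern is absent
        if offset != -1:
            matches.append(Match(name, offset))
    return matches


# Hypothetical signature database (names and patterns are invented).
SIGNATURES = {
    "Example.Worm.A": bytes.fromhex("deadbeef"),
    "Example.Trojan.B": b"EVIL_PAYLOAD",
}

# A sample containing the first pattern verbatim is flagged.
sample = b"\x00\x01" + bytes.fromhex("deadbeef") + b"\x02"
hits = scan_bytes(sample, SIGNATURES)

# A variant differing by a single byte is missed until the database is updated.
variant = b"\x00\x01" + bytes.fromhex("deadbeee") + b"\x02"
variant_hits = scan_bytes(variant, SIGNATURES)
```

In this sketch the unmodified sample is flagged while the one-byte variant is not, which illustrates why exact pattern matching depends on frequent signature updates. A match also identifies only the known pattern, not where the infection came from.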
Interest in application analysis among the reverse engineering and anti-malware communities has increased with the widespread public adoption of computing technologies such as personal computers and smartphones, and with the large amount of personal or financial data that may be subject to exploitation by malicious programs. There is also a general need for forensic tools that may assist with identifying or locating malware authors or distributors.