Malicious software (“malware”), such as computer viruses, worms, and trojans, is a serious threat to current computing systems, and new malware variants are being created constantly. Typical malware detection systems use file signature scanning to identify particular binary files associated with known malware and malware variants. Because a signature matches only the specific files it was derived from, updated malware signatures are typically required to detect even minor malware variants.
Many malware researchers believe that a large share of new malware variants are created by the same individuals or groups, for example through source code reuse. However, typical file signature scanning may not detect similarities between malware samples that result from code reuse. Currently, the provenance of a particular malware variant may be determined, for example, by manually examining textual strings or domain names included in the malware binary. Sliding window hashes have also been used to determine the similarity of compiled binary code. One such algorithm is CMU BitShred, described in Jiyong Jang & David Brumley, BitShred: Fast, Scalable Code Reuse Detection in Binary Code, Carnegie Mellon University Technical Report CMU-CyLab-10-006 (2009).
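By way of illustration only, the general idea of comparing binaries with sliding window hashes may be sketched as follows. This is a simplified example and not the BitShred algorithm itself: the function names, the 8-byte window size, the choice of the BLAKE2b hash, and the use of a Jaccard index over the resulting hash sets are all assumptions made for the sketch.

```python
import hashlib


def window_hashes(data: bytes, window: int = 8) -> set[int]:
    """Hash every overlapping `window`-byte slice of `data`.

    Hypothetical helper for illustration; the window size is an assumption.
    """
    hashes = set()
    for i in range(len(data) - window + 1):
        digest = hashlib.blake2b(data[i:i + window], digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def jaccard_similarity(a: bytes, b: bytes, window: int = 8) -> float:
    """Return |A & B| / |A | B| over the window-hash sets of `a` and `b`."""
    ha, hb = window_hashes(a, window), window_hashes(b, window)
    if not ha or not hb:
        return 0.0  # an input shorter than one window gives no evidence
    return len(ha & hb) / len(ha | hb)


# Toy inputs standing in for compiled binaries (assumed data, not real malware).
original = bytes(range(256))
variant = bytearray(original)
variant[100] = 0xFF  # a one-byte change, standing in for a "minor variant"
variant = bytes(variant)
unrelated = bytes(reversed(range(256)))

# A whole-file hash treats the variant as an entirely different file ...
whole_file_match = (
    hashlib.sha256(original).digest() == hashlib.sha256(variant).digest()
)

# ... while the window-hash comparison still reports substantial overlap.
variant_score = jaccard_similarity(original, variant)
unrelated_score = jaccard_similarity(original, unrelated)
```

In this toy case the one-byte change defeats the whole-file hash comparison, yet only the windows that overlap the changed byte differ, so the variant still scores close to the original while the unrelated input does not. Production systems would typically operate on disassembled or normalized code rather than raw bytes, which this sketch does not attempt.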