Viruses, Trojans, spyware, and other kinds of malware are a constant threat to any computing device that requires network connectivity. Malware can delete or encrypt important files on computing devices, download additional malware, spy on a user's behavior, use a user's computing device as part of a botnet to carry out large-scale malicious behavior, or perform any of a number of other harmful actions. Many different types of security systems exist to combat these threats, ranging from browser plug-ins to virus scanners to firewalls, and beyond. Countless new instances and permutations of malware are created every day, requiring security systems to constantly analyze new files to determine whether those files are malicious or benign. In some cases, security systems may compare new files to previously identified malicious files in order to determine whether the new files are new instances of already-discovered malware. Malware detection is not the only context in which it is useful to determine whether a file is a complete or partial copy of another file. For example, data loss prevention systems may also match files, as may systems for determining whether a file contains re-used code.
Traditional systems for comparing files may involve lengthy analysis of the new file, consuming a significant amount of computing resources. Some traditional systems for matching files may be capable of matching only identical files, allowing attackers to thwart the systems by changing minor details of files. The instant disclosure, therefore, identifies and addresses a need for systems and methods for efficiently matching files.