In recent years, authors of malicious software (“malware”) have attempted to proliferate malware by generating thousands or potentially millions of variations of malicious files. For example, unique versions of malware code may be created with each new infection, or the malware program may modify itself each time it propagates to a new computer system, or even every time it runs (so-called “polymorphic malware”). Unfortunately, because many existing antivirus technologies detect malware by identifying unique digital signatures or fingerprints associated with known-malicious files, malware authors may avoid detection by distributing only new (i.e., unique) or repacked versions of malicious files.
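The weakness of exact-match fingerprints described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a fingerprint is a SHA-256 hash of the file contents; the names `fingerprint` and `is_known_malicious` and the sample byte strings are hypothetical and do not represent any particular vendor's detection technology.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest used here as a stand-in fingerprint."""
    return hashlib.sha256(data).hexdigest()


def is_known_malicious(data: bytes, signatures: set) -> bool:
    """Exact-match lookup: flags a file only if its fingerprint is already listed."""
    return fingerprint(data) in signatures


# Placeholder bytes standing in for a previously catalogued malicious file.
original = b"placeholder-sample-bytes"
known_signatures = {fingerprint(original)}

# A repacked or mutated copy: one appended byte yields a different fingerprint.
variant = original + b"\x00"

original_detected = is_known_malicious(original, known_signatures)
variant_detected = is_known_malicious(variant, known_signatures)
```

Under these assumptions, the catalogued file is matched while the one-byte variant is not, which is why distributing only unique copies may defeat this style of lookup.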
In light of this, some security-software vendors have begun investigating and implementing reputation-based security systems. In a reputation-based security system, a security-software vendor may attempt to determine whether a file represents malware by collecting, aggregating, and analyzing data from potentially millions of user devices within a community, such as the security-software vendor's user base. For example, by determining a file's source, age, and prevalence within the community, among other details, a security-software vendor may gain a fairly accurate understanding of whether the file represents malware.
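One way such community data might be aggregated is sketched below. The record layout (`FileReport`), the helper names, and the thresholds are all assumptions introduced for illustration; the passage above does not specify how source, age, and prevalence are actually weighed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FileReport:
    """Hypothetical telemetry record submitted by one device in the community."""
    file_hash: str
    device_id: str
    source_url: str
    first_seen: date


def summarize(reports, file_hash, today):
    """Aggregate prevalence, age, and sources for one file; None if never reported."""
    matching = [r for r in reports if r.file_hash == file_hash]
    if not matching:
        return None
    return {
        # Prevalence: number of distinct devices that reported the file.
        "prevalence": len({r.device_id for r in matching}),
        # Age: days since the earliest sighting anywhere in the community.
        "age_days": (today - min(r.first_seen for r in matching)).days,
        "sources": {r.source_url for r in matching},
    }


def looks_suspicious(summary, min_prevalence=100, min_age_days=30,
                     trusted_sources=frozenset()):
    """Illustrative rule with assumed thresholds: rare, new, and from no trusted source."""
    rare = summary["prevalence"] < min_prevalence
    new = summary["age_days"] < min_age_days
    untrusted = summary["sources"].isdisjoint(trusted_sources)
    return rare and new and untrusted
```

A real system would likely combine many more signals; the single conjunctive rule here is only meant to show how the three named attributes could feed a verdict.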
Some legitimate software publishers and distributors, however, also distribute many unique variants of a program. For example, a software developer may customize each copy of a program it distributes to personalize the user experience for each customer, or to facilitate detection of unauthorized copying of the software. A software distributor may also repackage freeware or “adware” programs with advertisements uniquely selected for each customer.
Unfortunately, malware detection systems that rely on signature-based detection may not recognize customized versions of a legitimate program as variants of a single program. In addition, reputation-based systems may incorrectly identify unique or similar versions of a program with low prevalence and unknown origin or age as potential polymorphic threats. These mistakes, known as “false positives,” may be extremely disruptive and costly for an enterprise because they can result in the deletion or removal of legitimate, and potentially essential, files and software from computing devices within the enterprise. Accordingly, the instant disclosure identifies and addresses a need for additional and improved systems and methods for preventing false-positive malware identification.
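The false-positive scenario described above can be made concrete with a small sketch. It assumes a hypothetical publisher that embeds a customer identifier in each distributed copy and a deliberately naive prevalence rule; the names `customized_copy` and `naive_verdict`, the placeholder bytes, and the threshold of five devices are illustrative assumptions only.

```python
import hashlib
from collections import Counter

# Placeholder bytes standing in for a legitimate program's shared contents.
BASE_PROGRAM = b"legitimate-installer-bytes"


def customized_copy(customer_id: str) -> bytes:
    """Hypothetical per-customer build: the shared program plus an embedded identifier."""
    return BASE_PROGRAM + b"|license:" + customer_id.encode()


def naive_verdict(prevalence: int, min_prevalence: int = 5) -> str:
    """Deliberately naive rule (assumed threshold): low prevalence means suspicious."""
    return "suspicious" if prevalence < min_prevalence else "ok"


customers = [f"customer-{i}" for i in range(1000)]

# Each customized copy hashes differently, so every hash is seen exactly once.
prevalence_by_hash = Counter(
    hashlib.sha256(customized_copy(c)).hexdigest() for c in customers
)

false_positives = sum(
    1 for count in prevalence_by_hash.values()
    if naive_verdict(count) == "suspicious"
)
```

In this sketch all 1000 copies of a single legitimate program are flagged, because the rule sees 1000 unrelated files of prevalence one rather than 1000 variants of one program.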