The volume of malicious software is growing continuously. Antivirus companies must react promptly and adequately to these emerging threats, but they face many challenges. One such challenge is the accuracy of malware detection. An adequate response to a threat requires the absence of false positives, i.e., the response must neutralize the threat without adversely affecting benign files or other objects.
A response to a threat can involve creating a rule for detecting the threat and subsequently eliminating it. A detection rule, in a particular case, can be represented by signatures, heuristic rules, or hash sums, i.e., methods that make it possible to detect target files among the whole variety of files being examined. Once a rule is created, it is tested for the absence of false activations. After testing, the rule begins to function on the user side; quite often, the rule is additionally tested while it is actively functioning on the user side.
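The testing step described above can be illustrated with a minimal sketch. It assumes a rule is simply a function that returns true when a file's bytes activate it; the helper names `make_hash_rule`, `make_signature_rule`, and `false_activations` are hypothetical and are not taken from any cited system.

```python
import hashlib
from typing import Callable, Iterable, List, Mapping

# Assumption: a detection rule is modeled as a predicate over file contents.
Rule = Callable[[bytes], bool]

def make_hash_rule(known_malicious: Iterable[bytes]) -> Rule:
    """Hypothetical hash-sum rule: activates only on exact known samples."""
    digests = {hashlib.sha256(data).hexdigest() for data in known_malicious}

    def rule(data: bytes) -> bool:
        return hashlib.sha256(data).hexdigest() in digests

    return rule

def make_signature_rule(signature: bytes) -> Rule:
    """Hypothetical signature rule: activates when the byte sequence occurs."""
    def rule(data: bytes) -> bool:
        return signature in data

    return rule

def false_activations(rule: Rule, safe_collection: Mapping[str, bytes]) -> List[str]:
    """Return the names of safe files that activate the rule."""
    return sorted(name for name, data in safe_collection.items() if rule(data))

if __name__ == "__main__":
    # Illustrative data only; a real collection would hold actual files.
    safe = {
        "editor.bin": b"plain text editor",
        "report.bin": b"quarterly EVIL-free report",
    }
    print(false_activations(make_signature_rule(b"EVIL"), safe))
    print(false_activations(make_hash_rule([b"EVIL payload"]), safe))
```

In this sketch the broad signature rule activates on a safe file that happens to contain the byte sequence, while the exact hash rule does not, which is the kind of false activation the pre-release test is meant to surface.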
U.S. Pat. No. 8,280,830 discloses a system for modifying a detection rule based on preliminary testing of a created rule on a collection of safe files and malicious files. However, the collection of safe files and malicious files available to antivirus software manufacturers cannot cover the entire variety of files encountered by users; therefore, quite often, feedback from the detection rule is used once it is already functioning on the user side. An antivirus application using a detection rule sends notices to developers, specifying which files activated the rule, and the developers analyze this information on their side. U.S. Pat. No. 8,356,354 discloses a system for issuing updates of antivirus databases; one of the embodiments provides that the antivirus application sends information to developers, specifying which files triggered the rule, and that the received information is analyzed for false activations of the rule.
But even the combined use of a collection of safe files and malicious files together with feedback from users cannot guarantee the effectiveness of a detection rule: the collection is incomplete, the rule cannot be tested on files that will appear in the future, and not all users have antivirus software installed. In addition, a significant drawback of the feedback method used today is that, generally, the file's checksum or hash, rather than the file itself, is sent as feedback. If a file from the collection differs even slightly from the file that activated the rule, the checksums or hashes will not match and the false activation will not be detected.
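The hash-mismatch drawback can be shown with a short sketch. SHA-256 is used here only as an example of a cryptographic hash; the helper `is_known_safe` and the sample byte strings are hypothetical stand-ins for a safe-file collection and a user-side file.

```python
import hashlib
from typing import Set

def sha256_hex(data: bytes) -> str:
    """Hex digest of the file contents."""
    return hashlib.sha256(data).hexdigest()

def is_known_safe(data: bytes, safe_hashes: Set[str]) -> bool:
    """Hypothetical lookup: is this exact file in the safe collection?"""
    return sha256_hex(data) in safe_hashes

# Assumption: the collection holds one version of a benign file, while the
# user-side file that activated the rule is a slightly different version.
original = b"benign application build 1.0"
variant = b"benign application build 1.1"

safe_hashes = {sha256_hex(original)}

if __name__ == "__main__":
    print(is_known_safe(original, safe_hashes))  # exact copy is recognized
    print(is_known_safe(variant, safe_hashes))   # one-byte change is not
```

Because the two inputs differ by a single byte, their digests are unrelated, so a lookup keyed on the hash alone cannot tell that the user-side file is a near-copy of a known safe file.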
The probability of false activations increases when the detection rule is created not for one file but for a group of similar files. In general, the larger the number of files for which the rule is created, the higher the probability of false activation. Many of the known approaches fail to take this fact into account.
An effective and practical solution is therefore needed to evaluate malware detection rules while avoiding false positives.