1. Field of the Invention
The present invention is related to anti-malware technology, and more particularly, to detection and minimization of false positives occurring during anti-malware processing.
2. Description of the Related Art
Detection of viruses and malware has been a concern throughout the era of the personal computer. With the growth of communication networks such as the Internet and increasing interchange of data, including the rapid growth in the use of e-mail for communications, infection of computers through communications or file exchanges is an increasingly significant consideration. Infections take various forms, but are typically related to computer viruses, Trojan programs or other forms of malicious code (i.e., malware).
Recent incidents of e-mail-mediated virus attacks have been dramatic both for the speed of propagation and for the extent of damage, with Internet service providers (ISPs) and companies suffering service problems and a loss of e-mail capability. In many instances, attempts to adequately prevent file-exchange or e-mail-mediated infections significantly inconvenience computer users. Improved strategies for detecting and dealing with virus attacks are desired.
One conventional approach to detecting viruses is signature scanning. Signature scanning systems use sample code patterns extracted from known malware code and scan other program code for the occurrence of these patterns. A primary limitation of the signature scanning method is that only known malicious code is detected; that is, only code that matches the stored sample signatures of known malicious code is identified as infected. Viruses or other malicious code not previously identified, as well as any created after the last update to the signature database, will not be detected.
In addition, the signature analysis technique fails to identify the presence of a virus if the signature is not aligned in the code in the expected fashion. Moreover, the authors of a virus may obscure its identity by opcode substitution or by inserting dummy or random code into virus functions. Such nonsense code can alter the signature of the virus sufficiently to make it undetectable by a signature scanning program, without diminishing the ability of the virus to propagate and deliver its payload.
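The matching step of signature scanning, and its weakness against inserted dummy code, can be illustrated with a minimal Python sketch. The signature names and byte patterns are hypothetical placeholders, and a plain substring search stands in for the more elaborate matching that real scanners use.

```python
# Hypothetical signature database: name -> byte pattern extracted from known malware.
# These patterns are invented for illustration only.
SIGNATURES = {
    "Example.Virus.A": bytes.fromhex("deadbeef4242"),
    "Example.Trojan.B": b"EVIL_PAYLOAD",
}


def scan(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


# A file containing the exact pattern is detected...
infected = b"\x00\x01" + bytes.fromhex("deadbeef4242") + b"\x02"
print(scan(infected))  # ['Example.Virus.A']

# ...but inserting one dummy byte (0x90, an x86 NOP) inside the pattern changes
# the byte sequence enough that the exact-match scan no longer finds it.
obfuscated = b"\x00\x01" + bytes.fromhex("deadbe90ef4242") + b"\x02"
print(scan(obfuscated))  # []
```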
Another virus detection strategy is integrity checking. Integrity checking systems extract a code sample from known, benign application program code. The code sample is stored together with information from the program file, such as the executable program header and the file length, as well as the date and time stamp of the sample. The program file is checked at regular intervals against this database to ensure that it has not been modified.
Integrity checking programs generate long lists of modified files when a user upgrades the operating system of the computer or installs or upgrades application software. The main disadvantage of an integrity-check-based virus detection system is that numerous warnings of virus activity are issued whenever an application program is modified. It then becomes difficult for a user to determine which warnings represent an actual attack on the computer system.
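A simplified integrity check can be sketched as follows, assuming the code sample is the first 64 bytes of the file (which, for an executable, would include its header) and that the file length and modification timestamp are stored alongside it. `IntegrityRecord`, `take_baseline`, `is_modified` and `SAMPLE_SIZE` are hypothetical names chosen for illustration. The demo reproduces the disadvantage described above: a legitimate upgrade is flagged just as an infection would be.

```python
import os
import tempfile
from dataclasses import dataclass

SAMPLE_SIZE = 64  # hypothetical: number of leading bytes kept as the code sample


@dataclass(frozen=True)
class IntegrityRecord:
    sample: bytes  # code sample taken from the known-benign program
    length: int    # file length in bytes at baseline time
    mtime_ns: int  # last-modification timestamp at baseline time


def take_baseline(path: str) -> IntegrityRecord:
    """Read the code sample and file metadata that make up the stored record."""
    with open(path, "rb") as f:
        sample = f.read(SAMPLE_SIZE)
    st = os.stat(path)
    return IntegrityRecord(sample, st.st_size, st.st_mtime_ns)


def is_modified(path: str, record: IntegrityRecord) -> bool:
    """Periodic check: True if the file no longer matches its stored record."""
    return take_baseline(path) != record


with tempfile.TemporaryDirectory() as workdir:
    app = os.path.join(workdir, "app.exe")
    with open(app, "wb") as f:
        f.write(b"MZ" + bytes(100))  # stand-in for a benign executable
    baseline = take_baseline(app)
    before = is_modified(app, baseline)  # False: nothing has changed

    # A legitimate upgrade rewrites the file with new contents...
    with open(app, "wb") as f:
        f.write(b"MZ" + bytes([1]) * 200)
    after = is_modified(app, baseline)  # True: flagged like an infection

print(before, after)  # False True
```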
Checksum monitoring systems (and, more generally, hash-based monitoring systems) detect viruses by generating a cyclic redundancy check (CRC) value for each program file. Modification of a program file is detected as a difference in its CRC value. Checksum monitors improve on integrity checking systems in that it is difficult for malicious code to defeat the monitoring. However, checksum monitors share the limitations of integrity checking systems: they issue too many false warnings, making it difficult to identify which warnings represent actual viruses or infections.
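CRC-based monitoring can be sketched with Python's standard `zlib.crc32`. The in-memory mapping of file names to contents and the helper names `crc_baseline` and `changed_files` are assumptions made to keep the example self-contained; a hash-based monitor would substitute a cryptographic digest such as SHA-256 for the CRC.

```python
import zlib


def crc_baseline(files: dict[str, bytes]) -> dict[str, int]:
    """Record a CRC-32 value for each program file (name -> contents)."""
    return {name: zlib.crc32(data) for name, data in files.items()}


def changed_files(files: dict[str, bytes], baseline: dict[str, int]) -> list[str]:
    """Names whose current CRC-32 differs from the stored value, or which are new."""
    return sorted(
        name for name, data in files.items()
        if baseline.get(name) != zlib.crc32(data)
    )


files = {"editor.exe": b"MZ editor v1", "viewer.exe": b"MZ viewer v1"}
baseline = crc_baseline(files)

# A legitimate update changes one byte of viewer.exe; the CRC no longer matches,
# so the monitor raises a warning even though no virus is involved.
files["viewer.exe"] = b"MZ viewer v2"
print(changed_files(files, baseline))  # ['viewer.exe']
```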
An effective conventional approach uses so-called white lists, i.e., lists of known “clean” software components, links, libraries and other clean objects. Hash values can be used to compare a suspect object against the white list. The use of hashes is disclosed, for example, in WO/2007066333, where the white list consists of hashes of known clean applications. In WO/2007066333, checksums are calculated and compared against the known checksums.
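A hash-based white list lookup might look like the sketch below. SHA-256 is assumed here only for illustration; no particular hash algorithm is named for the cited reference in this section. The object contents and the helper names `sha256_hex` and `is_whitelisted` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash an object's contents; SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical known-clean objects; in practice these would be real files.
KNOWN_CLEAN = [b"clean library v1.0", b"clean application v2.3"]
WHITE_LIST = {sha256_hex(obj) for obj in KNOWN_CLEAN}


def is_whitelisted(obj: bytes, white_list: set[str] = WHITE_LIST) -> bool:
    """A suspect object is treated as clean only if its hash is on the white list."""
    return sha256_hex(obj) in white_list


print(is_whitelisted(b"clean library v1.0"))          # True
print(is_whitelisted(b"clean library v1.0 patched"))  # False
```

Because any change to an object changes its hash, a legitimately updated component no longer matches such a list until the list itself is updated.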
To be effective, white lists have to be constantly updated, as disclosed, for example, in US 2008/0168558, which uses an ISP for white list updates. In US 2008/0104186, the white list is updated using information derived from the content of a message. Also, in US 2007/0083757, it is determined whether a white list needs to be corrected, and the latest version of the white list is retrieved if correction is required.
When white lists are used, some false positive determinations are inevitably made. False positives must be detected, as they can cause nearly as much harm as malware. For example, a legitimate component can be “recognized” by the anti-virus (AV) software as malware, severely damaging the reputation of the AV software vendor and causing annoyance and wasted time for many users. In another scenario, malware is mistakenly considered to be a “clean” component and then harms the system. Currently, false positives are detected and the white lists are corrected manually. Because the process requires an analyst's participation, it takes a relatively long time, often many hours and sometimes as long as a day or two, before the white lists are updated and distributed. In the meantime, nothing prevents the same false positive from occurring for many users.
Detection of false positives is disclosed in US 2008/0168558, where false positives are detected by comparing various threat reports. A security system that takes false positive values into consideration is disclosed in WO/03077071.
However, conventional systems do not provide effective and robust updating of the white lists based on detected false positives. For example, US 2006/0206935 discusses minimizing the risk of false positives but does not suggest how to correct the white lists.
In WO/9909507, neural networks are used to minimize false positives. However, this reference does not address correction of the white lists. In US 2007/0220043, parameters such as vendor, product version and product name are used to estimate a potential threat. However, these parameters are not used to correct a white list. Other conventional systems use software license information to include software in a white list.
In other systems, digital signatures are used to place an object on a white list or on a black (i.e., malware) list. For example, in WO/2007143394, a digital signature is included in the white list. However, this reference likewise does not disclose correcting or updating the white list.
It is apparent that improved techniques for maintaining, correcting and updating white lists and black lists are desired. Accordingly, there is a need in the art for a system and method for detecting and minimizing false positives occurring during anti-malware processing.