With the advent of general-access computer networks, such as the Internet, people may now easily exchange application programs and application data between computer systems. Unfortunately, some people have taken advantage of such easy data exchange by developing computer “viruses” designed to spread among and sometimes attack interconnected devices, such as networked computers. A virus is application code that executes on one's computer without one's knowledge, and against one's interests. Viruses tend to replicate themselves across interconnected devices, allowing an exponential “infection” of other devices.
In response to the security threat intrinsic to viruses, anti-virus programs were developed to identify and remove viruses. Anti-virus programs periodically check a computer system for known viruses, or application code that appears to perform undesired activities, such as reformatting a hard disk. Virus scanners may be invoked on-demand by a computer user to scan a selected file. More typically, virus scanners install themselves as part of an operating system, and then scan files, according to user preferences, as the files are created and accessed. This type of virus scanner is referred to as an on-access virus scanner.
Some on-access virus scanners attach themselves to communication input and/or output pathways to inspect data that might not be easily identifiable to an operating system's file-based scanning. For example, an e-mail scanner may be attached to a communication port, such as an e-mail transfer port, so as to allow scanning of incoming and outgoing e-mails and their attachments. E-mail is a common way for a virus to enter a system otherwise protected by an operating-system-based on-access scanner, as the e-mail program may receive and store an infected e-mail message without giving the operating system scanner an opportunity to scan it. For example, an infected e-mail may be received and stored in a database such that no individual or recognizable data is available for scanning. Thus, an e-mail scanner is used to scan e-mails, and their attachments, as they are received (or sent) by a system.
A problem in using on-access virus scanners is that the scanned file, e-mail or e-mail attachment can be any type of data. For example, in the case of e-mail, attachments are frequently compressed and stored as archives in order to reduce the size of the data transferred. The term archive as used herein includes traditional archive data formats such as ZIP, ZOO, LHA, ARC, JAR, LZW, etc. that contain compressed collections of data files, in addition to other data formats that may embed other files, e.g., Microsoft Word (e.g., “.DOC”) documents, Rich Text Format (RTF) files, Object Linking and Embedding (OLE) containers, etc. Scanning archives and documents containing embedded objects takes additional time and resources.
A prior art on-access virus scanner 100 is organized into two parts, as illustrated in FIG. 1. One part is the event filter 110, which is the software that intercepts the events of interest to the virus scanner. Events of interest include a file being opened or an e-mail arriving in a mailbox. Another part is a scanner thread 120, which is the software that receives scan requests from the filter. The scanner thread determines whether the object of the intercepted event (i.e. the file, e-mail, or e-mail attachment) needs scanning and, if so, scans the object. Multiple scanner threads are typically provided in pools 130 that are capable of executing concurrently so that multiple objects may be scanned simultaneously.
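By way of illustration only, the two-part organization described above may be sketched as follows. The sketch is not taken from any actual virus scanner: the class name, the event names, the pool size of four, and the placeholder functions needs_scanning and scan are all hypothetical and chosen only to show how an event filter may hand scan requests to a pool of concurrently executing scanner threads.

```python
import queue
import threading

POOL_SIZE = 4  # hypothetical pool size; the source does not specify one


def needs_scanning(obj_name):
    # Hypothetical policy: plain-text objects are assumed not to need scanning.
    return not obj_name.endswith(".txt")


def scan(obj_name):
    # Placeholder for a real scan: flags any name containing "virus".
    return "infected" if "virus" in obj_name else "clean"


class OnAccessScanner:
    """Event filter plus a pool of scanner threads sharing one request queue."""

    EVENTS_OF_INTEREST = ("file_open", "mail_arrival")  # hypothetical names

    def __init__(self, pool_size=POOL_SIZE):
        self.requests = queue.Queue()
        self.results = {}
        self._lock = threading.Lock()
        for _ in range(pool_size):
            threading.Thread(target=self._scanner_thread, daemon=True).start()

    def event_filter(self, event_type, obj_name):
        # Intercept only events of interest and queue a scan request.
        if event_type in self.EVENTS_OF_INTEREST:
            self.requests.put(obj_name)

    def _scanner_thread(self):
        # Each scanner thread decides whether the object needs scanning
        # and, if so, scans it; requests wait in the queue while all
        # threads are busy.
        while True:
            obj_name = self.requests.get()
            verdict = scan(obj_name) if needs_scanning(obj_name) else "skipped"
            with self._lock:
                self.results[obj_name] = verdict
            self.requests.task_done()

    def wait(self):
        # Block until every queued scan request has been processed.
        self.requests.join()


scanner = OnAccessScanner()
scanner.event_filter("file_open", "report.doc")
scanner.event_filter("mail_arrival", "virus_attachment.zip")
scanner.event_filter("file_close", "report.doc")  # not of interest; ignored
scanner.wait()
print(scanner.results)
```

Because every scanner thread draws from the same queue, any request that arrives while all threads are occupied simply waits, which is the behavior exploited in the scenarios discussed below.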
Unfortunately, virus developers have recently begun to manufacture “malicious” files, including archives and documents containing embedded objects, that take “a long time” to scan. The malicious files are designed to overwhelm on-access virus scanners by tying up all of the available scanner threads in the pool, thereby causing all other events intercepted by the filter to be queued until a scanner thread becomes free. This causes the virus scanner to “crash” by blocking further processing of data and leaves a system undefended against subsequent attacks. If e-mail or file processing is routed through a virus scanner and the scanner has crashed, then a “denial of service” for e-mail or file activity occurs until the scanner is restarted.
In some instances, a computer user may inadvertently overwhelm a virus scanner by repeatedly accessing a file that is taking a long time to scan, which only increases the burden on the virus scanner. For example, if a particular ZIP file takes 10 minutes to scan and a user deliberately or accidentally accesses that file at least as many times as there are scanner threads, then all of the on-access virus scanner's scanner threads will be scanning the same ZIP file. Since no other file accesses can be processed for 10 minutes, the user's computer is effectively unusable for that period of time.
Even if a particular file takes just 5 seconds to scan, frequent accesses to that file may still overwhelm the scanner thread pool, causing other file accesses to be delayed by several seconds. While this does not affect the availability of the computer as much as the ZIP file that takes 10 minutes to scan, the “user experience” is still poor, i.e., the user's applications are unresponsive. If the user clicks on a document and the system does not respond within a short enough time, then the user may very well click again and again, which only makes the problem worse.
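The two scenarios above can be put in rough numbers with a small first-in, first-out simulation. This is a sketch under stated assumptions only: a pool of four scanner threads, the arrival times shown, and a 0.1 second scan time for the unrelated document are all hypothetical, and the function simulate_pool is introduced purely for illustration.

```python
import heapq


def simulate_pool(pool_size, requests):
    """Return {name: seconds each request waits before its scan starts}.

    requests is a list of (arrival_time, scan_seconds, name) tuples,
    sorted by arrival time and served first-in, first-out.
    """
    free_at = [0.0] * pool_size  # time at which each thread becomes free
    heapq.heapify(free_at)
    waits = {}
    for arrival, duration, name in requests:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits[name] = start - arrival
        heapq.heappush(free_at, start + duration)
    return waits


POOL_SIZE = 4  # hypothetical; the source does not give a pool size

# Scenario 1: a ZIP file that takes 10 minutes (600 s) to scan is accessed
# four times, then an unrelated document is opened at t = 10 s.
slow_zip = [(float(t), 600.0, f"big.zip access {t + 1}") for t in range(4)]
slow_zip.append((10.0, 0.1, "letter.doc"))
waits_1 = simulate_pool(POOL_SIZE, slow_zip)

# Scenario 2: a file that takes 5 s to scan is accessed eight times at
# t = 0 s, then an unrelated document is opened at t = 1 s.
busy_file = [(0.0, 5.0, f"shared.doc access {i + 1}") for i in range(8)]
busy_file.append((1.0, 0.1, "letter.doc"))
waits_2 = simulate_pool(POOL_SIZE, busy_file)

print(waits_1["letter.doc"])  # 590.0 seconds before the scan even starts
print(waits_2["letter.doc"])  # 9.0 seconds before the scan even starts
```

Under these assumptions the unrelated document waits nearly the full 10 minutes in the first scenario and about 9 seconds in the second, consistent with the "effectively unusable" and "delayed by several seconds" outcomes described above.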
One remedy might be to simply increase the number of scanner threads in the pool. However, this may not be achievable given the practical limits on processor power. Moreover, an increase in the number of scanner threads is likely to be inadequate, particularly when the virus scanner is overwhelmed as the result of a malicious attack. What is needed, therefore, is a method of virus scanning that improves the efficiency with which scanner threads process scan requests.