With the advent of general access computer networks, such as the Internet, people may now easily exchange application programs and application data between computer systems. Unfortunately, some people have taken advantage of such easy data exchange by developing computer “viruses” designed to spread among and sometimes attack interconnected devices, such as networked computers. A virus is application code that executes on one's computer without one's knowledge, and against one's interests. Viruses tend to replicate themselves within all interconnected devices, allowing an exponential “infection” of other devices.
In response to the security threat posed by viruses, anti-virus programs were developed to identify and remove viruses. Anti-virus programs use virus scanners to scan individual files, groups of files, or the entire hard drive or drives of a computer system for known viruses in a number of ways, such as by comparing each file to a list of “virus signatures” that are stored in “virus signature files” or by emulating computer instructions contained within the file to evaluate the effect of the instructions. The scanning can be done upon request of a user, when the file is accessed on a mass storage device such as by an application, or on a scheduled basis. Virus scanning is a resource-intensive (in both CPU and disk I/O usage) and time-consuming task, especially in the case of on-access scanning, where a user's file-open request often must be delayed until the file can be scanned and possibly cleaned. This resource consumption can lead to a degradation of a computer's overall performance and slower response times for users.
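The signature-comparison approach described above can be illustrated with a minimal sketch. The signature names and byte patterns below are hypothetical placeholders, and plain substring matching stands in for the richer matching a real scanner would use; this is not any particular product's method.

```python
from typing import Dict, List

# Hypothetical signature table: name -> byte pattern (illustrative only).
# Real "virus signature files" are far larger and use richer matching.
SIGNATURES: Dict[str, bytes] = {
    "Example.A": b"\xde\xad\xbe\xef",
    "Example.B": b"EVIL_PAYLOAD",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)
```

Even in this simplified form, every byte of the file is examined against every signature, which suggests why scan time grows with both file size and the number of known viruses.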
In light of the foregoing, it is not surprising that a universal problem in anti-virus scanning is the length of time required to scan a file for viruses. This problem has been exacerbated in recent years for a number of reasons. One reason is simply that file sizes are growing ever larger, thereby increasing the average amount of time to scan a given file. Oftentimes, the files are so large that they are compressed and stored as archives. The term archive as used herein includes traditional archive data formats such as ZIP, ZOO, LHA, ARC, JAR, LZW, etc. that contain compressed collections of data files, in addition to other data formats that may embed other files, e.g., Microsoft Word (e.g., “.DOC”) documents, Rich Text Format (RTF) files, Object Linking and Embedding (OLE) containers, etc. Scanning archives and documents containing embedded objects takes additional time and resources. In some cases, virus developers craft “malicious” files that purposely take a long time to scan because they are themselves large files, such as archives and documents containing embedded objects.
In addition to growing file sizes, another reason why anti-virus scanning is taking longer is the growing number of drives in a typical computer system, as well as the growth in the size of each drive's storage medium. Adding to the problem is the explosive growth in the number of viruses for which the file, groups of files, or drives must be scanned.
One way to reduce the amount of time and computer resources required by anti-virus scanning is to prevent the needless rescanning of a previously scanned file. A number of techniques to prevent rescanning have been implemented in anti-virus scanning technology; however, none of them has adequately addressed the problem. These techniques share the concept of saving a set of parameters, an AV “state,” for the file as of the last scan so that once a file has been scanned and found free of infection, it should not need to be scanned again unless the file is modified. The parameters chosen for the AV state are those whose change may indicate a virus infection, such as the file's length, checksum, and date of last file write operation.
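The AV state concept can be sketched as a small record compared against the file's current parameters. The names `AVState` and `needs_rescan` are hypothetical, chosen only for illustration; the three fields follow the examples given above (length, checksum, and time of last write).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AVState:
    """Parameters saved for a file as of its last clean scan (illustrative)."""
    length: int        # file length in bytes
    checksum: int      # checksum of the file contents
    last_write: float  # timestamp of the last file write operation


def needs_rescan(saved: Optional[AVState], current: AVState) -> bool:
    """Rescan if the file was never scanned or any saved parameter changed."""
    return saved is None or saved != current
```

In this sketch the comparison itself is cheap; the expense lies in obtaining trustworthy current values, which is where the techniques discussed next run into trouble.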
For example, a common approach is to verify the integrity of a file by using a cyclic redundancy check (CRC), which computes a number from the file's contents, often referred to as a checksum. The checksum remains unchanged as long as the file is unchanged, and almost any modification to the file produces a different value. However, generating and verifying the checksum for each file is time-consuming and often uses more resources than simply performing the anti-virus scan itself. Indeed, the cost of generating a checksum for every file on every scan is prohibitive given the constraints on the processing power and speed of today's computer systems.
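A minimal sketch of the checksum approach, assuming CRC-32 as provided by Python's standard `zlib` module (the text does not say which CRC variant is meant), makes the cost visible: the whole file must be read to produce the value. The function name `file_crc32` and the chunk size are illustrative choices.

```python
import zlib


def file_crc32(path: str, chunk_size: int = 65536) -> int:
    """Compute the CRC-32 of a file by streaming it in fixed-size chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running CRC back in so the chunks are chained.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

Because every byte is read from disk on each verification, the disk I/O of this check is comparable to that of the scan it is meant to avoid.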
Another approach is to maintain a database of file date, time, and size information, and compare the files to be scanned against the database to determine if there is a difference which indicates that the file should be scanned. This approach has proved easy to circumvent, however, by redirecting the database updates to another file.
What is needed, therefore, is an improved method of preventing the needless rescanning of previously scanned files. The above-mentioned shortcomings, disadvantages and problems are addressed by the present invention, which will be understood by reading and studying the following specification.