The present invention relates to the field of computer viruses. More specifically, the present invention is directed to a system and method for detecting document-infecting viruses using dynamic heuristics.
Computer viruses that infect documents, such as word-processing documents, spreadsheets, slide presentations or other office documents that have discrete program objects (e.g., macros or other programs) attached thereto or embedded therein are a significant problem for computer users. While current virus detection and repair technology satisfactorily detects documents infected by known viruses, new (as yet unknown) viruses appear every day. Thus, methods are needed to automatically detect these new viruses that have never been seen before rather than waiting for the virus to be identified and incorporated into a database of known viruses. Methods presently exist to detect and repair new program-infecting and boot-infecting viruses, but many of these methods cannot be applied to document-infecting viruses. Furthermore, the subset of these methods that can be applied to document-infecting viruses falls short of providing an adequate solution.
Some existing anti-virus programs use static heuristics to detect and repair infected documents. Static heuristics use only the current state of a document in detection and disinfection. While static heuristics are effective in some cases, they are prone to false positives, false negatives and incorrect repair. Using dynamic information about changes to documents, rather than merely the current state of the document, can help reduce these problems.
Dynamic heuristics are rules that use descriptions of changes to a system to estimate the probability that the system is in a certain state. Some existing anti-virus programs use dynamic heuristics to detect and repair files infected by machine-language viruses that infect existing programs. Because they assume that an infected program is an essentially homogeneous stream of bytes (some viral and some representing the original program) and because they operate on only a single program at a time, these anti-virus programs cannot be applied to document-infecting viruses. Heuristics for program-infecting viruses assume that the original object is itself a program, and that any reversible change that occurs in that program is likely to be viral. In the case of document-infecting viruses, the infected object is a data file which, initially, may not contain any programs at all, and the fact that new content has been added to a document is not itself strong evidence of infection.
Therefore, new methods are needed to detect and disinfect these document-infecting viruses using dynamic heuristics.
The present invention uses dynamic heuristics to detect and repair documents that are infected by computer viruses. Specifically, the present invention includes a system and method for detecting document-infecting computer viruses in a computer system having a plurality of documents, the method including the steps of maintaining a database of information pertaining to program objects associated with one or more of the documents, comparing one or more of the documents on the system with corresponding database entries in the database to detect certain document changes, and using a set of criteria to determine whether or not the detected document changes are likely to have been caused by viral activity. Preferably, the program objects include macros or programs.
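By way of illustration only, the three steps might be sketched in Python as follows. The document model (a mapping from program-object name to source text), the names `compare`, `likely_viral` and `Change`, and the threshold of three documents are all hypothetical choices made for this sketch, not part of the described method.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical model: a document is reduced to its program objects,
# mapping each object (macro) name to its source text.
Document = Dict[str, str]


@dataclass
class Change:
    doc: str
    added: List[str]
    removed: List[str]


def compare(docs: Dict[str, Document], db: Dict[str, Document]) -> List[Change]:
    """Compare each document with its database entry and report program-object changes."""
    changes: List[Change] = []
    for name, objects in docs.items():
        baseline = db.get(name)
        if baseline is None:
            continue  # no database entry yet, so there is nothing to compare against
        added = sorted(set(objects) - set(baseline))
        removed = sorted(set(baseline) - set(objects))
        if added or removed:
            changes.append(Change(name, added, removed))
    return changes


def likely_viral(changes: List[Change], min_docs: int = 3) -> bool:
    """Illustrative criterion: program objects were added to at least min_docs documents."""
    return sum(1 for c in changes if c.added) >= min_docs
```

In a real system the database would be persisted between runs and populated from the documents' actual macro storage; here both are plain dictionaries.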
The maintaining step may include maintaining a database of information about each of the documents, such as whether or not the document contains any program objects, the name of each program object it contains, the number of program objects it contains, the combined total length of those program objects, the length of each program object, or a CRC or other checksum of each program object. Alternatively, the maintaining step may include maintaining the actual content of each program object located in each document.
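A database record holding the listed items could be built as in the following minimal sketch, which assumes program-object source is available as text and uses `zlib.crc32` from the Python standard library as the checksum. The record layout (`DocRecord`, `ObjectInfo`, `make_record`) is hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class ObjectInfo:
    length: int  # length of the program object in bytes
    crc: int     # CRC-32 checksum of the program object


@dataclass
class DocRecord:
    has_objects: bool              # does the document contain any program objects?
    count: int                     # number of program objects
    total_length: int              # combined total length of all program objects
    objects: Dict[str, ObjectInfo]  # per-object info, keyed by object name


def make_record(macros: Dict[str, str]) -> DocRecord:
    """Build a hypothetical database record from a document's program objects."""
    objects: Dict[str, ObjectInfo] = {}
    for name, src in macros.items():
        data = src.encode("utf-8")
        objects[name] = ObjectInfo(length=len(data), crc=zlib.crc32(data))
    return DocRecord(
        has_objects=bool(objects),
        count=len(objects),
        total_length=sum(o.length for o in objects.values()),
        objects=objects,
    )
```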
The information pertaining to each program object preferably includes a transformation of its actual content that is likely to be invariant or insensitive to typical types of polymorphism.
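One possible transformation, assumed here purely for illustration, is to fold case and collapse whitespace before hashing, since some polymorphic macro viruses vary only letter case and spacing between copies. The function name `invariant_fingerprint` is hypothetical, and real polymorphic engines can introduce variation this sketch does not remove.

```python
import hashlib
import re


def invariant_fingerprint(source: str) -> str:
    """Hypothetical fingerprint: SHA-256 of the macro text after removing
    case and whitespace variation (an assumed, non-exhaustive normalization)."""
    text = source.lower()             # VBA-style code is largely case-insensitive
    text = re.sub(r"\s+", " ", text)  # collapse runs of spaces, tabs and blank lines
    text = text.strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```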
In another embodiment, the information pertaining to each program object is selected to reflect a basic operation of the program object and to ignore details that are likely to change without altering that basic operation. Preferably, the ignored details include comments, formatting and/or identifiers. Alternatively, the information selected to reflect the basic operation of the program object can be a program dependency graph.
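The following sketch approximates this embodiment with a token-level normalization rather than a program dependency graph: it tokenizes VBA-style source with a regular expression, drops comments and whitespace, and renames identifiers in order of first appearance. The keyword list, the tokenizer and the name `normalize` are simplifying assumptions, not a complete VBA lexer (it does not, for example, handle `Rem` comments or line continuations).

```python
import re
from typing import Dict, List

# Illustrative subset of VBA-style keywords; a real tool would use the full grammar.
KEYWORDS = {"sub", "end", "function", "if", "then", "else", "for", "to", "next",
            "dim", "as", "call", "set", "with"}

# Alternatives, tried in order: string literal, comment, identifier, number, other symbol.
TOKEN = re.compile(r"""("[^"\n]*")|('[^\n]*)|([A-Za-z_]\w*)|(\d+)|(\S)""")


def normalize(source: str) -> List[str]:
    """Token sequence with comments and formatting dropped and identifiers
    renamed by order of first appearance (v0, v1, ...)."""
    names: Dict[str, str] = {}
    tokens: List[str] = []
    for m in TOKEN.finditer(source):
        string, comment, ident, number, symbol = m.groups()
        if comment is not None:
            continue  # comments carry no behavior
        if ident is not None:
            low = ident.lower()  # VBA identifiers are case-insensitive
            if low in KEYWORDS:
                tokens.append(low)
            else:
                # Consistent renaming keeps structure but ignores the chosen names.
                tokens.append(names.setdefault(low, f"v{len(names)}"))
        else:
            tokens.append(string or number or symbol)
    return tokens
```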
The comparing step may include the step of periodically comparing, at some timed interval, one or more of the documents on the system with corresponding database entries to determine what changes have occurred since a last examination. It may alternatively include the step of comparing, in response to an event, to determine what changes have occurred in the recent past. The event may include receipt of a user input, every occurrence of a document changing, every Nth occurrence of a document changing, for some value of N, or an Nth occurrence of any document changing, for some value of N. Finally, the comparing step may include the step of comparing each of the one or more documents with corresponding database entries to detect added program objects.
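The triggering options might be combined in a small scheduler such as the hypothetical `ScanTrigger` below, which runs a supplied scan callback when a timed interval has elapsed, after every Nth reported document change, or on explicit user request. The injectable clock is an assumption added so the behavior can be exercised without waiting in real time.

```python
import time
from typing import Callable, Optional


class ScanTrigger:
    """Hypothetical helper deciding when to run the comparing step."""

    def __init__(self, scan: Callable[[], None],
                 interval_s: Optional[float] = None,
                 every_n_changes: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.scan = scan
        self.interval_s = interval_s
        self.every_n = every_n_changes
        self.clock = clock
        self.last_scan = clock()
        self.changes_since_scan = 0

    def _run(self) -> None:
        self.scan()
        self.last_scan = self.clock()
        self.changes_since_scan = 0

    def on_document_changed(self) -> None:
        """Event trigger: scan after every Nth document change."""
        self.changes_since_scan += 1
        if self.every_n is not None and self.changes_since_scan >= self.every_n:
            self._run()

    def on_user_request(self) -> None:
        """Event trigger: scan whenever the user asks for it."""
        self._run()

    def tick(self) -> None:
        """Call periodically; scans once the timed interval has elapsed."""
        if self.interval_s is not None and self.clock() - self.last_scan >= self.interval_s:
            self._run()
```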
The using step may include the step of determining one or more sets of program objects which are suspected of being viral. This may be accomplished by finding a positive maximum set of program object names which have been added to a number of documents.
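The text above does not define a positive maximum set precisely. Under one assumed reading, it is the largest non-empty set of names that appears among the added program objects of at least a minimum number of documents, since a macro virus typically copies the same set of macros into every document it infects. The sketch below approximates that search by considering each document's added set and all pairwise intersections as candidates; the function `suspect_set` and its heuristic candidate generation are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set


def suspect_set(added: Dict[str, Set[str]], min_docs: int = 2) -> Optional[FrozenSet[str]]:
    """Largest non-empty set of added program-object names contained in the
    added names of at least min_docs documents (heuristic candidate search)."""
    sets = [frozenset(s) for s in added.values() if s]
    candidates = set(sets)
    for a, b in combinations(sets, 2):
        if a & b:
            candidates.add(a & b)  # names shared by two documents' additions
    best: Optional[FrozenSet[str]] = None
    for cand in candidates:
        support = sum(1 for s in sets if cand <= s)
        if support < min_docs:
            continue
        # Prefer larger sets; break ties deterministically by sorted names.
        if best is None or len(cand) > len(best) or (
                len(cand) == len(best) and sorted(cand) < sorted(best)):
            best = cand
    return best
```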
The method preferably includes the additional steps of warning a user of the document changes and receiving a message from the user indicating the action to be taken.
The set of criteria may include whether more than a predetermined minimum number of documents have changed from having no program objects to having some program objects or whether more than a predetermined minimum number of documents have had the same number of units of active content added to them.
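Both criteria can be evaluated from per-document program-object counts taken before and after the change, as in this sketch. The function name `check_criteria` and the strict greater-than comparison against `min_docs` (matching "more than a predetermined minimum number") are illustrative assumptions.

```python
from collections import Counter
from typing import Dict


def check_criteria(before: Dict[str, int], after: Dict[str, int],
                   min_docs: int = 3) -> Dict[str, bool]:
    """before/after map each document name to its number of program objects."""
    common = [d for d in after if d in before]
    # Documents that went from no program objects to some program objects.
    newly_active = sum(1 for d in common if before[d] == 0 and after[d] > 0)
    # How many documents gained each particular number of program objects.
    deltas = Counter(after[d] - before[d] for d in common if after[d] > before[d])
    same_delta = max(deltas.values(), default=0)
    return {
        "newly_active": newly_active > min_docs,
        "same_number_added": same_delta > min_docs,
    }
```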
The maintaining step may include maintaining a database of information about each of the documents sufficient to determine which program objects are new, in which case the set of criteria may include whether more than a predetermined minimum number of documents have had new units of active content with the same name added to them.
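Given the set of program-object names recorded for each document before and after the change, the same-name criterion reduces to counting how many documents gained each name. The following is a minimal sketch; the function name `names_added_widely` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def names_added_widely(before: Dict[str, Set[str]], after: Dict[str, Set[str]],
                       min_docs: int) -> List[str]:
    """Names of new program objects added to more than min_docs documents."""
    counts: Counter = Counter()
    for doc, names in after.items():
        if doc in before:
            counts.update(names - before[doc])  # only objects that are new to this document
    return sorted(name for name, n in counts.items() if n > min_docs)
```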
The method of the present invention preferably includes the additional step of restoring, for certain types of document changes, the changed documents to their original condition or to a functionally equivalent state. The restoring step preferably includes the step of removing all program objects from documents that previously contained none, the step of removing all program objects that were not previously present, according to the information stored in the database, and/or the step of removing from each changed document any and all program objects that are suspected of being viral, in response to the using step.
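When the database records program-object names, the restoring options above might be sketched as two filters over a document's objects. The helper names are hypothetical, and this name-only sketch would not repair a virus that overwrites an existing macro under its original name; a per-object checksum, as kept in the maintaining step, would be needed to detect that case.

```python
from typing import AbstractSet, Dict

Objects = Dict[str, str]  # hypothetical: program-object name -> source text


def remove_new_objects(current: Objects, baseline_names: AbstractSet[str]) -> Objects:
    """Keep only objects recorded in the database. An empty baseline (the
    document previously had no program objects) removes everything."""
    return {n: s for n, s in current.items() if n in baseline_names}


def remove_suspects(current: Objects, suspects: AbstractSet[str]) -> Objects:
    """Drop objects whose names the using step flagged as suspected viral."""
    return {n: s for n, s in current.items() if n not in suspects}
```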
The method of the present invention preferably includes the additional step of updating the database where any of the one or more documents does not have a corresponding database entry or the additional step of recording, in the database, any new program object which has been added to any of the documents.
The method of the present invention preferably further includes the step of recording, in the database, only those changes to documents which are judged not to be possibly viral in nature, so that future executions of the method will detect the remaining changes again and reconsider whether or not they represent viral changes in light of other changes that have occurred since. The method can also include the step of recording, in an additional database, information about recent program changes and using that information in the using step, in addition to any changes detected in the current run.
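The database update policy of the two preceding paragraphs might look like the following sketch, where `is_possibly_viral` stands in for whatever judgment the using step produces. The function and parameter names are hypothetical, and removals of program objects are not handled here.

```python
from typing import Callable, Dict, Set


def update_database(db: Dict[str, Set[str]], docs: Dict[str, Set[str]],
                    is_possibly_viral: Callable[[str, str], bool]) -> None:
    """db and docs map document name -> set of program-object names (hypothetical model)."""
    for doc, names in docs.items():
        if doc not in db:
            db[doc] = set(names)  # no entry yet: record the current state as the baseline
            continue
        for name in names - db[doc]:
            if not is_possibly_viral(doc, name):
                db[doc].add(name)  # accepted change: it will not be reported again
            # Possibly viral additions are left unrecorded, so the next run
            # detects them again and re-evaluates them with newer evidence.
```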
Another aspect of the present invention is a program storage device, readable by a machine, tangibly embodying a program of instructions executable by the machine to perform the method steps for detecting document-infecting computer viruses in a computer system as described hereinabove.
Finally, another aspect of the present invention is a system for detecting document-infecting computer viruses in a computer system having a plurality of documents, the system including a device for maintaining a database of information pertaining to program objects associated with one or more of the documents, a device for comparing one or more of the documents on the system with corresponding database entries in the database to detect certain document changes, and a device for using a set of criteria to determine whether or not the detected document changes are likely to have been caused by viral activity.