There are two basic ways to clean up a computer virus infection: erase the infected file and replace it with a backup, uninfected version; or disinfect it. The second option is usually preferred by users because it is easier, and does not rely on the prior existence of a backup copy of the program.
The following description of how viruses typically infect host programs helps to explain why and how it is usually possible to disinfect infected programs. Unlike biological viruses, which typically destroy their host cells, computer viruses have a vested interest in preserving the function of their host programs. Any computer virus that causes its host to malfunction would be likely to arouse a user's suspicion and thus bring about its own untimely demise. By far the easiest way for a virus author to preserve the host's function, and the only way used in practice, is to keep the original code intact and add the virus code to it. More specifically, it is almost universal to have the virus code execute first, then pass control back to the victim program. (Because the victim code might terminate in a variety of places under a variety of conditions, it is more difficult to design a virus that runs after the victim runs, and we know of no cases where this has been done.) For this reason, an infected program usually contains the entire contents of the original file in some form. Almost universally, the infected program contains large contiguous blocks of code from the original host (perhaps with some rearrangement of the original order), interspersed with blocks of virus code. Some pieces of the original host may not appear explicitly, but may instead be encrypted and stored in data regions of the virus.
Several typical virus infection patterns are illustrated in FIG. 1. Appending viruses (FIG. 1a) add themselves to the end of the host, and modify the header of the original host so as to cause execution to begin within the virus rather than the host. A jump instruction at the end of the virus returns control to the host when the virus has finished execution. Prepending viruses (FIG. 1b) add themselves to the beginning of the host. Standard overwriting viruses (FIG. 1c) overwrite a portion of the host, and modify the header of the host so as to cause execution to begin inside the virus. Unless the virus happens to write itself into an unused portion of the host (such as an unused data region), the host is likely to suffer permanent, irreversible damage. Modified overwriting viruses such as those illustrated in FIGS. 1d and 1e copy a region of bytes equal in length to the virus to the end of the host, and then overwrite the beginning of the host. An additional virus region may intervene between the two sections of the original host. The examples of virus attachment patterns presented in FIG. 1 are illustrative, but not exhaustive.
Another important observation is that almost all viruses intersperse host and virus code very consistently, independent of the host, the operating environment, the virus's generation, etc.
Given these characteristics of typical viral infections, it is apparent that, in order to disinfect an infected program, one simply needs to know the locations of the pieces of the original host and how they ought to be joined to form the original. Additionally, in cases where portions of the host are embedded or encrypted in the virus, it is necessary to reconstruct those bytes, either by knowing how the virus transforms and embeds them, or by retrieving them from a database created prior to the infection of the host.
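The observation above, that disinfection reduces to knowing where the pieces of the original host lie and in what order to join them, can be illustrated with a minimal Python sketch. The function name `reassemble`, the `(offset, length)` segment representation, and the toy byte strings are assumptions made purely for illustration; they are not taken from any actual anti-virus product, and the sketch ignores header changes and any host bytes that a virus stores in encrypted form.

```python
from typing import List, Tuple

# Hypothetical representation: each piece of the original host is described
# by its (offset, length) within the infected file.
Segment = Tuple[int, int]


def reassemble(infected: bytes, segments: List[Segment]) -> bytes:
    """Join host pieces, listed in original host order, into one byte string."""
    parts = []
    for offset, length in segments:
        if offset < 0 or length < 0 or offset + length > len(infected):
            raise ValueError("segment lies outside the infected file")
        parts.append(infected[offset:offset + length])
    return b"".join(parts)


# Toy stand-ins for a host program and a virus body (illustrative only).
host = b"HEADER" + b"original host body"
virus = b"VIRUS!"

# Prepending pattern: the host follows the virus as one contiguous block.
prepended = virus + host
restored_prepended = reassemble(prepended, [(len(virus), len(host))])

# Modified overwriting pattern: the first len(virus) bytes of the host were
# copied to the end of the file, then overwritten by the virus.
overwritten = virus + host[len(virus):] + host[:len(virus)]
restored_overwritten = reassemble(
    overwritten,
    [
        (len(overwritten) - len(virus), len(virus)),  # relocated first block
        (len(virus), len(host) - len(virus)),         # untouched remainder
    ],
)
```

In both toy cases the reassembled bytes equal the original `host`. The difficult part in practice is discovering the segment list, which this sketch simply takes as given.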
There are two known strategies for disinfecting infected programs. The most common technique is to detect the presence of a known virus in a program, and then to use specific knowledge about how that virus modifies its victims to undo that transformation. Typically, the means for both detecting and removing a given known virus are derived by a human expert, although recent technological advances have made substantial progress towards automating both of these procedures.
A second strategy, referred to herein as "generic disinfection", is to record a small amount of information about each host program in a database, and to use this information to reconstruct the original program. The advantage of generic disinfection is that it does not rely on specific knowledge about a large number of viruses. This is particularly important for handling new viruses that have not yet been analyzed, or for which anti-virus updates have not yet been issued or installed widely by users. The disadvantage is that programs cannot be disinfected until the database has been established, i.e., viruses that were present in the system prior to the construction of the database cannot be removed by this method. Virus-specific and generic disinfection methods can be combined so as to benefit from the advantages of each.
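As a rough illustration of the generic approach, the following Python sketch records a small fingerprint of each clean program and later accepts a reconstruction only if it matches that fingerprint. The choice of fingerprint (length plus SHA-256 digest), the names `Fingerprint`, `record`, `candidates` and `try_disinfect`, and the two naive candidate layouts are all assumptions made for illustration. The text does not specify what information the database holds, and this sketch is not the method of any particular implementation.

```python
import hashlib
from typing import Dict, Iterator, NamedTuple, Optional


class Fingerprint(NamedTuple):
    """Assumed per-program record: length and SHA-256 digest of the clean file."""
    length: int
    sha256: str


def record(database: Dict[str, Fingerprint], name: str, clean: bytes) -> None:
    """Store a fingerprint of a program while it is known to be uninfected."""
    database[name] = Fingerprint(len(clean), hashlib.sha256(clean).hexdigest())


def matches(candidate: bytes, fp: Fingerprint) -> bool:
    """Check a candidate reconstruction against the stored fingerprint."""
    return (len(candidate) == fp.length
            and hashlib.sha256(candidate).hexdigest() == fp.sha256)


def candidates(suspect: bytes, length: int) -> Iterator[bytes]:
    """Yield naive guesses: the host as one block at the start or the end."""
    if length <= len(suspect):
        yield suspect[:length]                  # virus added after the host
        yield suspect[len(suspect) - length:]   # virus added before the host


def try_disinfect(database: Dict[str, Fingerprint], name: str,
                  suspect: bytes) -> Optional[bytes]:
    """Return the verified original bytes, or None to leave the file untouched."""
    fp = database.get(name)
    if fp is None:
        return None  # no record was made before the infection
    for candidate in candidates(suspect, fp.length):
        if matches(candidate, fp):
            return candidate
    return None
```

A caller would invoke `record` for each program while the system is known to be clean, and `try_disinfect` later on a suspect file. Returning `None` whenever no candidate matches means the file is never overwritten with an unverified reconstruction. The two layouts tried here cover only the simplest attachment patterns, and a real tool would also have to deal with modified headers.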
A few implementations of generic disinfection exist, but all lack generality, both in the class of viruses they can remove and in their dependence on one particular operating system. Previous schemes for generic disinfection consist of an ad hoc collection of methods, each tailored to a specific pattern of a virus's attachment to its host; some even fail to check whether the repair has been successful, and can result in damage to the host file.
This invention is a more general method for generically disinfecting host programs, applicable to nearly all existing viruses that preserve the function of their host. In some cases, it is even able to disinfect hosts that have had large portions overwritten by a virus, a situation that was previously regarded as completely hopeless. Furthermore, except for the choice of the value of a few parameters, the method does not depend in any fundamental way on the details of the operating system. Unlike some existing methods, the invention is extremely unlikely to perform an erroneous disinfection: it will either disinfect an infected host correctly, or leave it untouched.