The replication of a computer virus is an important first step before an analysis of the virus can begin. If performed correctly, replication provides the researcher with: (a) proof that a suspected virus is indeed a virus, and (b) samples of the virus on which the researcher can base his or her analysis. For example, samples of the virus may be used to extract a "signature" of the virus that remains invariant over all instances of the virus.
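As a minimal sketch of why multiple samples matter for signature extraction, the hypothetical helper below finds the longest byte string present in every replicated sample, which is the simplest possible notion of an "invariant" shared by all instances. The function name and the `min_len` parameter are assumptions for illustration; practical signature extraction (for example, the method of U.S. Pat. No. 5,452,442 cited below) also estimates false-positive probabilities, which this brute-force sketch does not attempt.

```python
from typing import List, Optional

def common_invariant(samples: List[bytes], min_len: int = 8) -> Optional[bytes]:
    """Hypothetical helper: longest byte string contained in every sample.

    Returns None if there are no samples or no shared string of at least
    min_len bytes. Brute force, so only suitable for small samples.
    """
    if not samples:
        return None
    # Any invariant must fit inside the shortest sample, so search only there.
    base = min(samples, key=len)
    n = len(base)
    # Try candidate lengths from longest to shortest; the first hit is the longest.
    for length in range(n, min_len - 1, -1):
        for start in range(n - length + 1):
            candidate = base[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return None
```

With only one sample, the "invariant" is the entire sample, so the result is meaningless; each additional, diverse sample trims away bytes that are not truly constant across instances.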
Over the past several years a number of different classes of computer viruses have been recognized. The class of computer viruses of most interest to this invention is macro viruses, so named because they are written in, and execute within, an application's macro language. This is in contrast with binary-file and boot-sector viruses, which instead run as machine code directly under the machine's operating system.
Computer macro viruses require their target application to be loaded and active to function. These viruses are limited in functionality only insofar as the application restricts the operation of macros. In practice, however, the macro languages that macro viruses target place few restrictions on the actions macros may perform.
When a document or other data sample suspected of harboring a computer macro virus (hereinafter referred to simply as a "virus" unless misinterpretation is possible) is received in a computer virus laboratory for analysis, it is important to produce as many and as diverse samples as possible from it. Traditionally, this is done manually by loading the sample in its appropriate application and exercising it until it produces infected files (additional copies of itself). If this fails, the alleged virus is pronounced non-viral. If there is doubt about the analysis, the researcher can dissect the suspected virus to determine its nature by hand.
If the suspected virus does replicate, it is important that enough samples be generated and that the generated samples be diverse. This can be accomplished by various labor-intensive processes. To achieve diversity, the researcher can repeat the replication process on various versions of the target application, including foreign-language versions. This is an important consideration, as different versions of the same application program can produce different binary representations of the same virus. Likewise, the virus may take on different characteristics on different versions of the application while still maintaining its viral nature.
Although it is typically not necessary to produce more than, for example, six samples per application version, in some cases more must be produced. In particular, if a large degree of divergence is observed between the text of the macros, an order of magnitude more samples must be produced to ensure correct analysis and testing in later stages.
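The sampling rule above can be sketched as a small decision function. This is an illustration under stated assumptions: divergence is measured here as the largest pairwise dissimilarity (one minus `difflib.SequenceMatcher.ratio()`) between macro texts, and the threshold value of 0.5 is hypothetical, since the text does not say how divergence is quantified or what counts as "large". The base count of six and the tenfold increase come from the paragraph above.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List

BASE_SAMPLES_PER_VERSION = 6   # the example figure given in the text
DIVERGENCE_THRESHOLD = 0.5     # hypothetical cut-off; not specified in the text

def divergence(macro_texts: List[str]) -> float:
    """Largest pairwise dissimilarity (1 - similarity ratio) among macro texts."""
    worst = 0.0
    for a, b in combinations(macro_texts, 2):
        similarity = SequenceMatcher(None, a, b).ratio()
        worst = max(worst, 1.0 - similarity)
    return worst

def samples_needed(macro_texts: List[str]) -> int:
    """Hypothetical policy: samples to produce per application version."""
    if divergence(macro_texts) > DIVERGENCE_THRESHOLD:
        # Large divergence between macro texts: an order of magnitude more.
        return BASE_SAMPLES_PER_VERSION * 10
    return BASE_SAMPLES_PER_VERSION
```

Taking the worst pair rather than an average is a deliberately conservative choice: a single strongly divergent sample is enough to trigger the larger sampling effort.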
One problem with conventional approaches is that, if done correctly, virus analysis can be a lengthy, labor-intensive process. If shortcuts are taken, the generated samples may be too few in number or insufficiently diverse, and either deficiency will negatively affect the ensuing analysis.
A polymorphic macro virus is one that is capable of "mutation", that is, changing the virus code either during or after its execution in order to make it more difficult to compare with an original version of the virus.
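To make the idea of mutation concrete, the toy sketch below applies one simple kind of change, consistently renaming every identifier that starts with `v_` to a random name. Both the `v_` naming convention and the `mutate` function are hypothetical and chosen only for illustration; real polymorphic macro viruses may use other techniques. The point is that the mutated copy behaves identically yet no longer matches the original under an exact text comparison.

```python
import random
import re
import string

def mutate(macro_source: str, rng: random.Random) -> str:
    """Toy mutation: consistently rename identifiers beginning with 'v_'."""
    renames = {}

    def rename(match: re.Match) -> str:
        old = match.group(0)
        # Reuse the same new name for every occurrence of an identifier,
        # so the mutated macro still behaves like the original.
        if old not in renames:
            renames[old] = "v_" + "".join(
                rng.choice(string.ascii_lowercase) for _ in range(8)
            )
        return renames[old]

    return re.sub(r"\bv_\w+", rename, macro_source)

original = "Sub AutoOpen()\n  v_count = 1\n  v_count = v_count + 1\nEnd Sub"
mutated = mutate(original, random.Random(1))
print(original == mutated)  # the exact comparison no longer matches
```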
Reference may be had to the following commonly assigned U.S. Patents for teaching various computer virus (not necessarily macro virus) detection, removal and notification techniques: U.S. Pat. No. 5,440,723, issued Aug. 8, 1995, entitled "Automatic Immune System for Computers and Computer Networks", by Arnold et al.; U.S. Pat. No. 5,452,442, issued Sep. 19, 1995, entitled "Methods and Apparatus for Evaluating and Extracting Signatures of Computer Viruses and Other Undesirable Software Entities", by Kephart; U.S. Pat. No. 5,485,575, issued Jan. 16, 1996, entitled "Automatic Analysis of a Computer Virus Structure and Means of Attachment to its Hosts", by Chess et al.; U.S. Pat. No. 5,572,590, issued Nov. 5, 1996, entitled "Discrimination of Malicious Changes to Digital Information Using Multiple Signatures", by Chess; and U.S. Pat. No. 5,613,002, issued Mar. 18, 1997, entitled "Generic Disinfection of Programs Infected with a Computer Virus", by Kephart et al. The disclosures of these commonly assigned U.S. Patents are incorporated by reference herein in their entireties, insofar as the disclosures do not conflict with the teachings of this invention.