1. Field of the Invention
The present invention generally relates to automatically searching for one or more patterns which may be contained in encrypted form within a body of text or computer data and, more particularly, to searching for encrypted patterns within computer viruses.
2. Description of the Prior Art
Searching computer programs or data for one or more occurrences of a specified pattern or set of patterns is one of the most common computer applications, implemented for example by the Unix utility "grep". A pattern may be a simple string or something more complex, such as any sentence generated by a grammar.
One important application for searching is computer virus detection. If the searched data is found to contain a "virus signature" pattern, this is taken to be a strong indication that the data contains a virus corresponding to that signature.
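The signature search described above can be illustrated with a minimal sketch. The signature names and byte values below are hypothetical placeholders chosen for illustration only; they do not correspond to any real virus, and the function name `find_signatures` is an assumption of this example rather than part of any existing scanner.

```python
def find_signatures(data: bytes, signatures: dict) -> dict:
    """Return, for each named signature found, every offset at which it occurs."""
    hits = {}
    for name, pattern in signatures.items():
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            # Resume one byte later so overlapping occurrences are also reported.
            start = data.find(pattern, start + 1)
        if offsets:
            hits[name] = offsets
    return hits


# Hypothetical signature table: name -> fixed byte pattern.
SIGNATURES = {
    "example-a": b"\xde\xad\xbe\xef",
    "example-b": b"\x13\x37\x13",
}

# Synthetic "file" containing the first pattern twice and the second not at all.
sample = b"\x00\x01" + b"\xde\xad\xbe\xef" + b"\x02" * 4 + b"\xde\xad\xbe\xef"

print(find_signatures(sample, SIGNATURES))
```

A match is only an indication, not proof, that the data contains the corresponding virus, which is why signatures are chosen to be unusual byte sequences.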
In order to evade detection via simple signatures, some viruses employ simple encryption schemes. Such self-encrypting viruses have a relatively small "degarbling head" that remains unencrypted. When the virus is executed, control passes first to the degarbling head, which decrypts the body of the virus and then passes control to it. The body performs the main function of the virus, presumably including attaching a newly-encrypted copy of the virus to some new host. The appearance of such a virus varies with the encryption key even though the underlying computer code is always the same.
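The effect of such encryption on appearance can be sketched as follows. The example assumes a single-byte XOR cipher as the "simple encryption scheme"; the text does not name a specific cipher, so this choice, along with the placeholder strings `HEAD` and `BODY` and the layout (head, then key byte, then encrypted body), is purely illustrative.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Encrypt or decrypt by XORing every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


# Placeholder contents, not real code: a constant unencrypted head
# and a body that contains the byte string we would like to search for.
HEAD = b"HEAD"
BODY = b"payload-SIGNATURE-payload"
SIGNATURE = b"SIGNATURE"


def make_instance(key: int) -> bytes:
    """Build one instance: plaintext head, the key byte, then the encrypted body."""
    return HEAD + bytes([key]) + xor_bytes(BODY, key)


instance_1 = make_instance(0x21)
instance_2 = make_instance(0x5A)

# The two instances share the head but differ throughout the encrypted body.
print(instance_1[:4] == instance_2[:4])   # same head
print(instance_1[5:] == instance_2[5:])   # same encrypted body?

# A plaintext signature taken from the body is not found in either instance.
print(SIGNATURE in instance_1, SIGNATURE in instance_2)

# Decrypting with the stored key restores the same underlying body.
print(xor_bytes(instance_1[5:], instance_1[4]) == BODY)
```

Under these assumptions, a scanner that looks for the fixed byte string in the file as stored would miss both instances even though the underlying body is identical.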
One approach to searching for such self-encrypting viruses is to avoid the encrypted regions, and to search for patterns in the unencrypted "degarbling head". This is feasible if the head is substantially constant from one instance of the virus to another, and contains byte sequences that permit the choice of a signature pattern that is sufficiently unusual to reduce the probability of discovering the pattern in legitimate software (the "false positive" probability) to an acceptably low level. However, some highly polymorphic viruses have highly variable degarbling heads, the variability resulting not from encryption, but from permutations of code fragments, random insertion of irrelevant instruction sequences that have no effect on the computation, and other techniques. This makes it difficult or impossible to select suitable signatures. Another drawback is that many different viruses may have the same degarbling head, and the use of a fixed signature would not be able to distinguish amongst them.
A second approach is to use the virus itself to perform its own decryption, and then to use any of a variety of standard string searching methods to search the resulting plaintext. This allows a signature to be chosen from an encrypted region of the file, greatly increasing the selection of potential signatures. Moreover, there are families of related computer viruses which share large portions of code. Since the shared code may come from encrypted regions of the viruses, detecting a virus on the basis of information contained within the encrypted region may enable many viruses to be detected with a single signature, which might not be possible if detection were based solely on the degarbling head.
There are two basic methods in widespread use for persuading a virus to decrypt itself. The first applies to a situation in which the presence of the virus is discovered only after it has already been loaded into memory. In such a case, a virus stored in encrypted form decrypts itself when it is loaded into memory. Then standard string searching techniques can be used to search memory for the presence of plaintext signatures. Of course, this method is limited to cases where the virus has recently been active. It is not prophylactic: it cannot be used to screen as yet unexecuted, incoming software for the presence of viruses.
The second method for using a virus to perform its own decryption is to interpret it; that is, to simulate its execution in a virtual environment. One can interpret the loop that performs the decryption to produce a plaintext that can be searched for fixed signatures in the usual manner. Such a technique can be effective in detecting a large class of viruses. However, this technique has several drawbacks, including the practical difficulties of employing an interpreter.
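The interpretation approach can be sketched with a toy example. Everything here is an assumption made for illustration: the instruction set (`set`, `xor_mem`, `inc`, `dec`, `jnz`, `halt`) is invented and far simpler than any real processor, the decryption is a single-byte XOR, and the byte strings are placeholders. A practical interpreter would have to model a real instruction set and memory layout, which is part of the difficulty the text mentions.

```python
def run_toy_decryptor(program, memory, max_steps=10_000):
    """Interpret a toy program against a copy of memory and return the result.

    The real memory image is never modified or executed; max_steps guards
    against a loop that does not terminate.
    """
    mem = bytearray(memory)
    regs = {}
    pc = 0
    for _ in range(max_steps):
        op = program[pc]
        name = op[0]
        if name == "halt":
            return bytes(mem)
        if name == "set":            # set register to a constant
            regs[op[1]] = op[2]
        elif name == "xor_mem":      # mem[regs[ptr]] ^= regs[key]
            mem[regs[op[1]]] ^= regs[op[2]]
        elif name == "inc":
            regs[op[1]] += 1
        elif name == "dec":
            regs[op[1]] -= 1
        elif name == "jnz":          # jump to target if register is nonzero
            if regs[op[1]] != 0:
                pc = op[2]
                continue
        else:
            raise ValueError(f"unknown instruction: {name}")
        pc += 1
    raise RuntimeError("step limit exceeded")


KEY = 0x5A
PLAIN_BODY = b"payload-SIGNATURE-payload"   # placeholder body
image = b"HEAD" + bytes(b ^ KEY for b in PLAIN_BODY)

# Hypothetical decryption loop, written in the toy instruction set.
program = [
    ("set", "ptr", 4),                  # body starts after the 4-byte head
    ("set", "cnt", len(PLAIN_BODY)),
    ("set", "key", KEY),
    ("xor_mem", "ptr", "key"),          # loop start (index 3)
    ("inc", "ptr"),
    ("dec", "cnt"),
    ("jnz", "cnt", 3),
    ("halt",),
]

decrypted = run_toy_decryptor(program, image)
print(b"SIGNATURE" in image, b"SIGNATURE" in decrypted)
```

Once the simulated loop has finished, the resulting plaintext can be searched for fixed signatures in the usual manner, without the code ever having run on the real machine.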
The disadvantages of detecting encrypted viruses by either avoiding the encrypted regions or getting the virus to decrypt itself could be overcome if there were a method for searching the encrypted regions directly, without any need for decryption.