1. Field of the Invention
This invention pertains in general to detecting viruses within files in digital computers and more particularly to detecting the presence of a virus in a file having multiple entry points.
2. Background of the Invention
Simple computer viruses work by copying exact duplicates of themselves to each executable program file they infect. When an infected program executes, the simple virus gains control of the computer and attempts to infect other files. If the virus locates a target executable file for infection, it copies itself byte-for-byte to the target executable file. Because this type of virus replicates an identical copy of itself each time it infects a new file, the simple virus can be easily detected by searching files for a specific string of bytes (i.e., a “signature”) that has been extracted from the virus.
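By way of illustration only, the signature search described above can be sketched as a plain byte-string search. The function name `find_signature`, the byte values in `TOY_SIGNATURE`, and the two sample buffers are hypothetical stand-ins chosen for this sketch; they are not taken from any real virus or scanner.

```python
def find_signature(data: bytes, signature: bytes) -> int:
    """Return the offset of the first occurrence of signature, or -1 if absent."""
    if not signature:
        raise ValueError("signature must be non-empty")
    return data.find(signature)


# Hypothetical, harmless stand-in for a byte string extracted from a simple virus.
TOY_SIGNATURE = b"\xde\xad\xbe\xef\x13\x37"

# Toy file contents: a 34-byte "clean" buffer, and the same buffer with the
# stand-in signature appended, as a byte-for-byte copy would leave it.
clean_file = b"MZ" + b"\x90" * 32
infected_file = clean_file + TOY_SIGNATURE + b"\x00" * 8

clean_offset = find_signature(clean_file, TOY_SIGNATURE)
infected_offset = find_signature(infected_file, TOY_SIGNATURE)
```

Because the copy is identical in every infected file, a single fixed byte string suffices for a match in this sketch.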
Encrypted viruses comprise a decryption routine (also known as a decryption loop) and an encrypted viral body. When a program file infected with an encrypted virus executes, the decryption routine gains control of the computer and decrypts the encrypted viral body. The decryption routine then transfers control to the decrypted viral body, which is capable of spreading the virus. The virus is spread by copying the identical decryption routine and the encrypted viral body to the target executable file. Although the viral body is encrypted and thus hidden from view, these viruses can be detected by searching for a signature from the unchanging decryption routine.
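The following minimal sketch illustrates why a signature taken from the unchanging decryption routine still matches when the viral body is hidden. It assumes a single-byte XOR as a stand-in for the encryption, and the names `TOY_DECRYPTOR`, `TOY_BODY`, `KEY`, and `build_toy_sample` are hypothetical placeholders made of harmless text, not real code from any virus.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR (assumed here as a stand-in for the encryption)."""
    return bytes(b ^ key for b in data)


# Harmless placeholders for the two parts of an encrypted virus.
TOY_DECRYPTOR = b"<toy-decryption-routine>"
TOY_BODY = b"<toy-viral-body>"
KEY = 0x5A


def build_toy_sample(host: bytes) -> bytes:
    """Append the plain decryption routine and the encrypted body to a host."""
    return host + TOY_DECRYPTOR + xor_bytes(TOY_BODY, KEY)


sample_a = build_toy_sample(b"host program A ")
sample_b = build_toy_sample(b"a different host program B ")

# A signature from the body does not match, because the body is stored encrypted.
body_visible = TOY_BODY in sample_a
# A signature from the decryption routine matches in every sample.
decryptor_visible = TOY_DECRYPTOR in sample_a and TOY_DECRYPTOR in sample_b
```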
Polymorphic encrypted viruses (“polymorphic viruses”) comprise a decryption routine and an encrypted viral body which includes a static viral body and a machine-code generator often referred to as a “mutation engine.” The operation of a polymorphic virus is similar to the operation of an encrypted virus, except that the polymorphic virus generates a new decryption routine each time it infects a file. Many polymorphic viruses use decryption routines that are functionally the same for all infected files, but have different sequences of instructions.
These multifarious mutations allow each decryption routine to have a different signature. Therefore, polymorphic viruses cannot be detected by simply searching for a signature from a decryption routine. Instead, antivirus software uses emulator-based antivirus technology, also known as Generic Decryption (GD) technology, to detect the virus. The GD scanner works by loading the program into a software-based CPU emulator which acts as a simulated virtual computer. The program is allowed to execute freely within this virtual computer. If the program does in fact contain a polymorphic virus, the decryption routine is allowed to decrypt the viral body. The GD scanner can then detect the virus by searching through the virtual memory of the virtual computer for a signature from the decrypted viral body.
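The emulate-then-scan approach described above can be sketched with a toy virtual machine. The three-opcode instruction set (`xor`, `nop`, `halt`), the functions `emulate` and `gd_scan`, and the sample data are all assumptions made for this sketch; a real GD scanner emulates actual CPU instructions and is considerably more involved.

```python
from typing import List, Tuple

Instruction = Tuple  # ("xor", start, length, key), ("nop",) or ("halt",)


def emulate(program: List[Instruction], memory: bytes,
            max_steps: int = 10_000) -> bytearray:
    """Run a toy program against a private copy of memory and return the copy."""
    mem = bytearray(memory)  # virtual memory; the caller's buffer is untouched
    for step, instr in enumerate(program):
        if step >= max_steps or instr[0] == "halt":
            break
        if instr[0] == "xor":
            _, start, length, key = instr
            for i in range(start, start + length):
                mem[i] ^= key
        # "nop" does nothing; it only changes the instruction sequence.
    return mem


def gd_scan(program: List[Instruction], memory: bytes, signature: bytes) -> bool:
    """Emulate first, then search the virtual memory for the body signature."""
    return signature in emulate(program, memory)


# Harmless placeholder body, stored XOR-encrypted at offset 4.
BODY = b"TOY-BODY-MARKER"
KEY = 0x5A
memory = b"\x00" * 4 + bytes(b ^ KEY for b in BODY)

# Two functionally equivalent decryption routines with different instructions.
variant_a = [("xor", 4, len(BODY), KEY), ("halt",)]
variant_b = [("nop",), ("xor", 4, 5, KEY), ("nop",),
             ("xor", 9, len(BODY) - 5, KEY), ("halt",)]

static_hit = BODY in memory
emulated_hits = (gd_scan(variant_a, memory, BODY), gd_scan(variant_b, memory, BODY))
```

In this sketch the two variants would yield different decryption-routine signatures, yet both leave the same decrypted body in the virtual memory, which is what the scan searches.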
Metamorphic viruses are not encrypted but vary the instructions in the viral body with each infection of a host file. Accordingly, metamorphic viruses often cannot be detected with a string search because they do not have static strings.
Regardless of whether the virus is simple, encrypted, polymorphic, or metamorphic, the virus typically infects an executable file by attaching or altering code at or near an “entry point” of the file. An “entry point” is an instruction or instructions in the file that a virus can modify to gain control of the computer system on which the file is being executed. Many executable files have a “main entry point” containing instructions that are always executed when the program is invoked. Accordingly, a virus seizes control of the program by manipulating program instructions at the main entry point to call the virus instead of the program. The virus then infects other files on the computer system.
When infecting a file, the virus typically stores the viral body at the main entry point, at the end of the program file, or at some other convenient location in the file. When the virus completes execution, it calls the original program instructions that were altered by the virus.
In order to detect the presence of a virus, antivirus software typically scans the code near the main entry point, and other places where the viral body is likely to reside, for strings matching signatures held in a viral signature database. In addition, the antivirus software emulates the code near the main entry point in an effort to decrypt any encrypted viral bodies. Since viruses usually infect only the main entry point, the antivirus software can scan and emulate a file relatively quickly. When new viruses are detected, the antivirus software can be updated by adding the new viral signatures to the viral signature database.
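As a minimal sketch of scanning only the code near an entry point, the function below searches a fixed window around a given offset for every signature in a small database. The function `scan_near_entry`, the window size, the entry offsets, and the entries of `SIGNATURE_DB` are hypothetical values assumed for illustration.

```python
from typing import Dict, List


def scan_near_entry(data: bytes, entry_offset: int, window: int,
                    signatures: Dict[str, bytes]) -> List[str]:
    """Return the names of signatures found within window bytes of the entry."""
    start = max(0, entry_offset - window)
    end = min(len(data), entry_offset + window)
    region = data[start:end]
    return [name for name, sig in signatures.items() if sig in region]


# Hypothetical signature database; adding an entry adds a detection.
SIGNATURE_DB = {
    "Toy.A": b"\xde\xad\xbe\xef",
    "Toy.B": b"\xca\xfe\xba\xbe",
}

# Toy file: Toy.A sits at offset 100, Toy.B at offset 204.
data = b"\x00" * 100 + b"\xde\xad\xbe\xef" + b"\x00" * 100 + b"\xca\xfe\xba\xbe"

near_main = scan_near_entry(data, entry_offset=96, window=16,
                            signatures=SIGNATURE_DB)
near_other = scan_near_entry(data, entry_offset=200, window=16,
                             signatures=SIGNATURE_DB)
```

Restricting the search to a small region keeps the scan fast, but a signature located near a different entry point, such as `Toy.B` in this toy file, is found only when that entry point is also scanned.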
More recently, however, viruses have been introduced that infect entry points other than the main entry point. As a result, the number of potential entry points for a viral infection in a typical search space, such as a MICROSOFT WINDOWS portable executable (PE) file, is very large. Prior art antivirus software would require an extremely long processing time to scan and/or emulate the code surrounding all of the entry points in the file that might be infected by a virus.
Moreover, the multiple entry points provide opportunities for viruses to use previously unknown methods to infect a file. As a result, it may not be possible to detect the virus merely by adding a new signature to the viral signature database. In many cases, the virus detection system itself must be updated with hand-coded virus detection routines in order to detect the new viruses. Writing custom detection routines and updating the antivirus software requires a considerable amount of work, especially when the antivirus software is distributed to a mass market.
Therefore, there is a need in the art for antivirus software that can detect viruses in PE and other files having multiple entry points without requiring a prohibitively large amount of processing time. There is also a need for the antivirus software to be easily upgradeable, so that new virus detection capabilities can be added without requiring hand-coded virus detection logic or needing to distribute a new virus detection engine.