1. Field of the Invention
The present invention relates to a virtual machine system and, more particularly, relates to a virtual machine system appropriate for automated code analysis and capable of analyzing data including executable programs presented to a computer system.
2. Discussion of the Related Art
Detection of malicious code, including programs such as viruses, has been a concern throughout the era of the personal computer. With the growth of communication networks such as the Internet and the increasing interchange of data, including the rapid growth in the use of e-mail for communications, the infection of computers through communications or file exchange is an increasingly significant consideration. Infections take various forms, but are typically related to computer viruses, trojan programs, or other forms of malicious code. Recent incidents of e-mail-mediated virus attacks have been dramatic both for the speed of propagation and for the extent of damage, with Internet service providers (ISPs) and companies suffering service problems and a loss of e-mail capability. In many instances, attempts to adequately prevent file exchange or e-mail-mediated infections significantly inconvenience computer users. Improved strategies for detecting and dealing with virus attacks are desired.
One conventional technique for detecting viruses is signature scanning. Signature scanning systems use sample code patterns extracted from known malicious code and scan for the occurrence of these patterns in other program code. In some cases, the program code to be scanned is first decrypted through emulation, and the resulting code is scanned for signatures or function signatures. A primary limitation of this signature scanning method is that only known malicious code is detected; that is, only code that matches the stored sample signatures of known malicious code is identified as being infected. Viruses or other malicious code not previously identified, and viruses or malicious code created after the last update to the signature database, will not be detected. Thus, newly created viruses are not detected by this method, nor are viruses in which the signature previously extracted and stored in the signature database has been overwritten.
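By way of illustration only, the signature scanning approach described above can be sketched as a search for stored byte patterns. The signature names, the byte patterns, and the `scan` function are hypothetical and are not taken from any actual scanning product.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern extracted from
# known malicious code. The patterns are made up for illustration.
SIGNATURES: Dict[str, bytes] = {
    "demo-virus-a": bytes.fromhex("deadbeef90"),
    "demo-virus-b": b"\xeb\xfe\xcc\xcc",
}


def scan(code: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all stored signatures that occur in `code`."""
    return sorted(name for name, pattern in signatures.items() if pattern in code)


# Code containing a stored pattern is reported as infected.
known = b"\x00\x01" + bytes.fromhex("deadbeef90") + b"\x02"
# Code with no stored pattern passes, whether or not it is malicious.
unknown = b"\x00\x01\x12\x34\x56\x02"

print(scan(known))    # ['demo-virus-a']
print(scan(unknown))  # []
```

The second call shows the limitation noted above: code whose pattern is absent from the database yields no match, regardless of what the code actually does.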
In addition, the signature analysis technique fails to identify the presence of a virus if the signature is not aligned in the code in the expected fashion. Alternatively, the authors of a virus may obscure the identity of the virus by opcode substitution or by inserting dummy or random code into virus functions. Nonsense code can be inserted that alters the signature of the virus sufficiently that the virus is undetectable by a signature scanning program, without diminishing the ability of the virus to propagate and deliver its payload.
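As a minimal sketch of this weakness, the following example inserts a single dummy instruction between two instructions of a matched byte sequence. The signature bytes and helper names are hypothetical; the only external fact assumed is that `0x90` is the x86 no-operation opcode.

```python
# Hypothetical signature, for illustration only.
SIGNATURE = bytes.fromhex("b801000000cd21")
NOP = b"\x90"  # x86 no-operation opcode


def matches(code: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive scan: the signature must appear as one contiguous run of bytes."""
    return signature in code


def insert_dummy(code: bytes, offset: int, filler: bytes = NOP) -> bytes:
    """Insert filler bytes at `offset`, leaving all other bytes unchanged."""
    return code[:offset] + filler + code[offset:]


original = b"\x55" + SIGNATURE + b"\xc3"
# Offset 6 falls between the two instructions covered by the signature.
obscured = insert_dummy(original, offset=6)

print(matches(original))  # True
print(matches(obscured))  # False
```

The inserted byte performs no operation, yet the contiguous pattern no longer appears in the code, so the naive scan reports nothing.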
Another virus detection strategy is integrity checking. Integrity checking systems extract a code sample from known, benign application program code. The code sample is stored, together with information from the program file such as the executable program header and the file length, as well as the creation date and creation time for the program file. The program file is checked at regular intervals against this database to ensure that the program file has not been modified. A main disadvantage of an integrity-check-based virus detection system is that a great many warnings of virus activity are issued whenever an application program is modified for any reason. For example, integrity checking programs generate long lists of modified files when a user upgrades the operating system of the computer or installs or upgrades application software. It is difficult for a user to determine when a warning represents an actual attack on the computer system.
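The record-and-compare cycle described above might be sketched as follows. The record layout, the sizes, and the function names are assumptions made for illustration; the creation date and time mentioned above are omitted because file creation time is not exposed uniformly across operating systems.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List

HEADER_SIZE = 64    # assumed number of header bytes to store
SAMPLE_OFFSET = 64  # assumed location of the stored code sample
SAMPLE_SIZE = 32    # assumed length of the stored code sample


@dataclass(frozen=True)
class Record:
    header: bytes
    sample: bytes
    length: int


def snapshot(path: Path) -> Record:
    """Extract the header, a code sample, and the file length."""
    data = path.read_bytes()
    sample = data[SAMPLE_OFFSET:SAMPLE_OFFSET + SAMPLE_SIZE]
    return Record(data[:HEADER_SIZE], sample, len(data))


def changed_fields(path: Path, stored: Record) -> List[str]:
    """Compare the file against its stored record and list what differs."""
    current = snapshot(path)
    return [name for name in ("header", "sample", "length")
            if getattr(current, name) != getattr(stored, name)]


with tempfile.TemporaryDirectory() as tmp:
    program = Path(tmp) / "app.bin"
    program.write_bytes(bytes(range(128)))
    stored = snapshot(program)
    unchanged = changed_fields(program, stored)
    # A benign upgrade that appends data also changes the file.
    program.write_bytes(program.read_bytes() + b"\x00" * 16)
    after_upgrade = changed_fields(program, stored)

print(unchanged)      # []
print(after_upgrade)  # ['length']
```

The check reports that the file changed but not why, which is the source of the many warnings described above.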
Checksum monitoring systems detect viruses by generating a cyclic redundancy check (CRC) value for each program file. Modification of the program file is detected by a variation in the CRC value. Checksum monitors improve on integrity check systems in that it is more difficult for malicious code to defeat the monitoring. On the other hand, checksum monitors exhibit the same limitations as integrity checking systems in that many false warnings are issued and it is difficult to identify which warnings represent actual viruses or infection.
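A minimal sketch of checksum monitoring, assuming a 32-bit CRC as computed by `zlib.crc32` from the Python standard library; the file name and contents are hypothetical.

```python
import zlib
from typing import Dict


def crc_of(data: bytes) -> int:
    """Compute a 32-bit CRC over the full contents of a program file."""
    return zlib.crc32(data)


def is_modified(name: str, data: bytes, baseline: Dict[str, int]) -> bool:
    """Report a modification when the CRC differs from the recorded value."""
    return crc_of(data) != baseline[name]


original = b"original program bytes"
baseline = {"app.exe": crc_of(original)}

# Change a single byte; the comparison does not reveal who changed it or why.
altered = b"Original program bytes"

print(is_modified("app.exe", original, baseline))  # False
print(is_modified("app.exe", altered, baseline))   # True
```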
Behavior interception systems detect virus activity by interacting with the operating system of the target computer and monitoring for potentially malicious behavior. When such behavior is detected, the action is blocked and the user is informed that a potentially dangerous action is about to take place. The user can then allow the potentially malicious code to perform the action. This makes the behavior interception system somewhat unreliable, because the effectiveness of the system depends on user input. In addition, resident behavior interception systems are sometimes detected and disabled by malicious code.
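The dependence on user input can be illustrated with a simplified sketch. The action names, the `intercept` function, and the callback signatures are hypothetical; an actual system would hook operating system services rather than receive action names as strings.

```python
from typing import Callable

# Hypothetical set of actions treated as potentially malicious.
SUSPICIOUS_ACTIONS = {"write_boot_sector", "modify_executable", "format_disk"}


def intercept(action: str,
              perform: Callable[[], str],
              ask_user: Callable[[str], bool]) -> str:
    """Block a suspicious action unless the user explicitly allows it."""
    if action in SUSPICIOUS_ACTIONS and not ask_user(action):
        return "blocked"
    return perform()


def do_action() -> str:
    return "performed"


# The same action is blocked or performed depending only on the user's answer.
print(intercept("modify_executable", do_action, lambda a: False))  # blocked
print(intercept("modify_executable", do_action, lambda a: True))   # performed
# Actions outside the monitored set run without any prompt.
print(intercept("read_file", do_action, lambda a: False))          # performed
```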
Another conventional strategy for detecting infections is the use of bait files. This strategy is typically used in combination with other virus detection strategies to detect an existing and active infection, that is, malicious code that is presently running on the target computer and modifying files. The virus is detected when the bait file is modified. Many viruses are aware of bait files and do not modify files that are too small, that are obviously bait files because of their structure, or that have predetermined content in the file name.
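A bait file check might be sketched as follows. The file name, size, and contents are hypothetical, and the use of a SHA-256 digest to detect modification is an assumption of this sketch; the description above does not specify how the modification is detected.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Fingerprint the current contents of the bait file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plant_bait(directory: Path, name: str = "bait.com", size: int = 4096) -> Path:
    """Create a bait file with known contents (hypothetical name and size)."""
    bait = directory / name
    bait.write_bytes(b"\x90" * size)
    return bait


with tempfile.TemporaryDirectory() as tmp:
    bait = plant_bait(Path(tmp))
    expected = digest(bait)
    clean = digest(bait) == expected
    # Simulate an active infection appending bytes to the bait file.
    with bait.open("ab") as handle:
        handle.write(b"appended bytes")
    infected = digest(bait) != expected

print(clean)     # True
print(infected)  # True
```

Nothing is detected unless the malicious code actually modifies the bait file, which is why the avoidance behavior described above defeats this strategy.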
It is apparent that improved techniques for detecting viruses and other malicious types of code are desirable.
Aspects of the present invention utilize certain characteristics of virtual machine technology. The concept of a “virtual machine” is known in the art and virtual machines have found various uses. The merits of the “virtual machine” include the ability to execute code that would not execute on the hardware platform under other circumstances, such as code intended for other hardware platforms. Other applications of virtual machine technology can be found in multi-user and multi-processing systems, where each process runs within its own virtual machine.
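For illustration of the general concept only, a toy virtual machine can be sketched as an interpreter for an instruction set that the host hardware does not implement. The three-instruction stack machine below is hypothetical and is not the virtual machine of the present invention.

```python
from typing import List, Tuple

# Each instruction is (opcode, operand); the operand is ignored by ADD and MUL.
Instruction = Tuple[str, int]


def run(program: List[Instruction]) -> List[int]:
    """Interpret a program for a hypothetical stack machine; return the stack."""
    stack: List[int] = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


# Computes (2 + 3) * 4 entirely inside the interpreter.
result = run([("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0)])
print(result)  # [20]
```

The program never runs directly on the host processor; each instruction is carried out by the interpreter, which is what allows code for a foreign instruction set to execute.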
Virtual machines have been applied to various computer functions, such as the interface between computer hardware and high level languages (HLL) (U.S. Pat. No. 5,872,978 to Hoskins), the networking of real machines to form a parallel processor (U.S. Pat. No. 5,774,727 to Walsh et al.), and the creation of a multi-tasking or multi-user computer environment (U.S. Pat. No. 4,400,769 to Kaneda et al.). Virtual machines have also been applied where cross-platform HLL code portability is required (U.S. Pat. No. 6,118,940 to Alexander, III et al.).