In May of 2002, the number of known 32-bit Windows viruses and worms exceeded 2000. Computer viruses, worms, and Trojan horses are types of “malicious code,” which is herein defined as any computer program, module, set of modules, or code that enters a computer system without an authorized user's knowledge and/or without an authorized user's consent. In particular, a computer worm is malicious code that has the ability to replicate itself from one computer to another, e.g., over a computer network. Although the major problem for today's users is computer worms, simple file infector viruses persist and remain a challenge for anti-virus software. Distinguished from worms, computer viruses are characterized in that they typically attach to and/or modify a host program, which may be an executable, a dynamically linked library (DLL), or another kind of application. Examples of such viruses include the virus families W32.FunLove, W32.Bolzano, and W32.Coke.
To operate, malicious code usually has to call system APIs (application programming interfaces) from system libraries, such as KERNEL32.DLL in 32-bit WINDOWS systems. Normal computer programs access system APIs by static linking, which builds a proper import address table, or by calling the GetProcAddress() API to obtain dynamically the address of a particular API. An application that uses the GetProcAddress() API typically needs to have a static import to call that API. The static import for the GetProcAddress() API is located in the application's import address table, which thereby allows the application to call any exported API by name. Malicious code typically cannot call APIs the way normal applications do because malicious code typically does not have its own imports.
Accordingly, calling system APIs can be a challenge for writers of malicious code. For example, the Win95/Boza virus often failed to infect files as intended because it used hard-coded addresses to call particular system APIs more easily. Because API addresses often vary from one operating system (OS) to the next, and even among different releases of the same OS, malicious code that uses hard-coded API addresses does not spread well on different systems. To solve this problem, writers of malicious code have implemented functions for locating system APIs that are similar in their implementation to GetProcAddress(). For example, the Win95/Marburg and Win32/Cabanas.A viruses both used a technique that has since become standard in virus development: each virus carries a function that locates the address of every API it needs to call under all Win32 systems. While a challenge for virus writers, this need to call system APIs can be exploited by writers of anti-virus tools.
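The difference between a hard-coded API address and a lookup by name can be illustrated with a minimal sketch. The sketch below is a toy model rather than the WINDOWS loader: the ExportTable class, the resolve function, the export data, and both load bases are hypothetical and chosen only for illustration. It assumes only that a library's export data maps each exported name to an offset relative to the address at which the library is loaded, so that the same API resolves to a different absolute address when the load base changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExportTable:
    """Toy stand-in for a library's export data (three parallel arrays)."""
    names: List[str]          # exported names
    name_ordinals: List[int]  # for names[i], the index into `functions`
    functions: List[int]      # offsets relative to the library's load base


def resolve(base: int, exports: ExportTable, name: str) -> Optional[int]:
    """Return the absolute address of `name`, or None if it is not exported."""
    try:
        i = exports.names.index(name)
    except ValueError:
        return None
    offset = exports.functions[exports.name_ordinals[i]]
    return base + offset


# Hypothetical export data and load bases, not taken from any real system.
exports = ExportTable(
    names=["CreateFileA", "GetProcAddress", "LoadLibraryA"],
    name_ordinals=[2, 0, 1],
    functions=[0x2000, 0x3000, 0x1000],
)
base_on_system_a = 0x10000000
base_on_system_b = 0x20000000

# An address captured on system A and reused unchanged on system B is stale.
hard_coded = resolve(base_on_system_a, exports, "CreateFileA")
actual_on_b = resolve(base_on_system_b, exports, "CreateFileA")
print(hex(hard_coded), hex(actual_on_b), hard_coded == actual_on_b)
```

In this model, a lookup by name remains correct on either system, whereas the address captured on the first system no longer points at the API on the second.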
The article, Ször, Péter, “Attacks on Win32,” Proceedings of the Virus Bulletin Conference, October 1998, England, explains how heuristic analysis is helpful for detecting infections by malicious code. Heuristics are usually based on certain characteristics that distinguish malicious code from normal code, such as the malicious code's need to call system APIs as described above. One heuristic suggested in the “Attacks on Win32” article is to search for code in a target file that uses the KERNEL32 address directly and looks for the text string “PE00”. This heuristic is premised on the idea that accessing the KERNEL32 library and searching for “PE00” is one way that malicious code can locate system APIs without having those addresses hard-coded, whereas normal applications do not need to perform this task because they have an import table. This technique is called a static heuristic because it is performed by scanning a particular file for certain structure characteristics common to particular types of malicious code. Static heuristic techniques have been extremely successful against early file viruses and other malicious code, and these heuristics continue to be useful against many species of malicious code today.
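A static heuristic of this kind might be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited article: it assumes that “PE00” denotes the four-byte PE signature (the letters “PE” followed by two zero bytes), that the caller has already extracted the code bytes of the target file (the file's own header legitimately contains the signature), and that the listed base addresses are merely illustrative values. The name static_heuristic and its parameters are hypothetical.

```python
import struct

# Assumption: "PE00" in the text denotes the bytes 'P', 'E', 0x00, 0x00.
PE_SIGNATURE = b"PE\x00\x00"

# Illustrative base addresses only; real values vary by OS version and release.
EXAMPLE_KERNEL32_BASES = (0xBFF70000, 0x77F00000)


def static_heuristic(code: bytes, bases=EXAMPLE_KERNEL32_BASES) -> bool:
    """Flag code that embeds both a library base address and the PE signature.

    `code` is assumed to hold only the code bytes of the file, not its headers.
    """
    has_signature = PE_SIGNATURE in code
    # 32-bit addresses appear in little-endian byte order in x86 code.
    has_base = any(struct.pack("<I", base) in code for base in bases)
    return has_signature and has_base


# Synthetic byte strings standing in for extracted code.
suspicious = b"\x90" + struct.pack("<I", 0xBFF70000) + b"\x81\x38PE\x00\x00"
benign = b"\x90" * 16
print(static_heuristic(suspicious), static_heuristic(benign))
```

Requiring both markers, rather than either one alone, is a design choice made here to reduce false positives; a byte-level match of this kind remains a coarse approximation of real code analysis.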
But programmers of malicious code are increasingly employing anti-heuristic techniques to hide the identifying features of malicious code infection in a file's structure. Many of these anti-heuristic techniques implement the same suspicious activity in different ways to obscure that activity from static heuristic techniques. For example, the static heuristic explained above could be defeated by encrypting the portion of the malicious code that looks for the text string “PE00”. Ször, Péter, “Attacks on Win32 - Part II,” Proceedings of the Virus Bulletin Conference, September 2000, England, describes a variety of other anti-heuristic techniques used by malicious code to avoid detection by static heuristics. Accordingly, anti-virus software tools often employ emulation, in which a file is executed in an emulated environment. In this way, the anti-virus software can use heuristics that are based on actions that the executed file takes in the simulation, thus defeating the efforts of writers of malicious code to obscure the structure of their malicious code. Such techniques are referred to as dynamic heuristics. “Attacks on Win32 - Part II” also describes various attacks by malicious code on computer systems, as well as static and dynamic heuristics and other techniques for detecting malicious code.
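The reason that emulation defeats simple encryption can be shown with a toy sketch. Everything in it is a simplifying assumption: the "file" is a single key byte followed by an XOR-encoded body, the "emulator" merely applies that key in the way the file's own decryptor would at run time, and the heuristic inspects the decoded memory rather than logging the actions of the executed code as a real dynamic heuristic would. The names build_sample, static_scan, emulate, and dynamic_scan are hypothetical.

```python
# Assumption: the marker is the PE signature, 'P', 'E', 0x00, 0x00.
MARKER = b"PE\x00\x00"


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of `data` with a one-byte key."""
    return bytes(b ^ key for b in data)


def build_sample(body: bytes, key: int) -> bytes:
    """Toy file layout: one key byte, then the XOR-encoded body."""
    return bytes([key]) + xor_bytes(body, key)


def static_scan(file_bytes: bytes) -> bool:
    """Static heuristic: look for the marker in the file as stored."""
    return MARKER in file_bytes


def emulate(file_bytes: bytes) -> bytes:
    """Stand-in for emulation: decode the body as the file itself would."""
    key, encoded = file_bytes[0], file_bytes[1:]
    return xor_bytes(encoded, key)


def dynamic_scan(file_bytes: bytes) -> bool:
    """Dynamic heuristic (simplified): inspect memory after emulation."""
    return MARKER in emulate(file_bytes)


sample = build_sample(b"\x90\x90" + MARKER + b"\xc3", key=0x5A)
print(static_scan(sample), dynamic_scan(sample))
```

In this model the stored file no longer contains the marker, so the static scan misses it, while the marker reappears once the emulated decoding step has run.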
Apart from heuristics, the “Attacks on Win32 II” article additionally explains some of the incompatibility problems that caused older 32-bit viruses to fail to work on newer WINDOWS 2000 systems. For example, several 32-bit viruses call system APIs by searching for the loaded KERNEL32.DLL (and thus the APIs therein), and looking for the text “MZ” or “PE” within particular process address spaces. Because different versions and releases of WINDOWS use different base addresses for system libraries such as KERNEL32.DLL, viruses written for one release or type of OS will often fail to work on a different one. Viruses such as the Win32/Cabanas family do not account for the changing DLL base address, and thus fail on some systems when attempting to locate the loaded system libraries. This paper, however, failed to recognize that incompatibility problems such as this one could themselves be used as a heuristic for detecting various types of malicious code.
As writers continue to develop malicious code, the need persists for additional reliable heuristics to detect computer viruses and other malicious code without generating a significant number of false positives.