Recent history has seen a proliferation of shellcode attacks against computers. These attacks exploit a widely abused vulnerability in many modern computers, often called a buffer overflow vulnerability, which provides a backdoor of sorts through which a malicious program can be inserted and executed. In essence, certain fields within the memory stacks of modern computers are designed to accept, or at least expect to receive, only American Standard Code for Information Interchange (ASCII) or other information strings, not executable code. The buffer overflow vulnerability refers to the fact that malicious executable code, cleverly designed to look like an ASCII string, can be written into such a field in a way that overruns the field's allocated space and causes the code to be executed. When this happens, the code can wreak significant havoc on the target computer. Examples of susceptible fields are buffers, or allocated memory spaces, intended to hold character strings such as usernames, passwords, login names, and the like.
Fields such as these typically serve as a general mechanism for passing parameters to subroutines or functions when they are invoked. They can also provide temporary storage for any other variables such subroutines require during execution, and they are typically allocated on a memory stack or heap. The vulnerability in this arrangement stems from two facts. First, the return address to the calling function, which is loaded into the instruction pointer register when the subroutine finishes, is saved in the same memory area, adjacent to the buffer. Second, certain subroutines write strings to buffers without length checking; that is, before writing an input string to a buffer, they do not compare the length of the string with the allocated size of the buffer. A sufficiently long string will therefore overflow the buffer and overwrite the saved return address. If the string is carefully designed, it replaces that address with one of the attacker's choosing, so that when the subroutine returns, the computer jumps to that address and looks for instructions to execute there.
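The overflow described above can be illustrated with a toy model. Python has no real stack, so in this sketch a `bytearray` stands in for one stack frame: a 16-byte buffer followed immediately by a 4-byte saved return address. The sizes, the 32-bit little-endian layout, and the function names are assumptions for illustration only; real frame layouts vary with processor, compiler, and calling convention.

```python
import struct

BUF_SIZE = 16  # assumed size of the character buffer (e.g. for a username)
RET_SIZE = 4   # assumed saved return address: 32-bit little-endian in this toy model


def make_frame(return_address: int) -> bytearray:
    """Toy stack frame: the buffer, immediately followed by the saved return address."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", return_address))


def saved_return(frame: bytearray) -> int:
    """Read back the value that would be loaded into the instruction pointer on return."""
    return struct.unpack("<I", bytes(frame[BUF_SIZE:BUF_SIZE + RET_SIZE]))[0]


def unchecked_copy(frame: bytearray, src: bytes) -> None:
    """Mimic C's strcpy: copy up to the first NUL, append a NUL, never check the length."""
    data = src.split(b"\x00", 1)[0] + b"\x00"
    n = min(len(data), len(frame))  # the toy frame ends here; real memory would not
    frame[:n] = data[:n]


def checked_copy(frame: bytearray, src: bytes) -> bool:
    """Write src plus a NUL terminator only if it fits inside the buffer."""
    if len(src) + 1 > BUF_SIZE:
        return False  # refuse rather than spill into the saved return address
    frame[:len(src) + 1] = src + b"\x00"
    return True
```

With this model, `unchecked_copy(frame, b"A" * 20)` leaves the saved return address equal to 0x41414141 (the bytes of "AAAA"), whereas `checked_copy` rejects the same 20-byte input and leaves the frame untouched.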
Herein lies the problem. Cleverly designed programs, often called shellcode, have been written as strings of ASCII characters that double as executable code. These strings also overwrite the saved return address with an address that points back into the buffer itself. The target computer is thus directed to run the injected code instead of returning to the instructions for its normal processing task.
Shellcode gets its name from one of the most common programs for exploiting this vulnerability, which is designed to spawn a command shell on the target computer. These shells, typically simple command interpreters that allow users to manipulate files or execute other system-level commands, can then be used in a number of detrimental ways. For example, commands can be issued to delete files, change passwords, send information such as access codes to the attacker's computer, or even download and execute a different virus program.
One of the best ways to defeat shellcode attacks, that is, attempts to place shellcode on a target computer so that it will be executed, is to detect the shellcode and remove it before it runs. It would thus be desirable to provide a method and apparatus for detecting shellcode.
Unfortunately, shellcode can be made difficult to detect. It can be written in many different ways, using different instructions and techniques, so that no single characteristic set of instructions appears in every variant. It would thus also be highly desirable to provide a method for detecting shellcode that scans for the presence of a plurality of such characteristic instruction sets.
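As a rough illustration of scanning for several characteristic instruction sets at once, the sketch below checks a byte string against a small table of patterns and reports every one that matches. The table entries are assumptions chosen as familiar x86 Linux examples (a run of `0x90` NOP bytes, the string `/bin/sh`, and the `int 0x80` encoding `0xCD 0x80`); a practical scanner would need a much larger, regularly updated set, and the 16-byte NOP threshold is arbitrary.

```python
import re

# Hypothetical signature table: each entry is one characteristic byte pattern.
# These are common x86/Linux examples chosen for illustration only.
SIGNATURES = {
    "nop_sled": re.compile(rb"\x90{16,}"),     # long run of x86 NOP bytes (0x90)
    "shell_path": re.compile(rb"/bin/+sh"),    # "/bin/sh" or the padded "/bin//sh"
    "syscall_int80": re.compile(rb"\xcd\x80"), # x86 "int 0x80" system-call encoding
}


def scan(data: bytes) -> list[str]:
    """Return the name of every signature that occurs anywhere in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(data)]
```

Because each pattern is tested independently, a variant that avoids one characteristic sequence can still be caught by another. Shellcode restricted to printable ASCII, however, would contain neither `0x90` nor `0xCD`, so it would need an entirely different set of patterns.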
Finally, even though the exact form of a shellcode can be disguised, it must still carry out a certain minimum set of operations in order to perform its task. For example, every shellcode must manipulate information by transferring arguments from one place to another, regardless of the exact instructions or instruction language used to do so. It would thus be further desirable to provide a function-based method for detecting shellcode that examines the operations a string is written to execute, rather than simply searching for specific instructions.
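A function-based check can be sketched by mapping each decoded instruction to the abstract operation it performs and then reasoning about operations rather than instructions. The sketch assumes the string has already been disassembled into a list of x86 mnemonics (the disassembly step is not shown); the mnemonic-to-operation table, the `looks_like_shellcode` rule, and its threshold of three transfers plus one system call are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from x86 mnemonics to abstract operations. Different
# instructions that achieve the same effect map to the same operation.
OPERATION_OF = {
    "mov": "transfer", "push": "transfer", "pop": "transfer",
    "xchg": "transfer", "lea": "transfer",
    "int": "syscall", "syscall": "syscall", "sysenter": "syscall",
}


def operation_profile(instructions: list[str]) -> Counter:
    """Count abstract operations in a decoded instruction sequence (mnemonics only)."""
    return Counter(OPERATION_OF.get(m.lower(), "other") for m in instructions)


def looks_like_shellcode(instructions: list[str], min_transfers: int = 3) -> bool:
    """Hypothetical rule: several argument transfers plus at least one system call."""
    profile = operation_profile(instructions)
    return profile["transfer"] >= min_transfers and profile["syscall"] >= 1
```

Under this rule, `["push", "push", "mov", "int"]` and `["lea", "xchg", "pop", "syscall"]` are treated alike: both perform three argument transfers followed by a system call, even though they share no instructions.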