Intrusion detection systems (IDS) are systems that try to detect and alert on attempted or successful intrusions into an information system or network, where the intrusion is considered to be any unauthorized or unwanted activity on that system or network.
Buffer overflows (BOFs) in buffers that depend on user input have become one of the biggest security hazards on the Internet and in modern computing in general. Such an error is easily made at the programming level, and although it is invisible to a user who does not understand or cannot obtain the source code, many of these errors are easy to exploit.
The principle of exploiting a buffer overflow is to overwrite parts of memory that are not supposed to be overwritten by arbitrary input, and to make the process execute the code supplied in that input. To see how and where an overflow takes place, let us take a look at how memory is organized.
In the following, one type of BOF attack is explained in more detail. It should be appreciated that this is only an example and that the details may vary from one attack to another. For example, the pointer may be some pointer other than a pointer to stack address space. (Below, the term stack pointer is used for referring to a pointer to stack address space.) Memory addresses can be physical addresses of the memory or virtual/logical addresses used by processes running in a computer. A page is a part of memory that uses its own relative addressing, meaning that the kernel allocates initial memory for the process, which the process can then access without having to know where the memory is physically located in RAM.
The address space of a process is divided into at least three regions: Code, Data, and Stack. Data in the code segment are machine instructions that the processor executes. Code execution is non-linear: it can skip code, jump, and call functions on certain conditions. A pointer called EIP, or instruction pointer, is therefore used. The address to which EIP points always contains the code that will be executed next. The data region is a memory space for variables and dynamic buffers. Static variables are stored in this region.
The stack is a contiguous block of memory containing data and possibly also some executable code. The bottom of the stack is at a fixed address, and a fixed size of memory is allocated for the stack. How much of this memory space is used is dynamically adjusted by the kernel at run time. Depending on the implementation, the stack will grow either down (towards lower memory addresses) or up. The stack has the property that the last object placed on the stack will be the first object removed. This property is commonly referred to as last in, first out, or LIFO. The stack consists of logical stack frames that are pushed when calling a function and popped when returning.
A stack frame contains the parameters to a function, its local variables, and the data necessary to recover the previous stack frame, including the value of the instruction pointer at the time of the function call. The CPU implements instructions to PUSH onto and POP off of the stack. PUSH adds an element at the top of the stack. POP, in contrast, reduces the stack size by one by removing the element at the top of the stack. A register called the stack pointer (SP) points to the top of the stack. What the stack pointer points to is also implementation dependent: it may point to the last address on the stack, or to the next free available address after the stack.
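The PUSH and POP behaviour described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the stack grows towards lower addresses, SP points at the last element pushed, and the names `DownwardStack`, `call`, and `ret` are hypothetical helpers introduced for illustration, not part of any real system.

```python
class DownwardStack:
    """Toy model of a fixed-size stack that grows toward lower addresses.

    Assumption: SP points at the last element pushed. The other convention
    mentioned in the text would have SP point at the next free slot.
    """

    def __init__(self, size):
        self.memory = [None] * size   # fixed block reserved for the stack
        self.sp = size                # empty stack: SP sits just past the bottom

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack exhausted")
        self.sp -= 1                  # grow toward lower addresses
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("pop from empty stack")
        value = self.memory[self.sp]
        self.sp += 1                  # shrink toward higher addresses
        return value


def call(stack, return_address, params, local_count):
    """Push a frame: parameters, saved return address, then local slots."""
    for p in reversed(params):
        stack.push(p)
    stack.push(return_address)
    for _ in range(local_count):
        stack.push(0)                 # zero-initialised local variable slot


def ret(stack, param_count, local_count):
    """Pop a frame and hand back the saved return address."""
    for _ in range(local_count):
        stack.pop()
    return_address = stack.pop()
    for _ in range(param_count):
        stack.pop()
    return return_address
```

Calling `call(stack, 0x400123, [7, 8], 3)` and then `ret(stack, 2, 3)` restores SP to its earlier value and yields the saved address, which is the value an attacker aims to overwrite.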
A buffer is simply a contiguous block of computer memory that holds multiple instances of the same data type. A buffer overflow is the result of stuffing more data into a buffer than it can handle; that is, to overflow is to flow or fill over the top, brim, or bounds. Buffer overflow (BOF) is also a term used for a programming error that enables misuse of a program in such a way that the program overwrites some data in memory. A buffer overflow attack (BOF attack) is a way to exploit a BOF weakness in a program in order to execute arbitrary code, or to alter the control flow in a malicious manner, in a target system where the code is running.
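The effect of the programming error can be shown with a toy memory layout. This is a sketch, not a real exploit: it assumes a frame consisting of an 8-byte buffer followed directly by a 4-byte little-endian saved return address, and all names (`make_frame`, `unchecked_copy`, `checked_copy`) are hypothetical.

```python
BUF_SIZE = 8    # assumed size of the local buffer in the toy frame
ADDR_SIZE = 4   # assumed width of the saved return address


def make_frame(return_address):
    """Lay out a toy frame as [buffer][saved return address]."""
    frame = bytearray(BUF_SIZE + ADDR_SIZE)
    frame[BUF_SIZE:] = return_address.to_bytes(ADDR_SIZE, "little")
    return frame


def unchecked_copy(frame, data):
    """Copy input into the buffer without a bounds check (the bug).

    Bytes beyond BUF_SIZE land on the saved return address. The toy frame
    ends there, so longer input raises IndexError instead of going further.
    """
    for i, byte in enumerate(data):
        frame[i] = byte


def checked_copy(frame, data):
    """Copy input only if it fits inside the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    frame[:len(data)] = data


def saved_return_address(frame):
    """Read back the saved return address from the toy frame."""
    return int.from_bytes(frame[BUF_SIZE:], "little")
```

With exactly 8 bytes of input the saved address is untouched; with 12 bytes the last four replace it, which is the alteration of control flow described above. The checked variant rejects the oversized input instead.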
Shell code is simply machine instructions that are written onto the stack, after which the return address is changed so that the function returns to the stack. Using this method, code can be introduced into a vulnerable process and then executed right on the stack. The return address can be changed to point to shell code placed on the stack by, for example, adding some NOP (no operation) instructions before the shell code. As a result, it is not necessary to predict the exact start of the shell code in memory with 100% accuracy (or even to find it by brute force). The function will return onto the stack somewhere before the shell code and work its way through the NOPs to the shell code, which then runs on the stack.
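Why the NOPs relax the need for an exact address can be seen in a small simulation. The sketch below assumes the single-byte x86 NOP opcode 0x90 and uses four placeholder bytes in place of any real shell code; `build_input` and `slide` are hypothetical names, and the "execution" is only a loop that skips NOP bytes.

```python
NOP = 0x90              # one-byte no-operation opcode on x86
PAYLOAD = b"\xCC" * 4   # placeholder bytes standing in for shell code


def build_input(sled_length):
    """Return a run of NOPs followed by the placeholder payload."""
    return bytes([NOP]) * sled_length + PAYLOAD


def slide(memory, landing_offset):
    """Follow NOPs from landing_offset and return the offset where they end."""
    pos = landing_offset
    while pos < len(memory) and memory[pos] == NOP:
        pos += 1
    return pos
```

For `data = build_input(32)`, every landing offset from 0 to 32 ends at offset 32, the first payload byte, so a guessed return address only has to fall somewhere inside the run of NOPs.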
General fingerprinting is used in most intrusion detection systems that use fingerprinting as one of their intrusion detection methods. It is usually simple pattern matching that searches for some piece of a publicly known attack (a piece of shell code or the whole attack code). There are some weaknesses in these prior art IDS methods. Firstly, because it is possible to write a BOF attack (the shell code part and the NOP part of the attack) for an exploitable service in many ways, so that BOF attacks do not resemble each other, general fingerprinting is not a sufficient method for detecting BOF attacks. Secondly, processing the huge amount of data to be analysed is a heavy operation because of the large size of the fingerprints. In the prior art methods, a bit sequence in a data packet has been compared to a high number of long fingerprint patterns. On the other hand, due to the long fingerprint patterns, there has typically been no need to compare the fingerprints to “many points” in the bit stream in the data packet. However, since the number of different fingerprints to be checked is very high (equal to the number of different known attack types), much processing capacity is used. There is therefore a need for a better technique for finding both publicly known and as yet unknown BOF attacks in a TCP stream.
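The first weakness can be made concrete with a minimal sketch of general fingerprinting as plain substring matching. The fingerprint names and byte patterns below are invented for illustration and do not correspond to any real attack or IDS rule set.

```python
# Hypothetical fingerprints: name -> byte pattern taken from a "known" attack.
FINGERPRINTS = {
    "example-sled": b"\x90\x90\x90\x90\xCC\xCC",
    "example-other": b"\xDE\xAD\xBE\xEF",
}


def match_fingerprints(payload, fingerprints):
    """Return the names of all fingerprints whose pattern occurs in payload.

    Every fingerprint is searched for in the whole payload, so the work
    grows with the number of fingerprints as well as with the payload size.
    """
    return [name for name, pattern in fingerprints.items() if pattern in payload]
```

A payload such as `b"GET /" + b"\x90" * 8 + b"\xCC\xCC"` matches `example-sled`, but the same payload with the filler bytes changed to `b"\x41"` matches nothing, although its structure is unchanged. This illustrates why matching against fixed, publicly known patterns misses rewritten variants.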