Presently, computer systems are utilized in all facets of modern life. For example, computer systems are common both at home and in the workplace, where they increase output and provide user convenience. Accordingly, there is a need for many different types of software to operate on computer systems. However, with the mix of operating systems and the software operating thereon, the probability of technical errors and computer system crashes and/or failures is high.
Memory corruption problems are one of the major causes of computer system failure. When debugging kernel memory corruption problems, as stated herein, it is often determined that the corrupted buffer has been erroneously written to by a user of an adjacent buffer. By determining the specifics of the adjacent buffer, an insight into the cause of the corruption may be gained. For example, with reference to FIGS. 1A and 1B, normally two buffers (e.g., 105 and 110 of FIG. 1A) operate without negative interaction (e.g., the data placed within a buffer is not more than the buffer can handle). However (with reference now to FIG. 1B), a buffer overrun 125 is one of the possible memory corruption problems that may cause a computer system to crash. In general, a buffer overrun 125 occurs when a first buffer (e.g., 105) overruns (e.g., 115) its allotted space and “treads” upon another buffer (e.g., 110). Later on, when the buffer which has been “trodden” upon (e.g., 110) is accessed, it may induce an error in the system. This error will often be the cause of a system crash. In some cases, as long as the situation remains unresolved, the system will crash each time the sequence of overrun and access occurs. In other cases, the buffer overrun may occur during an initial process such as startup. In that case, the system may not return to an operational state until a technician resolves the problem.
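By way of illustration only, the overrun of FIG. 1B may be modeled with two adjacent regions carved from one flat block of memory. The sketch below is a simplified assumption and not part of any described system: the names MEMORY, BUF_105, BUF_110, and unchecked_write, as well as the sizes chosen, are hypothetical.

```python
# Toy model of FIG. 1B: two adjacent buffers carved from one flat memory region.
# The sizes, offsets, and helper name are hypothetical, chosen for illustration.
MEMORY = bytearray(16)
BUF_105 = (0, 8)    # (offset, size) of the first buffer
BUF_110 = (8, 8)    # the adjacent buffer, immediately after the first


def unchecked_write(buf, data):
    """Copy data to the buffer's offset without checking the buffer's size,
    as a buggy user of the buffer might."""
    offset, _size = buf
    MEMORY[offset:offset + len(data)] = data


unchecked_write(BUF_110, b"BBBBBBBB")   # legitimate contents of buffer 110
unchecked_write(BUF_105, b"A" * 11)     # 11 bytes written into an 8-byte buffer

start, size = BUF_110
print(bytes(MEMORY[start:start + size]))   # b'AAABBBBB'
```

The first three bytes of buffer 110 now hold data belonging to the user of buffer 105, and nothing in buffer 110 itself records who wrote them, which is why the later access fails without pointing to the culprit.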
In many cases, in order to diagnose a buffer overrun, a postmortem analysis of the crashed memory dump is required to determine, or try to determine, the identity of the subsystem that caused the overrun condition. For example, a technician will analyze the crashed memory dump (“crash dump”) and initially ascertain that buffer 110 has indeed been trodden upon. The next step may be to identify the buffer 105 which did the overrunning 115 and establish a root cause. However, identification of an arbitrary buffer 105 or subsystem may be very difficult.
One method for identifying an arbitrary buffer or subsystem operating in a system is to track each and every allocation of every buffer in a system. For example, a program may be utilized to record the buffer being allocated and the subsystem allocating it. Therefore, when the identity of a specific buffer is needed, the technician may simply access the record of buffer allocations and instantly receive information on the subsystem that utilized the buffer and caused the overrun.
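A minimal sketch of such an allocation record is given below, assuming a simple allocator that hands out increasing addresses. The class TrackedAllocator, its methods, the starting address, and the subsystem names are all hypothetical and serve only to show how a recorded allocation can be looked up by address.

```python
import bisect


class TrackedAllocator:
    """Hypothetical allocator that logs (start, size, subsystem) per allocation."""

    def __init__(self):
        self._next = 0x1000      # arbitrary starting address
        self._starts = []        # start addresses, kept in ascending order
        self._records = {}       # start address -> (size, subsystem)

    def alloc(self, size, subsystem):
        addr = self._next
        self._next += size
        self._starts.append(addr)   # addresses only increase, so order is kept
        self._records[addr] = (size, subsystem)
        return addr

    def owner_of(self, addr):
        """Return the subsystem whose buffer contains addr, or None."""
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start = self._starts[i]
        size, subsystem = self._records[start]
        return subsystem if addr < start + size else None


heap = TrackedAllocator()
b105 = heap.alloc(64, "network driver")
b110 = heap.alloc(64, "file system")

# Buffer 110 is found corrupted; the record identifies the buffer just before it.
print(heap.owner_of(b110 - 1))   # network driver
```

Every call to alloc pays for the extra bookkeeping, whether or not a crash ever occurs.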
However, the utilization of a buffer recording program has deleterious effects on the system during normal operation. That is, although the recording of buffer allocation is extremely helpful during analysis of a crashed system, in a properly functioning system the effects are extremely detrimental. For example, a system operating with a buffer allocation recording program may be slowed by a factor of two. In that case, a user would sacrifice up to half of the operational capabilities of a system in order to ensure that a system crash involving buffer overrun could be easily resolved.
In order to resolve buffer overrun without the user having to endure a slowed system, a second method may be utilized to resolve the identity (type) of the arbitrary (unknown) buffer 105 involved in the initial overrun 115. That is, in order to determine the type of an arbitrary memory buffer, a technician may be forced to use debugger commands such as “::kgrep” (e.g., search memory for pointer) and “::whatis” (e.g., identify allocating kernel memory cache of a given pointer) in alternating succession. For example, once the arbitrary buffer 105 is found, a first routine (e.g., ::kgrep) will search through the kernel memory for any pointers (e.g., pointer 135) pointing to the arbitrary buffer 105. When a pointer 135 is found, a second routine (e.g., ::whatis) will try to identify the buffer 130 (or the cache 130 that allocated the buffer 105) at the source of the pointer 135. When an object of known type is finally reached, types can be back-propagated to determine the type of the unknown object. For example, if the second routine is successful, then the cache 130 (e.g., process from process cache, thread from thread cache, message block from message block cache, or the like) that allocated the arbitrary buffer 105 may be known, and the system problem may then be resolved by focusing on the problem within the specified cache 130.
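The alternation of the two routines may be sketched as follows. The functions kgrep and whatis below are toy stand-ins written for this illustration, not the debugger's actual implementation; the MEMORY and CACHES tables and every address in them are invented.

```python
# Toy stand-ins for the two debugger commands; not the debugger's real code.
# MEMORY maps a word address to the value stored there; CACHES maps a cache
# name to the address range of the buffers it allocated. All values are made up.
MEMORY = {
    0x2000: 0x5000,   # a word in a known buffer that points at the unknown buffer
    0x2008: 0x0,
    0x5000: 0x1234,   # contents of the unknown buffer itself
}
CACHES = {"thread_cache": range(0x2000, 0x3000)}


def kgrep(target):
    """Like "::kgrep": return every address whose stored word equals target."""
    return sorted(addr for addr, word in MEMORY.items() if word == target)


def whatis(addr):
    """Like "::whatis": name the cache that allocated addr, or None if unknown."""
    for name, addresses in CACHES.items():
        if addr in addresses:
            return name
    return None


unknown = 0x5000
for pointer_addr in kgrep(unknown):
    print(hex(pointer_addr), "->", whatis(pointer_addr))   # 0x2000 -> thread_cache
```

Here a single step suffices because the pointer happens to reside in a buffer of known origin; when whatis returns None, the search must be repeated from the address at which the pointer was found.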
However, the problem with utilizing the initial buffer 105 as the starting point generally arises after the two routines have been run three to four times. For example, if the second routine cannot identify the second buffer or cache 130 with a pointer 135 to the unknown buffer 105, then the first routine (e.g., ::kgrep) must be used to find a pointer pointing to the second buffer 130. Once that pointer is found, the second routine (e.g., ::whatis) will try to identify the buffer at the source of the pointer (e.g., the third buffer). This process can go on ad infinitum. However, as stated herein, after about the third or fourth level, the plurality of possible buffer types and pointers becomes overwhelming to process manually, and the process will stop or be too difficult to carry through to a solution by hand. Due to the exponential increase in buffer type possibilities, the probability of reaching a solution manually becomes minute. Therefore, the process of buffer identification beginning at the stepped-on buffer and working backward is tedious, incomplete, and error-prone.
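The growth in candidates may be illustrated with a small backward search over a pointer graph. The sketch below is an assumption made for illustration: the graph POINTED_TO_BY, the table KNOWN, the buffer names, and the fan-out of three pointers per buffer are all hypothetical.

```python
# Hypothetical pointer graph: POINTED_TO_BY[b] lists the buffers that hold a
# pointer to b. KNOWN lists the buffers whose type can be identified directly.
POINTED_TO_BY = {
    "unknown": ["a1", "a2", "a3"],
    "a1": ["b1", "b2", "b3"],
    "a2": ["b4", "b5", "b6"],
    "a3": ["b7", "b8", "b9"],
}
KNOWN = {"b5": "message block cache"}


def trace_back(start, max_levels):
    """Search backward along pointers, level by level, until a known buffer
    is reached. Returns (type or None, candidates examined at each level)."""
    frontier, seen, sizes = [start], {start}, []
    for _ in range(max_levels):
        next_frontier = []
        for buf in frontier:
            for src in POINTED_TO_BY.get(buf, []):
                if src not in seen:
                    seen.add(src)
                    next_frontier.append(src)
        sizes.append(len(next_frontier))
        for src in next_frontier:
            if src in KNOWN:
                return KNOWN[src], sizes
        frontier = next_frontier
    return None, sizes


print(trace_back("unknown", 4))   # ('message block cache', [3, 9])
```

With only three pointers per buffer, the second level already requires nine candidates to be examined, and a fourth level would require eighty-one, each of which a technician would otherwise check by hand.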