1. Field of the Invention
The present invention relates, in general, to memory management and debugging tools, and, more particularly, to software, systems and methods for determining a scope of data corruption in a computer system memory and, in some cases, a potential cause of such corruption by examining a crash dump file or core file that was created in response to corrupted memory in the computer system.
2. Relevant Background
Computer system designers and analysts face the ongoing and often difficult task of determining how to fix or improve operation of a computer system that has experienced an unexpected exception or is failing to operate as designed (e.g., is experiencing errors caused by software problems or “bugs”). When a problem or bug in the computer system software is serious enough to stop or interrupt the execution of a running program, this failure is known as a crash (or, in the case of an operating system kernel, a panic). Often, the cause of the crash or panic can be linked to one or more data structures or chunks of memory. However, it is often difficult to determine what caused the corruption of the data structure, whether similar or related data structures have been corrupted, and, in general, the scope or size of the data corruption in the computer system memory.
In computer systems, memory is an important resource that must be carefully managed, and most computer systems include a memory manager as part of the operating system to keep track of which parts of memory are in use and which are not, to allocate memory to processes when they need it and to deallocate it when they are done, and to otherwise manage memory allocation and use. While some computers provide access to memory to only one process or program at a time, most computer systems allow multiprogramming, which requires the memory manager to properly partition memory to protect one portion of memory and its stored data structures from access by other programs, to properly relocate memory such that programs loading data do not exceed allocated partition sizes and overwrite data in neighboring partitions, and to otherwise protect data in memory from damage or corruption during operation of the computer system. Memory allocation is not static: the memory manager continually relocates program memory addresses, changes the sizes of partitions, and reallocates memory for new processes as the number and variety of running programs changes. At any time in the operation of a computer system, the image of memory therefore varies significantly, which can make it more difficult to identify corrupt memory, the scope of memory problems, and the causes of memory corruption.
To assist in identifying bugs in the software operating on a computer system such as those that corrupt memory, software applications are often configured to write a copy of the memory image of the existing state of the application or kernel at the time of the crash or exception into a file. These memory image files are sometimes called core files or core dumps. The system-level commands or programs in the operating system, i.e., the kernel software, are of particular interest to system analysts in correcting bugs in a crashed computer system. For example, in UNIX®-based systems, the kernel is the program that contains the device drivers, the memory management routines, the scheduler, and system calls. Often, fixing bugs begins with analysis of these executables, which have their state stored in a kernel core file. Similarly, at the user level or in the user space, programs or binaries (e.g., binary, machine readable forms of programs that have been compiled or assembled) can have their state stored in user core files for later use in identifying the bugs causing the user applications to crash or run ineffectively.
In practice, a panic or other problem occurs in an operating computer system, such as when memory becomes corrupted. In response, the system operator transfers a copy of the core file or memory image of the computer system at the time of the panic to a system analyst (such as a third-party technical support service) for debugging. However, debugging a program, application, or kernel based solely on the core file can be a very difficult and time-consuming task, as many crashes are caused by corruption of memory in one form or another. In many system core dumps or core files, a debugger can relatively easily identify a single corrupted data structure (e.g., any structure that can hold information useful to the operation of a computer system or program, including memory addresses, values or variable information, data, and the like, and ranging in size from many bytes down to a single byte) as the cause of the panic in the computer system. In contrast, it typically is not clear or obvious from the core file whether other data or data structures in memory are also corrupted or what caused the corruption. Without knowing the scope of the corruption, it is a difficult and costly task for a debugger to determine the source or cause of the corruption, as the debugger often cannot identify whether a pattern of corruption exists or whether the corruption began at a point in memory differing from the identified piece of memory that caused the panic.
Hence, there remains a need for improved methods and mechanisms for use in determining a scope, size, and/or cause of memory corruption in a computer system based on a crash dump file or a core file.