1. Field of the Invention
The present invention generally relates to diagnosing software errors in a computer system. More particularly, the present invention relates to instrumenting compiled software to simplify locating the origin of software errors.
2. Background and Relevant Art
Software errors are responsible for a variety of computer problems ranging in severity from inaccurate or unexpected program behavior and possible program termination to operating system corruption that halts the operating system. From a user's perspective a software error may represent a relatively minor inconvenience that can be addressed by occasionally retrying an operation, restarting an application, or rebooting the computer, or may lead to much more serious problems such as data corruption or loss and unstable computer operation. Of course, minor inconveniences are better tolerated by users, but software developers generally devote a significant amount of time and resources to testing software so that users will have a positive experience when using the software.
Because computer programs and drivers generally contain a significant amount of computer code, locating the source of an error often is the most difficult task in correcting the error. One reason for the difficulty in locating errors is that errors generally are caused in one place, but surface in another. Depending on the nature of an error, the software that caused the error and the software that detects the error may be completely unrelated. As a result, to locate the source of the error, investigation initially focuses on where the error is detected, and then expands to other areas.
While software is in the development stage, the software often includes extensive error checking information and instructions to help in both reproducing and locating the source of software errors. However, much of this error checking information and many of these instructions generally are removed from production versions of software, because much of the information is useful only in a development environment and because it may have a negative impact on resource consumption and software performance. Nevertheless, most production software includes some level of error checking so that information regarding the nature of an error can be reported to the user and/or otherwise recorded for future reference. Usually this information serves at least two purposes: it helps identify the conditions that lead to the problem so that a user can take some form of remedial action, such as identifying a workaround for the problem, and it may help developers in reproducing and locating the source of the problem so that the error can be corrected.
Software error checking information and instructions typically include instructions to call error reporting and/or logging routines and information that identifies the nature of the error for use in beginning the search to locate and identify the cause of the error. Accordingly, to the extent that the error checking information and instructions are better able to narrow the area to search for an error, it becomes easier to locate and identify the cause of the error. As illustrated in FIGS. 1 and 3, however, some developers may specify ambiguous error checking information, which provides little if any guidance in locating where a software error occurs so that the cause can be identified and corrected. While source code that specifies ambiguous error checking information can be corrected relatively easily at development time, once software has been compiled and is released to end-users, the ambiguities remain unless resolved in future releases.
For example, FIG. 1 illustrates the difficulty in locating a common type of software error related to memory allocations. As an application or driver needs dynamic amounts of memory, the software makes memory allocation requests. In the case of a driver, these allocation requests may be processed by the operating system. Once allocated, the software accesses the allocated memory through a pointer of some sort. Errors often occur, however, when the software writes data beyond the boundaries of the allocated memory, neglects to deallocate the memory, attempts to access the memory through an invalid pointer, such as after the memory has been deallocated or after the memory pointer has been corrupted, and the like.
Because memory allocations frequently lead to software errors, routines for memory allocation may require a tag to be specified so that if a memory related software error occurs, the tag may be used to help locate the source of the error. For example, if memory becomes corrupted, it is likely that the memory was overwritten. The typical culprit for overwriting memory is the software that writes to the memory that immediately precedes the corrupted memory. The tag is intended to help identify that software.
However, as indicated in FIG. 1, driver or application A 100, driver or application B 105, and driver or application C 110 all use tag X 115 when making tagged memory allocations 120, 121, 122, and 123. In fact, driver or application A 100 uses tag X 115 in both tagged memory allocations 120 and 121. Accordingly, if tagged memory allocation 122 were overwritten, tag X 115 would provide no help in locating the software that may have been responsible, since every driver or application shown uses the same tag value.
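The ambiguity described for FIG. 1 can be illustrated with a small simulation. The following is a minimal sketch only, not an implementation of any actual operating system allocator: the class TaggedPool, the helper build_fig1_pool, the block sizes, the back-to-back layout, and the assignment of allocation 122 to driver or application B and allocation 123 to driver or application C are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    address: int  # start of the block in the simulated pool
    size: int     # block size in bytes (illustrative values only)
    tag: str      # tag supplied by the caller at allocation time
    owner: str    # ground truth; a real debugger sees only the tag


class TaggedPool:
    """Hypothetical pool that lays out allocations back to back."""

    def __init__(self):
        self._allocations = []
        self._next_address = 0

    def allocate(self, owner, size, tag):
        block = Allocation(self._next_address, size, tag, owner)
        self._allocations.append(block)
        self._next_address += size
        return block

    def suspects_for(self, corrupted):
        """Return every owner that uses the tag found on the block
        immediately preceding the corrupted block."""
        index = self._allocations.index(corrupted)
        if index == 0:
            return set()
        preceding_tag = self._allocations[index - 1].tag
        return {a.owner for a in self._allocations if a.tag == preceding_tag}


def build_fig1_pool(tags):
    """Recreate the four allocations of FIG. 1 (assumed sizes and order)."""
    pool = TaggedPool()
    pool.allocate("A", 64, tags["A"])            # allocation 120
    pool.allocate("A", 32, tags["A"])            # allocation 121
    victim = pool.allocate("B", 128, tags["B"])  # allocation 122 (assumed owner)
    pool.allocate("C", 16, tags["C"])            # allocation 123 (assumed owner)
    return pool, victim


# Every driver or application uses tag X, as in FIG. 1.
shared_pool, shared_victim = build_fig1_pool({"A": "X", "B": "X", "C": "X"})
print(sorted(shared_pool.suspects_for(shared_victim)))  # ['A', 'B', 'C']

# Hypothetical contrast: each driver or application uses its own tag.
unique_pool, unique_victim = build_fig1_pool({"A": "TagA", "B": "TagB", "C": "TagC"})
print(sorted(unique_pool.suspects_for(unique_victim)))  # ['A']
```

With the shared tag, the tag on the preceding block matches every driver or application, so it narrows nothing. With distinct tags, the same lookup points to a single candidate.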
FIG. 3 presents an analogous problem for stop codes that halt the operating system. To help locate and identify the cause of an operating system halt, various standard stop codes (SSCs) are defined, labeled SSC 1 320 through SSC X 340. As shown, however, driver or application A 300 uses SSC 1 320 and SSC 3 325, driver or application B 305 uses SSC 5 330, and driver or application C 310 uses SSC 7 335. As a result, when the operating system is halted with a standard stop code that has been used in multiple places, such as SSC 1 320, SSC 3 325, SSC 5 330, or SSC 7 335, it may not be possible to determine whether the error was detected in the software that duplicates the use of the standard stop codes or in the operating system itself. Note that the problem illustrated in FIG. 3 is particularly onerous, since the duplicate use of standard stop codes in drivers or applications undermines the efforts of the operating system developers who defined the standard stop codes to help locate and identify potential errors within the operating system.
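The stop code ambiguity of FIG. 3 can be sketched in the same way. The example below is illustrative only: the function names, the component labels, and the particular set of stop codes attributed to the operating system (including the existence of an SSC 2) are assumptions, since FIG. 3 identifies only which standard stop codes the drivers or applications reuse.

```python
from collections import defaultdict


def build_stop_code_users(usage):
    """Map each stop code to the set of components that may raise it."""
    users = defaultdict(set)
    for component, codes in usage.items():
        for code in codes:
            users[code].add(component)
    return users


def possible_origins(users, stop_code):
    """List every component that could have halted the system with stop_code."""
    return sorted(users.get(stop_code, set()))


# Driver and application usage follows FIG. 3; the operating system's own
# list of codes is an assumption made for this sketch.
usage = {
    "OS": ["SSC 1", "SSC 2", "SSC 3", "SSC 5", "SSC 7"],
    "A": ["SSC 1", "SSC 3"],
    "B": ["SSC 5"],
    "C": ["SSC 7"],
}
users = build_stop_code_users(usage)

print(possible_origins(users, "SSC 1"))  # ['A', 'OS']
print(possible_origins(users, "SSC 2"))  # ['OS']
```

A halt reported with SSC 1 leaves two candidate origins, whereas a halt reported with a code that no driver or application reuses, here the assumed SSC 2, points only to the operating system.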
Accordingly, there exists a need for methods, systems, and computer program products that instrument compiled software to include diagnostic information so that the origin of calls to compiled routines may be more easily identified and errors within the compiled software may be more easily located.