This disclosure relates to operation of digital computers and, in particular, to the creation, collection and use of information associated with unhandled exceptions caused by one or more “bugs” in an executable program.
Unfortunately, computer programs “crash” all too often, usually for reasons that are not apparent. Despite extensive testing during development, beta testing, bug reporting procedures and the like, the reality is that even relatively mature software often contains “bugs”—the popular term for a flaw in the program code. Bugs are constantly being discovered, reported, collected, analyzed and in many cases fixed in a subsequent release, update or patch to the code. Still, in many application programs, especially complicated programs such as word processors, bugs remain that in some situations can cause a program to “crash”—the vernacular term for an unhandled exception. In other words, these are situations where an exception has occurred during execution of a program, and there is no exception handler code registered to deal with the exception. The result is that the program simply stops executing—it has crashed.
The typical response to a mysterious program crash (remonstrations aside) is to restart the failed program and attempt to recover the user's data, sometimes by means of a backup file. Restarting the program, however, necessarily changes the state of the computer, so information about its state when the crash occurred is lost. That information might have been useful in identifying the bug that caused the crash.
Indeed, it is known in the prior art to capture machine state information for use in debugging a program, or otherwise attempting to determine the cause of a crash. When a program stops executing abnormally, this fact can be recognized and used to trigger a capture of the machine state. The contents of RAM, the processor registers, the stack, and so on can be stored for later analysis. Sometimes an experienced, skilled artisan can study this information, like a detective at a crime scene, and discern something about the cause of the mishap. It is a difficult and labor-intensive undertaking.
In the event of another crash of the same program, perhaps at another time or on another computer, there is no convenient way to determine whether the second crash has the same or a similar etiology as the first. The detailed state of the second machine (or of the same machine at the time of the second crash) will likely be quite different from the state at the time of the first crash. Only another painstaking, detailed study of the machine state might reveal some association with the first crash. At the other extreme, it would be easy, for example, to record the program's instruction pointer value at the time of a crash and compare it with the value recorded at the time of a second crash to see whether the address is the same. This method would fail if the program's instructions were loaded at different addresses, and it is utterly context-insensitive: although it identifies where the program was executing instructions, it reveals nothing about how the program got there.
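The relocation problem with comparing raw instruction pointers can be made concrete with a short Python sketch. The load addresses and the faulting offset are hypothetical values chosen only for illustration; `absolute_ip` is likewise an illustrative helper, not part of any described system.

```python
# Illustration only: the load addresses and offset below are hypothetical.
FAULT_OFFSET = 0x1234  # faulting instruction's offset within its module


def absolute_ip(load_base: int, offset: int) -> int:
    """Instruction pointer for an instruction at `offset` in a module loaded at `load_base`."""
    return load_base + offset


# The same faulting instruction, with the module loaded at two different bases.
ip_first_crash = absolute_ip(0x00400000, FAULT_OFFSET)
ip_second_crash = absolute_ip(0x10000000, FAULT_OFFSET)

# A raw instruction-pointer comparison fails to match the two crashes.
print(ip_first_crash == ip_second_crash)  # False

# Subtracting each load base recovers the same offset, but the value still
# says nothing about which call path led to the faulting instruction.
print(ip_first_crash - 0x00400000 == ip_second_crash - 0x10000000)  # True
```

Even the relocation-corrected value remains context-insensitive: any call path that ends at the same instruction yields the same number.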
Most methods concentrate on describing a crash in ways that could be meaningful for diagnosing it, but ignore the usefulness of concisely characterizing it so that crash events can be categorized, collated, and studied statistically. Such characterization is not intended to aid diagnosis per se, but it could help manage diagnostic work. If a particular “bug” could be identified as widespread, for example, it might warrant more attention than another. For this purpose, a precise description of the actual failure is not necessarily desirable. Programs commonly crash because they attempt to access a memory address that does not exist, an attribute too vague to aid categorization. Such crashes often occur within faultless code operating on defective data, so that even the actual location of the failure could be misleading.
What is needed is a way to identify or characterize a program crash or, more specifically, the state of a computer thread at the time of a crash, that is easy to determine and recognize. It would be especially useful to describe that state in a way that lets a programmer programmatically recognize a meaningfully similar state, whether it occurs on the same computer, on a different computer, or even within a completely different program. A method that could generically characterize the instantaneous state of a thread at any arbitrary time would more than satisfy this need, since it could characterize the state of any crashed thread. It could also characterize the state of non-crashed threads in the same program, or in other programs on the same machine, if desired for further study.
Thus, the present inventors have recognized a need to digest the call and/or data stack of an arbitrary program thread in order to generate a signature that characterizes its present state and is sensitive to the path by which the thread reached that state. The execution stack signature that this procedure generates is not required to concretely describe any aspect of the thread or its stack; rather, the signature provides a moniker or alias that is predictably similar when it describes threads in similar states and predictably different when it describes threads in different states.
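As a rough illustration of the kind of digest described above, the following Python sketch reduces a thread's call stack to a short signature. It is a sketch under stated assumptions, not the method of this disclosure: it assumes, hypothetically, that each frame is already available as a module name plus a module-relative offset (the `Frame` type and `stack_signature` function are illustrative names), that frames are ordered innermost first, and that only the innermost `depth` frames matter. The hashing uses SHA-256 from Python's standard `hashlib` module.

```python
import hashlib
from typing import NamedTuple, Sequence


class Frame(NamedTuple):
    """One stack frame, recorded relative to its module (hypothetical representation)."""
    module: str  # name of the module containing the return address
    offset: int  # return address minus the module's load base (non-negative)


def stack_signature(frames: Sequence[Frame], depth: int = 8) -> str:
    """Digest the innermost `depth` frames (frames[0] is innermost) into 16 hex digits."""
    digest = hashlib.sha256()
    for frame in frames[:depth]:
        # Normalize module names so case differences do not change the signature.
        digest.update(frame.module.lower().encode("utf-8"))
        digest.update(b"\0")  # separator so name and offset bytes cannot run together
        digest.update(frame.offset.to_bytes(8, "little"))
    return digest.hexdigest()[:16]


# Same call path on two machines (module-relative offsets match despite relocation).
crash_a = [Frame("wordproc.exe", 0x1234), Frame("wordproc.exe", 0x0F80), Frame("runtime.dll", 0x2A10)]
crash_b = [Frame("WordProc.exe", 0x1234), Frame("wordproc.exe", 0x0F80), Frame("runtime.dll", 0x2A10)]
# Same faulting instruction reached through a different caller.
crash_c = [Frame("wordproc.exe", 0x1234), Frame("wordproc.exe", 0x3C00), Frame("runtime.dll", 0x2A10)]

print(stack_signature(crash_a) == stack_signature(crash_b))  # True
print(stack_signature(crash_a) == stack_signature(crash_c))  # False
```

Recording offsets relative to each module, rather than absolute addresses, keeps the signature stable when modules load at different addresses, and hashing the sequence of callers makes it depend on the path to the fault. This sketch treats two states as similar only when their innermost frames match exactly; a graded notion of similarity would require a different scheme.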