Computer programming errors, commonly known as “bugs,” can result in partial or total failure of computer programs. Some computer program failures may be intermittent, and therefore cannot be easily reproduced. This intermittent characteristic of certain program errors can frustrate efforts to resolve the error. It is desirable to be able to characterize the program error so that it can be fixed by the software engineering organizations that developed the computer program. One type of computer error experienced by Java application programs, for example, is an unresponsive (“program freeze”) or unstable user interface. In this class of program errors, the application user has limited recourse to determine the execution state of the program at the time of the failure because the program has become unresponsive.
Many virtual machines, such as Java virtual machines, provide an execution trace for each thread to help localize the cause of program errors. The execution trace lists the current call stack of each thread, that is, the methods that have been called but have not yet finished (hence the term “stack”). As a method finishes, the entry representing it is removed from the call stack and thus from the execution trace output. While the execution trace can serve to confirm that the program has indeed failed, the information contained in such traces is lost when the user terminates the program.
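By way of illustration only, a per-thread execution trace of the kind described above can be obtained in Java through the standard `Thread.getAllStackTraces()` call. The following is a minimal sketch, not a description of any particular machine's trace facility; the class name `StackTraceDump`, the method name `dumpAllThreads`, and the output layout are hypothetical and chosen only for this example.

```java
import java.util.Map;

final class StackTraceDump {
    /**
     * Formats the current call stack of every live thread.
     * Each thread's frames are listed innermost (most recent call) first.
     * The output layout is an assumption made for this illustration.
     */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            // Only methods that have not yet finished appear here; a method's
            // frame disappears from the trace as soon as the method returns.
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Because the trace is produced inside the running program, it is available only while the process is alive, which is consistent with the limitation noted above: once the user terminates the program, nothing remains to be queried.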
In conventional approaches to collecting information about failing application programs, a dedicated monitoring process may be launched to watch over the program executing on a server. Should the server fail, a user can access the monitoring process to determine the state of the server at the time of the failure. In such conventional approaches, the machine supporting the program must perform information collection tasks after the program error has occurred, often when the machine is in an ambiguous state. A monitoring process will not be able to detect certain classes of errors (for instance, when threads are waiting for a notification that will never occur). Further, these conventional approaches can incur additional delay that frustrates the user, who has just experienced the program error and is now left without a reliable mechanism for resolving the difficulty.
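The class of error mentioned above, a thread waiting for a notification that never occurs, can be illustrated with a short Java sketch. It is offered only as an example under stated assumptions; the names `LostNotificationDemo`, `startStuckThread`, and `stuck-worker` are hypothetical. The thread remains alive and consumes no processor time, so a monitor that merely checks whether the process or thread is still running would observe nothing unusual.

```java
final class LostNotificationDemo {
    /**
     * Starts a daemon thread that waits on a lock object nobody ever notifies,
     * and returns once that thread has been observed in the WAITING state.
     */
    static Thread startStuckThread() throws InterruptedException {
        final Object lock = new Object();
        Thread worker = new Thread(() -> {
            synchronized (lock) {
                while (true) { // loop guards against spurious wakeups
                    try {
                        lock.wait(); // no other thread ever calls lock.notify()
                    } catch (InterruptedException e) {
                        return; // allow a clean exit if interrupted
                    }
                }
            }
        }, "stuck-worker");
        worker.setDaemon(true); // do not keep the JVM alive for this demo
        worker.start();
        // Poll until the worker has actually parked in wait().
        while (worker.getState() != Thread.State.WAITING) {
            Thread.sleep(10);
        }
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = startStuckThread();
        // The thread is alive yet will never make progress.
        System.out.println(worker.getName()
                + " alive=" + worker.isAlive()
                + " state=" + worker.getState());
    }
}
```

From outside the program, such a thread is difficult to distinguish from one that is legitimately idle, which is one reason an external monitoring process may fail to recognize the failure.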
Accordingly, none of the conventional approaches enables capturing the execution state of a failing program when a program error causes the user to terminate the program.