1. Field of the Invention
This invention relates generally to computer software, and more particularly to computer software debugging via simulated re-execution.
2. Description of the Related Art
Computer programs often contain errors, at least initially, that cause the program to produce unexpected results. These errors are often referred to as “bugs.” It is through the process of debugging that these bugs are eliminated. In some cases, in order to recognize and correct a bug, a programmer simply needs to observe the unexpected results. The programmer then studies the source code to determine what might have caused the unexpected behavior, and corrects (re-codes) the appropriate parts of the program.
However, the events leading up to an observed “failure” are often complex in nature and not fully “told” by observing the failure. For example, if the computer crashes every time a user selects “Print” in an application, a programmer may be able to examine the code relating to printing and deduce the coding error. On the other hand, if the computer crashes at seemingly random points, a programmer would start by asking for more details surrounding a crash. It is not uncommon for a “crashed” program to generate a type of core (memory) dump and/or log of events. As can be appreciated, debugging a program becomes very difficult when a bug occurs intermittently, that is, when an unexpected behavior occurs at random times and cannot predictably be reproduced.
A traditional method of debugging involves the use of a trace buffer, in which events performed by the computer during a certain window of time are captured. The programmer sets up a trigger to freeze the window when the bug or situation of interest occurs. Trace buffers generally are included as part of an emulator, which can physically replace a target computer's central processing unit (CPU). The emulator performs the same operations as the original CPU, but adds specialized debugging abilities, such as trace buffers, single stepping, memory examination, etc. Trace buffers typically capture bus cycles (accesses), which are presented as CPU instructions and/or data (variable) accesses that can be correlated with the originating source.
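By way of illustration only, the triggered trace window described above can be sketched in software as a fixed-size ring of events that stops recording once a trigger condition is met. The class name `TraceBuffer`, the event tuples, and the addresses below are hypothetical and are not taken from any particular emulator.

```python
from collections import deque


class TraceBuffer:
    """Fixed-size window of recent events that freezes when a trigger fires."""

    def __init__(self, capacity, trigger):
        # A deque with maxlen discards the oldest event when a new one arrives.
        self.events = deque(maxlen=capacity)
        self.trigger = trigger  # predicate taking an event, returning bool
        self.frozen = False

    def record(self, event):
        if self.frozen:
            return  # window is frozen; later events are not captured
        self.events.append(event)
        if self.trigger(event):
            self.frozen = True


# Hypothetical bus accesses: (kind, address). Trigger on a write to 0x2000.
buf = TraceBuffer(capacity=4, trigger=lambda e: e == ("write", 0x2000))
for event in [
    ("read", 0x1000),
    ("write", 0x1004),
    ("read", 0x1008),
    ("read", 0x100C),
    ("write", 0x2000),
    ("read", 0x3000),
]:
    buf.record(event)

captured = list(buf.events)
```

In this sketch the window holds only the four events ending at the trigger, so anything that happened earlier (here, the first read) is no longer available for inspection.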
However, complex bugs can involve, for example, a mistake A, which leads to another mistake B, which leads to another mistake C, and then finally results in some behavior D, which was originally observed as the bug. The process of understanding the observed bug is one of coming to understand that the behavior D was caused by an unexpected behavior C, which was the result of an unexpected behavior B, which itself was the result of an unexpected behavior A, for which the programmer finds an explanation in an error in coding the program. Generally, trace buffers hold only a small snapshot (window) of time. As a result, the process of debugging involves triggering on the first observed bug D, examining the trace buffer to realize this was the result of a prior unexpected behavior C, then re-triggering and regenerating the bug to capture events surrounding the unexpected behavior C, and so on.
With memory costs coming down, it is possible to have very large trace buffers that store hundreds of thousands of events. In fact, embodiments of the present invention envision trace buffers that hold billions of events. Unfortunately, with such a large amount of stored information, it is extremely difficult for someone to look through it all. Providing a “find” operation, similar to searching a text document for the occurrence of a particular word, is useful but limited. The problem is that events occur within a context, which is to say, the state of the system, that is, the current image of RAM that holds all the variables.
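The limitation of a “find” operation can be illustrated with a minimal sketch. The trace format, the variable names, and the helper functions `find` and `state_at` below are assumptions made purely for illustration. Finding an event locates a position in the trace, but the values of the other variables at that moment are not part of the event itself and must be recovered separately, here by replaying every earlier write.

```python
# Hypothetical trace of variable writes: (operation, variable name, value).
trace = [
    ("write", "count", 3),
    ("write", "limit", 10),
    ("write", "count", 11),
    ("write", "limit", 5),
    ("write", "count", 4),
]


def find(trace, name):
    """Return the positions of all events that touch the named variable."""
    return [i for i, (_, n, _) in enumerate(trace) if n == name]


def state_at(trace, index):
    """Rebuild the variable state just after the event at the given position."""
    state = {}
    for _, name, value in trace[: index + 1]:
        state[name] = value  # later writes overwrite earlier ones
    return state


hits = find(trace, "count")          # positions where "count" was written
context = state_at(trace, hits[1])   # other variables at the second hit
```

The second hit shows only that "count" was set to 11; whether that value is wrong depends on "limit" being 10 at that moment, which the search result alone does not reveal.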
In debugging, when one finds that some variable is incorrectly set, it is often important to know the states of other variables. Further, complex data structures involving many interconnected variables are difficult, to the point of being impractical, to inspect by simply looking at the state of specific variables. For example, the hair color of the last ten people a person met, if such was important to know, might be found by consulting a log of the individuals recently met. From their names, one could find their social security numbers, from these one could get their driver's license numbers, and from these records a hair color for each could be retrieved (assuming all this data is in memory). Using an emulator to examine memory and ask “what is the value of variable A?” as a technique to solve the prior problem is so difficult that the goal becomes unattainable.
Returning to a simpler bug, one that is easily repeatable, a common debugging process is to write a “print” program (or portion of code) that does a lot of work running around through various data structures so one can easily see “the hair colors of the last ten people a person met.” The program is then run and re-run, over and over again, each time inserting a call to this “print” program, and other print statements as appropriate, to progressively isolate and understand the origin of the series of unexpected events that eventually leads to the bug being observed. Unfortunately, this technique cannot always be utilized to address intermittent program bugs.
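A “print” program of the kind described above might look like the following minimal sketch. The function name `recent_hair_colors`, the four lookup tables, and all sample values are hypothetical, chosen only to mirror the hair color example.

```python
def recent_hair_colors(meeting_log, ssn_by_name, license_by_ssn,
                       hair_by_license, count=10):
    """Walk the chained records to report hair color for recent meetings."""
    colors = []
    for name in meeting_log[-count:]:      # most recent `count` entries
        ssn = ssn_by_name[name]            # name -> social security number
        license_no = license_by_ssn[ssn]   # SSN -> driver's license number
        colors.append((name, hair_by_license[license_no]))
    return colors


# Made-up sample data standing in for structures held in program memory.
meeting_log = ["Ann", "Bob", "Cy"]
ssn_by_name = {"Ann": "S-1", "Bob": "S-2", "Cy": "S-3"}
license_by_ssn = {"S-1": "D-10", "S-2": "D-20", "S-3": "D-30"}
hair_by_license = {"D-10": "brown", "D-20": "black", "D-30": "red"}

report = recent_hair_colors(meeting_log, ssn_by_name, license_by_ssn,
                            hair_by_license, count=2)
for name, color in report:
    print(f"{name}: {color}")
```

A call to such a helper would be inserted at a suspect point in the program, and the program re-run, each time the programmer wishes to view this derived information.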
In view of the foregoing, there is a need for a method that provides improved debugging of intermittent programming bugs. The method should make all bugs essentially repeatable and allow debugging to occur on the repeatable bugs.