When analyzing or debugging a computer program or system, it is often necessary to restore the system to a state that it was in at an earlier point in time, typically to examine that state or to change the subsequent behavior of the system. Two examples of diagnostic techniques that use this capability are backwards debugging and stateless search.
There are several well-known methods for representing and restoring system states. One method is to explicitly save entire states (or those parts of the state that are modified later in the execution). This makes restoration of the state relatively cheap, but each saved state requires substantial time to record and memory space to store. A second method is to record the steps of the execution and how to undo each, and to restore the system to a previous target state by undoing all intervening steps in reverse order. This makes restoration easy if the target state is not many steps from the current state. However, gathering and storing the required information about each step is relatively expensive, and restoring states from the distant past is inefficient. A third method is to represent a state by the computation needed to bring the program to that state. For example, a state of a deterministic sequential program can be represented by the initial program state and the number of steps that have been executed to arrive at that state; for a concurrent or nondeterministic program, representing a state also requires recording all of the scheduling choices or nondeterministic choices made along the way. The process of re-executing the program from an initial execution state to a desired target state is called replaying the execution.
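The third method can be illustrated with a minimal sketch, assuming a toy program whose every step is a pure function of the current state and one recorded choice. The names `step`, `run_and_record`, and `replay` are hypothetical and chosen only for illustration; the point is that a state is stored as nothing more than the initial state, the log of choices, and a step count.

```python
import random

def step(state, choice):
    """One step of a hypothetical toy program: a pure function of state and choice."""
    return (state * 31 + choice) % 1_000_003

def run_and_record(initial, n_steps, rng):
    """Run live, recording only the nondeterministic choice made at each step."""
    state, choices = initial, []
    for _ in range(n_steps):
        choice = rng.randrange(4)  # stands in for a scheduling or input choice
        choices.append(choice)
        state = step(state, choice)
    return state, choices

def replay(initial, choices, target_step):
    """Rebuild the state after `target_step` steps by re-executing from the start."""
    state = initial
    for choice in choices[:target_step]:
        state = step(state, choice)
    return state

final, log = run_and_record(7, 1000, random.Random(0))
assert replay(7, log, 1000) == final  # replay reproduces the live run
```

For a deterministic sequential program the choice log would be empty, and the step count alone would identify the state.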
The replay approach has several advantages, chief among them that it requires gathering and storing only an initial state and relatively little information about the execution. Its disadvantage is that replaying a long execution can take substantial time. It is therefore important to make the replay as efficient as possible. One partial mitigation is to periodically checkpoint the state, and to replay the execution from the latest checkpoint that precedes the target state; in this case, more efficient replay allows these checkpoints to be taken less often.
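The checkpointing mitigation can be sketched as follows, again assuming a hypothetical toy `step` function and a pre-recorded list of choices. The helper names and the checkpoint interval are illustrative assumptions, not part of any particular system.

```python
def step(state, choice):
    """One step of a hypothetical toy program."""
    return (state * 31 + choice) % 1_000_003

def run_with_checkpoints(initial, choices, interval):
    """Run once, saving the state every `interval` steps (step 0 included)."""
    checkpoints = {0: initial}
    state = initial
    for i, choice in enumerate(choices, start=1):
        state = step(state, choice)
        if i % interval == 0:
            checkpoints[i] = state
    return checkpoints

def restore(checkpoints, choices, target_step):
    """Return (state, steps_replayed) for the state after `target_step` steps."""
    # Start from the latest checkpoint at or before the target step.
    base = max(s for s in checkpoints if s <= target_step)
    state = checkpoints[base]
    for choice in choices[base:target_step]:
        state = step(state, choice)
    return state, target_step - base

choices = [i % 4 for i in range(1000)]
checkpoints = run_with_checkpoints(7, choices, interval=100)
state, replayed = restore(checkpoints, choices, 957)
print(replayed)  # 57 steps replayed instead of 957
```

A larger interval stores fewer checkpoints but lengthens the replay needed to reach a given target, which is the trade-off that faster replay relaxes.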
The main challenge in efficient replay is stopping the execution precisely at the desired target step. The usual way to do this is to use preexisting facilities that allow the program to be interrupted when certain configurable conditions arise. For example, some CPUs allow program execution to be “single-stepped”, causing execution to “break” into a monitor after every step of the execution. Unfortunately, each break into the monitor is expensive in terms of execution time, so it is important to minimize the number of interrupts taken during replay. Thus, the simplest solution to the replay problem, single-stepping the program for an appropriate number of instructions, is usually too inefficient.
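The cost of single-stepping can be made concrete with a small model, assuming a hypothetical toy program written as a Python generator in which each `yield` marks the end of one step. Returning control to the caller after every `next()` stands in for a break into the monitor; no real CPU facility is used here.

```python
def program():
    """Hypothetical toy program; each `yield` marks the end of one step."""
    total = 0
    for i in range(1_000):
        total += i
        yield total

def replay_by_single_stepping(make_program, target_step):
    """Reach `target_step` by breaking into the monitor after every step."""
    execution = make_program()
    breaks = 0
    state = None
    for _ in range(target_step):
        state = next(execution)  # run one step, then return control to the monitor
        breaks += 1
    return state, breaks

state, breaks = replay_by_single_stepping(program, 500)
print(breaks)  # 500: one break per replayed step
```

In this model the number of breaks always equals the number of steps replayed, which is why the approach scales poorly to long executions.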
An alternative is to set breakpoints that cause execution to break into the monitor only under more specific conditions. For example, some CPUs provide hardware breakpoints that cause execution to break when a specified memory location or I/O port is used for a specified operation such as a read, write, or instruction fetch. Replay can therefore be made more efficient than single-stepping by setting a breakpoint either on the code of the target step or on data accessed by the target step. However, if the target step is in a tight loop, or if it manipulates frequently accessed data, the replay may still take many interrupts before arriving at the target step.
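The tight-loop problem can be illustrated by counting breaks in a simple model, assuming a hypothetical program whose execution is described only by the sequence of program counters it visits. The function below does not implement a debugger; it merely counts how often a code breakpoint placed on the target step's instruction would fire up to and including the target step.

```python
from itertools import islice

def pc_trace():
    """Program counters executed by a hypothetical program: setup, then a tight loop."""
    yield from (0, 1)            # two setup instructions
    while True:
        yield from (10, 11, 12)  # three-instruction loop body

def breaks_with_code_breakpoint(trace, target_step):
    """Count breaks taken when a breakpoint is set on the target step's instruction.

    Steps are numbered from 1. The breakpoint fires on every execution of
    that instruction, not only on the target one.
    """
    pcs = list(islice(trace, target_step))
    target_pc = pcs[-1]
    return sum(1 for pc in pcs if pc == target_pc)

target_step = 3002
breakpoint_breaks = breaks_with_code_breakpoint(pc_trace(), target_step)
print(breakpoint_breaks)  # 1000 breaks, versus 3002 when single-stepping
```

In this model the breakpoint cuts the number of breaks by the length of the loop body, but the count still grows linearly with the number of loop iterations before the target step.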