A computer system (100, FIG. 1), like a computer-based apparatus such as an industrial automation system, usually comprises one or more central processing units, a random access memory for storing data or control instructions, an I/O interface through which a user can input data or instructions, and other related components. Such a computer system or computer-based apparatus requires an operating system and application software in order to work. Software products typically need to be tested and debugged before being put into use, to ensure that they produce correct results and comply with their design requirements. Debugging tools are therefore needed.
A debugger (debugging program 110) is a software product used to start another software product (referred to as a “debuggee program” 120) and monitor its execution (see FIG. 1). The debugger has a function called “single stepping”, which makes the debuggee program execute one step at a time rather than continuously. The debugger may also let the debuggee program run continuously until it reaches a position predefined by the user. This kind of control is implemented by setting breakpoints for debugging in the debuggee program. In both breakpoint mode and single stepping mode, the debugger takes over control at each breakpoint for debugging, or after each single step, to perform debugging work.
Debugging work typically includes, but is not limited to:
1. checking values of variables, such as the contents of particular CPU registers, to help the user analyze the cause of errors in the debuggee program;
2. suspending the execution of part or all of the program, and then passing control to a programmer through an interactive user interface;
3. running a user pre-defined routine; and
4. dumping the status of the debuggee program (generating a snapshot), that is, saving its running status at a given moment to external storage for later analysis.
Thus, the basic function of the breakpoint mechanism in a debugger (or a similar instrumentation tool) is to generate notifications or interruptions at desired points in a stream of executed instructions, where the points of interception are specified dynamically at run time rather than pre-programmed at development time. The method by which the debugger handles such breakpoints is called a “breakpoint handling mechanism”.
The breakpoint handling mechanism of the debugger will be described in detail below.
The most widely used breakpoint handling mechanism is the software breakpoint handling mechanism. It is implemented entirely in software, does not depend on any specific hardware mechanism, and dates back to the infancy of modern computers. In this mechanism, the debugger replaces the instruction at the desired breakpoint for debugging, in the instruction stream of the debuggee program, with a breakpoint instruction. When the program reaches this breakpoint instruction, it is “trapped” into the operating system 105 (that is, the operating system assumes control), and the operating system passes control to the debugger to perform specific debugging work. After the debugger completes the debugging work, normal execution of the debuggee program must be restored. To do so, the debugger writes the original instruction back over the breakpoint instruction and executes it. Because the breakpoint instruction has now been overwritten, the debugger must write it back again after the original instruction has executed and before the debuggee program resumes normal execution, so that the breakpoint is triggered correctly the next time the debuggee program reaches this position.
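The arm/disarm step of this mechanism can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the debuggee's code is modelled as a local bytearray, the trap is the x86 one-byte INT3 opcode 0xCC, and the class and method names are hypothetical. A real debugger would write into the other process through an OS facility such as ptrace(PTRACE_POKETEXT) on Linux.

```python
INT3 = 0xCC  # x86 one-byte breakpoint (trap) opcode


class BreakpointTable:
    """Hypothetical helper that arms and disarms software breakpoints.

    `text` stands in for the debuggee's code segment; here it is a local
    bytearray rather than memory in another process.
    """

    def __init__(self, text: bytearray) -> None:
        self.text = text
        self.saved: dict[int, int] = {}  # address -> original first byte

    def arm(self, addr: int) -> None:
        # Overwrite (never insert) the byte at addr with the trap opcode,
        # remembering the original so it can be written back later.
        if addr not in self.saved:
            self.saved[addr] = self.text[addr]
            self.text[addr] = INT3

    def disarm(self, addr: int) -> None:
        # Write the original byte back so the real instruction can run in place.
        self.text[addr] = self.saved.pop(addr)

    def is_armed(self, addr: int) -> bool:
        return addr in self.saved


# x86 machine code for "mov eax, 1; ret": B8 01 00 00 00 C3
text = bytearray(b"\xb8\x01\x00\x00\x00\xc3")
bps = BreakpointTable(text)
bps.arm(0)     # text[0] becomes 0xCC; the buffer length is unchanged
bps.disarm(0)  # text[0] is 0xB8 again
```

Arming only overwrites a byte, so the buffer never grows and every other instruction keeps its address.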
Note that the software breakpoint mechanism is described in terms of “replacing” or “writing” an instruction rather than “inserting” one, because binary instructions are highly dependent upon their resident locations in memory. If a new instruction were “inserted” into a fragment of an instruction stream, every instruction following it would shift to a new address, breaking jumps and other references that point to those instructions. The only practical way to modify a binary instruction stream is therefore to replace part of the instructions in it.
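The effect can be seen in a toy model. In this sketch, which is an illustration only, a program is a Python list whose index is the instruction address, and a hypothetical ("JMP", n) tuple jumps to absolute address n. Inserting a trap moves the jump's intended target; replacing an instruction does not.

```python
def insert_at(code, addr, instr):
    """Insert instr before addr; every later instruction moves to addr + 1."""
    return code[:addr] + [instr] + code[addr:]


def replace_at(code, addr, instr):
    """Overwrite the instruction at addr; every other address is unchanged."""
    patched = list(code)
    patched[addr] = instr
    return patched


# Toy program: the JMP encodes the absolute address of "INSTR.3".
program = ["INSTR.0", ("JMP", 3), "INSTR.2", "INSTR.3", "INSTR.4"]
target = program[1][1]

inserted = insert_at(program, 2, "TRAP")
replaced = replace_at(program, 2, "TRAP")

print(inserted[target])  # INSTR.2: the jump no longer reaches INSTR.3
print(replaced[target])  # INSTR.3: the jump target is intact
```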
FIG. 2 shows a conventional software breakpoint handling mechanism in a single-threaded environment. The upper part of the figure shows the state of the debuggee program's instruction stream at successive times, from left to right.
Specifically, as shown in FIG. 2, at time t1, before debugging begins, the instruction stream of the debuggee program has not been modified by the debugger (step 201). Next, at time t2, the user sets a breakpoint for debugging at, for example, the position of instruction “INSTR.3” of the debuggee program. The debugger then replaces the instruction “INSTR.3” with a trap instruction in the instruction stream of the debuggee program (step 202). After modifying the instruction stream in this way, the debugger starts the debuggee program at step 203.
Once the debuggee program has started, as soon as it reaches the trap instruction, the operating system takes over control and passes it to the debugger, which enters its breakpoint handling mechanism (step 204).
In the breakpoint handling mechanism, the current running status of the debuggee program is first saved at step 205. The purpose of saving it is to let the debugger restore the debuggee program's previous status before returning control to it once the debugging operations are finished; otherwise, the running environment of the debuggee program would be corrupted.
Next, at step 206, the debugging work is performed. As mentioned above, the debugging work may include checking values of variables; suspending execution of part or all of the program, or obtaining instructions from a programmer 130 through an interactive UI and executing them; running a user pre-defined routine; or generating a snapshot of the running status of the debuggee program.
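The reason for saving the status can be shown with a small sketch, assuming a hypothetical Context record for the thread's registers and a stand-in for debugging work that clobbers them. Resuming from the saved copy, rather than from the live context, keeps the debuggee's environment intact.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical saved register state of a stopped debuggee thread."""
    pc: int
    regs: dict = field(default_factory=dict)
    flags: int = 0


def debugging_work(ctx: Context) -> None:
    # Stand-in for step 206: e.g. a user routine that clobbers registers.
    ctx.regs["r0"] = 0xDEAD
    ctx.flags |= 1


def handle_breakpoint(ctx: Context) -> Context:
    saved = copy.deepcopy(ctx)  # step 205: save the current running status
    debugging_work(ctx)         # step 206: may modify the live context
    return saved                # step 209: status the debuggee resumes with


live = Context(pc=2, regs={"r0": 7})
resumed = handle_breakpoint(live)
print(resumed.regs["r0"])  # 7: the debuggee sees its original register value
```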
At step 207, the original instruction of the debuggee program at the breakpoint for debugging is restored; that is, the trap instruction is replaced with the instruction “INSTR.3”. The instruction sequence of the debuggee program at this moment is shown at time t3 in FIG. 2. This step is needed because, once the debugging work is finished, execution of the debuggee program must resume, but it cannot simply continue from the next instruction: the instruction following the trap instruction is “INSTR.4”, and “INSTR.3” has not yet been executed. As described above, binary computer instructions are highly dependent upon their resident locations, so “INSTR.3” cannot simply be executed from another location. The debugger must therefore restore “INSTR.3”, that is, write it back to its original position, so that the debuggee program continues to run correctly.
Next, at step 208, the single stepping mechanism is enabled. Single stepping is needed because, once “INSTR.3” has been restored, the instruction stream of the debuggee program no longer contains a breakpoint instruction; if the debuggee program simply continued running, no breakpoint for debugging would be met the next time it reached the position of “INSTR.3”. The debugger must therefore reinstate the breakpoint for debugging as soon as “INSTR.3” has executed, that is, replace “INSTR.3” with the trap instruction again. With single stepping enabled, the debuggee program executes only one instruction, namely “INSTR.3”, and then passes control back to the debugger. This is the single stepping mechanism mentioned above.
Some architectures support “hardware single stepping” and some do not. If the system supports it, the debugger enables the “hardware single step” mechanism; otherwise, the debugger enables the “software single step” mechanism provided by the operating system, which triggers a “trapping” event after the debuggee program executes each single instruction.
At step 209, the debugger restores the debuggee program's pre-interruption status from the running status saved at step 205 and passes control to the debuggee program so that it continues running.
As indicated at step 210, at time t3, the instruction “INSTR.3” is executed. As indicated at step 211, because single stepping is enabled, another “trapping” event is triggered automatically at time t4, after “INSTR.3” has executed and before “INSTR.4” executes. The debugger thus regains control and enters the breakpoint handling mechanism again. At step 212, the debugger disables the single stepping mechanism to avoid unnecessary “trapping” events.
At step 213, the current status of the debuggee program is saved.
At step 214, the instruction “INSTR.3” is replaced with a trap instruction again; the instruction stream at this moment is shown at time t5 of FIG. 2. At step 215, the debugger restores the status of the debuggee program saved at step 213. At step 216, the debugger returns control to the debuggee program, which, as indicated at time t5 of FIG. 2, continues executing from the instruction “INSTR.4”.
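The complete cycle of FIG. 2 can be simulated in a few lines. The sketch below assumes a toy machine in which instructions are strings and the program is simply run twice in a row; the class and method names are hypothetical, and each comment maps a statement to the corresponding step number.

```python
TRAP = "TRAP"  # stand-in for the trap (breakpoint) instruction


class ToyDebugger:
    """Simulates the single-thread breakpoint cycle of FIG. 2 (steps 202-216).

    Instructions are plain strings; "executing" one appends it to `trace`.
    Saving and restoring the running status (steps 205, 209, 213, 215) is
    omitted because the toy machine has no registers.
    """

    def __init__(self, program, bp_addr):
        self.code = list(program)
        self.bp_addr = bp_addr
        self.original = self.code[bp_addr]
        self.code[bp_addr] = TRAP   # step 202: replace INSTR.3 with a trap
        self.single_step = False
        self.trace = []             # instructions actually executed
        self.hits = 0               # times the debugging work ran

    def on_trap(self):
        # Steps 204-209: the debuggee has trapped at the breakpoint.
        self.hits += 1                            # step 206: debugging work
        self.code[self.bp_addr] = self.original   # step 207: restore INSTR.3
        self.single_step = True                   # step 208: enable single step

    def on_single_step(self):
        # Steps 211-216: trap raised right after INSTR.3 has executed.
        self.single_step = False                  # step 212
        self.code[self.bp_addr] = TRAP            # step 214: re-arm

    def run(self):
        pc = 0
        while pc < len(self.code):
            if self.code[pc] == TRAP:
                self.on_trap()   # pc is unchanged, so INSTR.3 runs in place
                continue
            stepping = self.single_step
            self.trace.append(self.code[pc])      # step 210: execute
            pc += 1
            if stepping:
                self.on_single_step()             # step 211: single-step trap


program = ["INSTR.1", "INSTR.2", "INSTR.3", "INSTR.4", "INSTR.5"]
dbg = ToyDebugger(program, bp_addr=2)
dbg.run()
dbg.run()           # second pass: the re-armed breakpoint fires again
print(dbg.hits)     # 2
print(dbg.code[2])  # TRAP
```

Between on_trap and on_single_step, address 2 holds “INSTR.3” rather than the trap; in a single-threaded program nothing else can run during that interval.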
Attention now turns to breakpoint handling for multithreaded debuggee programs, since the conventional software breakpoint handling mechanism described above cannot be used for them. During part of that mechanism, the instruction stream contains no breakpoint instruction at all (e.g. at times t3 and t4 in FIG. 2). This time window is called the “dangerous window” (see, for example, Norman Ramsey, “Correctness of Trap-based Breakpoint Implementations”, Proceedings of the 21st ACM Symposium on the Principles of Programming Languages, January 1994). On computer systems whose processing speed has increased rapidly, this window is long enough for other threads that reach this position during it to miss the breakpoint for debugging. The problem is particularly serious on multiprocessor machines, where other threads run in parallel with the thread being debugged.
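The dangerous window can be reproduced deterministically with a toy scheduler. In this sketch, which is illustrative only, hypothetical threads A and B share one instruction array, and each character of the schedule string names the thread that executes the next step. When B runs while A is between restoring “INSTR.3” and re-arming the trap, B executes “INSTR.3” without stopping.

```python
TRAP = "TRAP"


def simulate(schedule, n_instr=5, bp_addr=2):
    """Run toy threads over one shared instruction array in `schedule` order.

    The debugger's handling is assumed to happen atomically within the step
    of the trapping thread, and that thread's single-step trap fires on its
    next step. Returns how often each thread stopped at the breakpoint.
    """
    code = ["INSTR.%d" % (i + 1) for i in range(n_instr)]
    original = code[bp_addr]
    code[bp_addr] = TRAP
    pc = dict.fromkeys(schedule, 0)       # thread ids in first-seen order
    stepping = dict.fromkeys(pc, False)
    hits = dict.fromkeys(pc, 0)
    for t in schedule:
        if pc[t] >= n_instr:
            continue                      # this thread has finished
        if code[pc[t]] == TRAP:
            hits[t] += 1
            code[bp_addr] = original      # step 207: the window opens
            stepping[t] = True            # step 208
            continue                      # resume at the same address
        pc[t] += 1                        # execute one instruction
        if stepping[t]:
            stepping[t] = False
            code[bp_addr] = TRAP          # step 214: the window closes
    return hits


print(simulate("AAAAAA" + "BBBBBB"))  # {'A': 1, 'B': 1}
print(simulate("AAABBBAAABBB"))       # {'A': 1, 'B': 0}: B slipped through
```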
The following solutions currently exist for multithreaded debuggee programs.
Method 1: Suspend Other Threads:
The first method is to suspend all other running threads before the debugger handles a breakpoint for debugging, and to resume them at full speed once the debugger finishes all breakpoint handling operations.
FIG. 3 is a flowchart illustrating this breakpoint handling method. As shown in FIG. 3, when a thread reaches a breakpoint for debugging, i.e. a “trap” instruction, the debugger first suspends all other threads before any further breakpoint processing (step 318). After setting the breakpoint instruction again and before restoring normal execution of the debuggee program, the debugger resumes all the suspended threads (step 319). All the other steps 301-316 are identical to steps 201-216 in FIG. 2.
Although this method can handle breakpoints in a multithreaded debuggee program, the overhead of suspending and resuming the other threads is tremendous, and it grows as the number of threads and CPUs increases.
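Method 1 can be added to a toy scheduler model to show both its effect and its cost. In this sketch, which assumes hypothetical single-letter thread ids and counts each suspend or resume of one thread as one operation, a thread that traps holds all others suspended until it re-arms the trap. The operation count per breakpoint hit is 2 × (n − 1) for n threads; real costs also depend on the operating system, e.g. one system call per thread.

```python
TRAP = "TRAP"


def simulate_stop_all(schedule, threads, n_instr=5, bp_addr=2):
    """Toy model of Method 1: suspend every other thread while handling a trap.

    Returns (breakpoint hits per thread, number of suspend/resume operations).
    A scheduled step for a suspended thread is simply lost.
    """
    code = ["INSTR.%d" % (i + 1) for i in range(n_instr)]
    original = code[bp_addr]
    code[bp_addr] = TRAP
    pc = dict.fromkeys(threads, 0)
    stepping = dict.fromkeys(threads, False)
    hits = dict.fromkeys(threads, 0)
    owner = None   # thread whose breakpoint is being handled, if any
    ops = 0
    for t in schedule:
        if owner is not None and t != owner:
            continue                      # suspended: cannot enter the window
        if pc[t] >= n_instr:
            continue
        if code[pc[t]] == TRAP:
            hits[t] += 1
            owner = t
            ops += len(threads) - 1       # step 318: suspend the others
            code[bp_addr] = original
            stepping[t] = True
            continue
        pc[t] += 1
        if stepping[t]:
            stepping[t] = False
            code[bp_addr] = TRAP
            ops += len(threads) - 1       # step 319: resume the others
            owner = None
    return hits, ops


# An interleaving in which B would otherwise run during A's window.
print(simulate_stop_all("AAABBBAAABBBBBB", "AB"))  # ({'A': 1, 'B': 1}, 4)
# With four threads, a single breakpoint hit already costs 6 operations.
print(simulate_stop_all("AAAAA", "ABCD")[1])       # 6
```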
Method 2: Move the Original Instruction to a New Location:
In some situations, such as debugging an OS kernel or firmware, suspending threads is usually impossible, or too slow to be tolerable. The djprobe project (homepage “http://sourceforge.net/project/showfiles.php?group_id=41854”) and the kprobe project (homepage “http://sourceware.org/systemtap/kprobes/”) both provide a debugging method for multithreaded debuggee programs that eliminates the “dangerous window” and requires no suspension of other threads. In this second method, the original instruction of the debuggee program at the breakpoint for debugging is moved to another location for execution, with its meaning kept unchanged.
FIG. 4 is a flowchart illustrating this breakpoint handling method. As shown in FIG. 4, after the debugger finishes the debugging work, the instruction “INSTR.3” is executed at a new location (step 420). Because the trap instruction is never removed from the original position, no dangerous window arises. The other steps 401-406 and 409 are identical to steps 201-206 and 209 in FIG. 2.
However, as mentioned above, the meaning of an instruction is closely tied to its location, so moving an instruction to a new location requires parsing its meaning and reimplementing it at the new location with exactly the same logical meaning as the original. This parsing can be very complex. Moreover, because it depends heavily on the specific hardware architecture, a separate parser must be designed for each supported architecture.
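Why relocation needs per-instruction parsing can be illustrated with a toy instruction set. In this sketch the opcodes "OP", "JREL" (jump by a relative offset) and "JABS" (jump to an absolute address) are hypothetical, and the scratch slot at address 100 is arbitrary. Copying a PC-relative jump verbatim changes where it goes, so the relocate helper must recognize that instruction type and rewrite it; a real implementation needs such a rule for every position-dependent instruction form of the target architecture.

```python
def relocate(instr, orig_addr):
    """Rewrite a toy instruction so it keeps its meaning at a new address.

    Returns (instruction for the scratch slot, address to resume at), where
    the resume address is None if the instruction transfers control itself.
    """
    if instr[0] == "JREL":
        # PC-relative jump: its target depends on where it executes, so turn
        # it into an absolute jump to the target it had at orig_addr.
        return ("JABS", orig_addr + instr[1]), None
    # Position-independent instruction: copy it verbatim, resume after it.
    return instr, orig_addr + 1


def step_out_of_line(instr, orig_addr, scratch_addr, fix_up=True):
    """Execute instr from scratch_addr and return the next pc."""
    if fix_up:
        instr, resume = relocate(instr, orig_addr)
    else:
        resume = orig_addr + 1          # naive copy: assume it falls through
    if instr[0] == "JREL":
        return scratch_addr + instr[1]  # offset applied to the wrong base
    if instr[0] == "JABS":
        return instr[1]
    return resume


# The breakpointed instruction at address 2 is "jump forward by 3" (target 5).
print(step_out_of_line(("JREL", 3), 2, 100, fix_up=False))  # 103: wrong
print(step_out_of_line(("JREL", 3), 2, 100))                # 5: correct
print(step_out_of_line(("OP", "INSTR.3"), 2, 100))          # 3
```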
Method 3: Hardware Breakpoint:
The third method uses the hardware breakpoint mechanism. Some processors include breakpoint registers in which an address can be stored; an interrupt is triggered when the processor executes or accesses that address. Using the hardware breakpoint mechanism eliminates the need to modify the instruction sequence of the debuggee program.
However, the number of such breakpoint registers is quite limited and may be insufficient in practice. Moreover, commercial hardware architectures often do not support the hardware breakpoint mechanism at all. The hardware breakpoint mechanism is therefore usually used, if at all, only as a supplement to the software breakpoint mechanism.
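Using hardware breakpoints as a supplement can be sketched as an allocator that fills the limited breakpoint registers first and falls back to software breakpoints when they run out. This is an illustration only: the class is hypothetical, the default of four slots reflects the four debug address registers DR0 to DR3 of x86, and programming the actual registers (and their control register) is not shown.

```python
class BreakpointAllocator:
    """Hypothetical allocator: hardware slots first, software traps after.

    hw_slots=4 mirrors x86, which has four debug address registers (DR0-DR3).
    """

    def __init__(self, hw_slots=4):
        self.hw = [None] * hw_slots   # addresses held in breakpoint registers
        self.sw = set()               # addresses patched with a trap instruction

    def kind(self, addr):
        if addr in self.hw:
            return "hardware"
        if addr in self.sw:
            return "software"
        return None

    def add(self, addr):
        existing = self.kind(addr)
        if existing is not None:
            return existing
        for i, slot in enumerate(self.hw):
            if slot is None:
                self.hw[i] = addr     # no change to the instruction stream
                return "hardware"
        self.sw.add(addr)             # fall back to the software mechanism
        return "software"


alloc = BreakpointAllocator()
kinds = [alloc.add(a) for a in (0x1000, 0x1010, 0x1020, 0x1030, 0x1040)]
print(kinds)  # ['hardware', 'hardware', 'hardware', 'hardware', 'software']
```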