Exemplary embodiments generally relate to computers and computer software, and more particularly relate to debugging multi-threaded software.
Locating, analyzing, and correcting suspected faults, called “bugs”, in a computer program is a process known as “debugging.” Typically, a programmer uses a tool commonly known as a “debugger” to debug a program under development.
Conventional debugging tools typically support two primary operations to assist a computer programmer. A first operation supported by conventional debuggers is a “step” function, which permits a programmer to process instructions (also known as “statements”) in a computer program one-by-one and to see the results upon completion of each instruction. While the step function provides a programmer with a large amount of information about a program during its execution, stepping through hundreds or thousands of program instructions can be extremely tedious and time-consuming, and might require a programmer to step through many error-free instructions before the set of suspicious instructions to be analyzed is executed.
To address this difficulty, a second operation supported by conventional debuggers is a break-point function, which permits a programmer to identify with a break-point the precise instruction at which the programmer desires to halt execution of the program. As a result, the debugger executes the program in a normal fashion until a break-point is reached, and then the debugger stops execution and displays the results of the program to the programmer for analysis.
Typically, step operations and break-points are used together to simplify the debugging process. Specifically, the programmer sets a break-point at the beginning of a desired set of instructions to be analyzed and then begins executing the program. Once the break-point is reached, the debugger halts the program, and the programmer then steps through the desired set of instructions line-by-line using the step function in the debugger. Consequently, a programmer is able to quickly isolate and analyze a particular set of instructions without needing to step through irrelevant portions of a computer program.
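The combined use of a break-point and stepping can be illustrated with a minimal sketch. It assumes Python's standard `sys.settrace` hook as a stand-in for a debugger; the names `program`, `debug`, and `break_offset` are hypothetical and chosen only for illustration. The sketch runs the program freely until the break-point line is reached and from then on records each line and the local variables visible just before that line executes.

```python
import sys

def program():
    total = 0
    total += 1
    total += 10      # break-point is set on this line (offset 3 from "def")
    total += 100
    return total

def debug(func, break_offset):
    """Run func freely until the break-point line, then record every line."""
    first = func.__code__.co_firstlineno
    stepped = []                 # (line offset, locals before the line runs)
    state = {"halted": False}

    def tracer(frame, event, arg):
        # Only trace the function under test; ignore every other frame.
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            offset = frame.f_lineno - first
            if offset == break_offset:
                state["halted"] = True       # break-point reached
            if state["halted"]:
                # "Step": capture the state before this line executes.
                stepped.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func()
    finally:
        sys.settrace(None)       # always remove the trace hook
    return result, stepped

result, stepped = debug(program, break_offset=3)
for offset, local_vars in stepped:
    print(offset, local_vars)
```

The lines before the break-point execute without being recorded, which mirrors how a programmer skips the irrelevant portion of a program and inspects only the instructions of interest.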
The break-point and step operations work well when the entire program being debugged runs in a single thread. Some operating systems, however, allow multiple parts, or threads, of one or more programs to run simultaneously. Such operating systems are referred to as multi-threaded. Multi-threading is a type of parallel processing that allows for more straightforward software design and faster execution of such programs on multi-processor computers.
Debugging multi-threaded computer programs is difficult because timing (i.e., synchronization) problems between the threads can occur, and these problems can be very difficult to reproduce. For example, the debugging environment might interfere with the synchronization of the threads. This interference can have several causes. The computer system upon which the computer program is being debugged might have different performance characteristics than the intended, ultimate host for the computer program. Also, the debugging environment adds monitoring and control over program execution, which can impede the intended execution of threads.
Not only do debuggers introduce synchronization problems, but the programmer might also be interested in evaluating specific synchronization situations that could possibly occur. The programmer, however, does not want to alter the design of the program to force a specific synchronization situation; instead, the programmer wants a temporary synchronization debugging method. For example, the programmer might want to evaluate how the computer program would behave when multiple threads of interest are utilizing sections of code simultaneously.
Thread synchronization during debugging might also be necessary when an incomplete computer program is being debugged. The amount of code in a computer program can be enormous, so programmers often combine a top-down design technique with a bottom-up testing technique because it would be impractical to wait for the entire computer program to be available in order to test it. Using the top-down design approach, each subfunction within the program is defined, including the interfaces to other subfunctions. A programmer would then write the lower-tier sections of the computer program, such as modules. Using the bottom-up testing approach, as these lower-tier sections are completed, larger amounts of the computer program are debugged. Sections or modules of a computer program that are not yet complete are merely represented with “stubs” in order to exercise the defined interfaces.
Unfortunately, synchronization situations expected in the full computer program can be difficult to replicate in the incomplete computer program because such stubs would not perform like the full computer program. For example, a stub routine called by the computer program might only return a constant value whereas the real routine, when later available, would perform a number of calculations and processing steps that would delay such a response. The programmer would like the timing of a thread executing this stub routine to resemble the expected performance more closely, but does not want to waste time developing elaborate code for the stub that would ultimately be discarded.
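The timing difference between a stub and the finished routine can be illustrated with a minimal sketch. The names `stub_routine`, `slow_routine`, and `completion_order`, the constant 42, and the delay values are all hypothetical; the fixed sleep merely stands in for the calculations the real routine would perform. The sketch shows that the order in which two threads finish changes once the called routine takes a realistic amount of time.

```python
import threading
import time

def stub_routine():
    # Stub: returns a constant immediately.
    return 42

def slow_routine():
    # Hypothetical stand-in for the finished routine: same value, but delayed.
    time.sleep(0.6)
    return 42

def completion_order(routine):
    """Run two threads and return the order in which they finish."""
    order = []
    lock = threading.Lock()

    def caller():
        routine()
        with lock:
            order.append("caller")

    def other():
        time.sleep(0.2)          # fixed amount of unrelated work
        with lock:
            order.append("other")

    threads = [threading.Thread(target=caller),
               threading.Thread(target=other)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order

stub_order = completion_order(stub_routine)
real_order = completion_order(slow_routine)
print("with stub:        ", stub_order)
print("with real routine:", real_order)
```

With the stub, the calling thread would be expected to finish first; with the delayed routine, the other thread would be expected to finish first. A synchronization situation that depends on this ordering therefore cannot be observed while the stub is in place.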
Moreover, priority inversion can cause deadlock in a system, but because priority inversion is not easy to detect during the testing phase, it can go unnoticed. When priority inversion problems are detected only at runtime, addressing them can require considerable effort and expense, and sometimes product recalls. A classic example is the Mars Pathfinder mission sent by NASA to explore Mars. After landing, the Pathfinder spacecraft experienced a priority inversion problem, which caused repeated system resets. NASA had to debug the software over a remote connection and upload a fix. NASA was fortunate in this case because the debug facilities had not been removed before launch.
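Priority inversion can be illustrated with a small, deterministic simulation. This is a sketch under stated assumptions, not a model of any particular operating system: the single-processor tick-based scheduler, the task names, the priorities, and the step lists are all hypothetical. A low-priority task holds a lock that a high-priority task needs, while a medium-priority task that needs no lock preempts the low-priority task and thereby delays the high-priority one. The `inherit` flag enables priority inheritance, a commonly used mitigation that is included here only for comparison.

```python
def simulate(inherit):
    """Return the tick at which each task finishes on one processor."""
    # "cs" is one tick inside the critical section guarded by a single lock;
    # "cpu" is one tick of work that needs no lock. All values are made up.
    tasks = {
        "low":  {"prio": 1, "release": 0, "steps": ["cs", "cs", "cs"]},
        "med":  {"prio": 2, "release": 1, "steps": ["cpu"] * 4},
        "high": {"prio": 3, "release": 1, "steps": ["cs"]},
    }
    holder = None                # task currently holding the lock
    finish = {}
    tick = 0
    while len(finish) < len(tasks):
        active = [n for n, t in tasks.items()
                  if t["release"] <= tick and t["steps"]]
        # A task is blocked if it needs the lock and another task holds it.
        blocked = [n for n in active
                   if tasks[n]["steps"][0] == "cs"
                   and holder not in (None, n)]
        ready = [n for n in active if n not in blocked]

        def effective(n):
            prio = tasks[n]["prio"]
            if inherit and n == holder:
                # The holder inherits the highest priority among waiters.
                prio = max([prio] + [tasks[b]["prio"] for b in blocked])
            return prio

        if ready:
            run = max(ready, key=effective)
            step = tasks[run]["steps"].pop(0)
            if step == "cs":
                remaining = tasks[run]["steps"]
                # Keep the lock while more critical-section ticks remain.
                holder = run if remaining and remaining[0] == "cs" else None
            if not tasks[run]["steps"]:
                finish[run] = tick + 1
        tick += 1
    return finish

without_inheritance = simulate(inherit=False)
with_inheritance = simulate(inherit=True)
print(without_inheritance)
print(with_inheritance)
```

Without inheritance, the high-priority task in this sketch finishes last, after the medium-priority task, even though it needs only one tick of work. With inheritance, the low-priority task leaves its critical section promptly and the high-priority task finishes well before the medium-priority one.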
Thus, a need exists for detecting priority inversion problems in multi-threaded software so that the priority of threads can be properly controlled.