The invention is generally related to computers and computer software. More specifically, the invention is generally related to debugging multi-threaded software.
Locating, analyzing, and correcting suspected faults, called “bugs”, in a computer program is a process known as “debugging.” Typically, a programmer uses a tool commonly known as a “debugger” to debug a program under development.
Conventional debugging tools typically support two primary operations to assist a computer programmer. A first operation supported by conventional debuggers is a “step” function, which permits a programmer to process instructions (also known as “statements”) in a computer program one-by-one and to see the results upon completion of each instruction. While the step function provides a programmer with a large amount of information about a program during its execution, stepping through hundreds or thousands of program instructions can be extremely tedious and time-consuming, and might require a programmer to step through many error-free instructions before the set of suspicious instructions to be analyzed is executed.
To address this difficulty, a second operation supported by conventional debuggers is a break-point function, which permits a programmer to identify with a break-point the precise instruction at which the programmer desires to halt execution of the program. As a result, the debugger executes the program in a normal fashion until a break-point is reached, and then the debugger stops execution and displays the results of the program to the programmer for analysis.
Typically, step operations and break-points are used together to simplify the debugging process. Specifically, the programmer sets a break-point at the beginning of a desired set of instructions to be analyzed and then begins executing the program. Once the break-point is reached, the debugger halts the program, and the programmer then steps through the desired set of instructions line-by-line using the step function in the debugger. Consequently, a programmer is able to quickly isolate and analyze a particular set of instructions without needing to step through irrelevant portions of a computer program.
The break-point and step operations work well when the entire program being debugged runs in a single thread. Some operating systems, however, allow multiple parts, or threads, of one or more programs to run simultaneously. Such operating systems are referred to as multi-threaded. Multi-threading is a type of parallel processing that allows for more straightforward software design and faster execution of such programs on multi-processor computers.
Debugging multi-threaded computer programs is difficult because timing (i.e., synchronization) problems between the threads can occur, and these problems can be very difficult to reproduce. For example, the debugging environment might interfere with the synchronization of the threads. There can be many reasons for this interference. For example, the computer system upon which the computer program is being debugged might have different performance characteristics than the intended, ultimate host for the computer program. Also, the debugging environment includes additional monitoring and control over program execution, which can impede the intended execution of threads.
Not only do debuggers introduce synchronization problems, but the programmer might also be interested in evaluating specific synchronization situations that could possibly occur. However, the programmer does not want to alter the design of the program to force a specific synchronization situation; instead, the programmer wants a temporary synchronization debugging method. For example, the programmer might want to evaluate how the computer program would behave when multiple threads of interest are utilizing sections of code simultaneously.
Another reason why thread synchronization during debugging might be necessary is when an incomplete computer program is being debugged. The amount of code in a computer program can be enormous, so programmers often combine a top-down design technique with a bottom-up testing technique because it would be impractical to wait for the entire computer program to be available in order to test it. Using the top-down design approach, each subfunction within the program is defined, including the interfaces to other subfunctions. A programmer would then write the lower-tier sections of the computer program, such as modules. Using the bottom-up testing approach, as these lower-tier sections are completed, larger amounts of the computer program are debugged. Sections or modules of a computer program that are not yet complete are merely represented with “stubs” in order to exercise the defined interfaces.
Unfortunately, synchronization situations expected in the full computer program can be difficult to replicate in the incomplete computer program because such stubs would not perform like the full computer program. For example, a stub routine called by the computer program might only return a constant value whereas the real routine, when later available, would perform a number of calculations and processing steps that would delay such a response. The programmer would like to synchronize execution of a thread executing this stub routine to be more like the expected performance, but does not want to waste time developing elaborate code for the stub that would ultimately be discarded.
One technique for debugging synchronization problems is for a user to set a break-point and allow a first thread to hit it. The user then manually suspends the first thread and commands the program to continue executing until a second thread hits the break-point. The user then releases the first thread, allowing both threads to execute. However, this manual approach has several problems. First, it is inconvenient for the user. Second, even after the user has manually created the synchronized thread condition of interest, the suspected fault may not manifest itself until after many repetitions. Third, the threads suspected of causing the fault could be numerous, making this manual synchronization impractical. Finally, deadlock can occur if the second thread is waiting for the first thread, which is in a suspended state, so the user must manually release the first thread in order to break the deadlock.
Thus, a significant need exists for controlling the synchronization of threads, so that multi-threaded applications can be debugged more easily.
The invention is a way to synchronize threads in a multi-threaded program. In the preferred embodiment, a debugger provides a break-point that does not interrupt the user when the first thread reaches it; instead, the debugger halts this thread at the break-point and waits for other threads to accumulate at the break-point before the debugger notifies the user. The user can specify a condition under which this notification should occur; for example, when a specific thread or a certain number of threads have accumulated at the break-point. Once the condition is satisfied, the debugger suspends other threads that have not reached the break-point. The debugger then provides for synchronized stepping or running of the threads that are halted at the break-point.
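The accumulating behavior described above can be illustrated with a minimal sketch. It is not the debugger implementation itself: the class name AccumulatingBreakpoint, its methods, and the demo function are hypothetical names chosen for illustration, and the only notification condition modeled is "a certain number of threads have arrived." Suspending threads that have not reached the break-point, and synchronized stepping, are outside the scope of the sketch.

```python
import threading


class AccumulatingBreakpoint:
    """Hypothetical sketch: halts each arriving thread and reports to the
    user only once `required` threads have accumulated."""

    def __init__(self, required):
        self.required = required      # assumed condition: a thread count
        self.halted = []              # names of threads halted here
        self.released = False
        self.cond = threading.Condition()

    def hit(self, name):
        """Called by a thread when it reaches the break-point."""
        with self.cond:
            self.halted.append(name)
            # Wake the debugger side so it can re-check the condition.
            self.cond.notify_all()
            # Stay halted until the user resumes the accumulated threads.
            while not self.released:
                self.cond.wait()

    def wait_until_condition(self, timeout=None):
        """Debugger side: block until enough threads have accumulated."""
        with self.cond:
            return self.cond.wait_for(
                lambda: len(self.halted) >= self.required, timeout
            )

    def release_all(self):
        """Debugger side: resume every halted thread together."""
        with self.cond:
            self.released = True
            self.cond.notify_all()


def demo():
    bp = AccumulatingBreakpoint(required=3)
    passed = []  # names of threads that have run past the break-point

    def worker(name):
        bp.hit(name)
        passed.append(name)

    threads = [
        threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)
    ]
    for t in threads:
        t.start()

    met = bp.wait_until_condition(timeout=5)  # user is notified here
    halted = sorted(bp.halted)
    before_release = list(passed)             # nothing has passed yet
    bp.release_all()
    for t in threads:
        t.join()
    return met, halted, before_release, sorted(passed)


if __name__ == "__main__":
    print(demo())
```

In this sketch the user-facing notification corresponds to wait_until_condition returning; until release_all is called, no halted thread proceeds past the break-point, which mirrors the accumulate-then-notify order described above.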