The invention is generally related to computers and computer software. More specifically, the invention is generally related to debugging multi-threaded software.
Locating, analyzing and correcting suspected faults in a computer program is a process known as “debugging.” Typically, a programmer uses another computer program commonly known as a “debugger” to debug a program under development.
Conventional debuggers typically support two primary operations to assist a computer programmer. A first operation supported by conventional debuggers is a “step” function, which permits a computer programmer to process instructions (also known as “statements”) in a computer program one-by-one, and see the results upon completion of each instruction. While the step operation provides a programmer with a large amount of information about a program during its execution, stepping through hundreds or thousands of program instructions can be extremely tedious and time consuming, and may require a programmer to step through many program instructions that are known to be error-free before the set of instructions to be analyzed is executed.
To address this difficulty, a second operation supported by conventional debuggers is a break point operation, which permits a computer programmer to identify, with a “break point,” a precise instruction at which execution of a computer program is to be halted. As a result, when a computer program is executed by a debugger, the program executes in a normal fashion until a break point is reached, and then stops execution and displays the results of the computer program to the programmer for analysis.
Typically, step operations and break points are used together to simplify the debugging process. Specifically, a common debugging operation is to set a break point at the beginning of a desired set of instructions to be analyzed, and then begin executing the program. Once the break point is reached, the program is halted, and the programmer then steps through the desired set of instructions line by line using the step operation. Consequently, a programmer is able to quickly isolate and analyze a particular set of instructions without having to step through irrelevant portions of a computer program.
Most break points supported by conventional debuggers are unconditional, meaning that once such a break point is reached, execution of the program is always halted. Some debuggers also support the use of conditional break points, which only halt execution of a program when a variable used by the program is set to a predetermined value at the time such a break point is reached.
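By way of illustration only, the distinction between unconditional and conditional break points may be sketched as follows. The class name `BreakPoint`, its fields, and the method `should_halt` are hypothetical names chosen for this sketch and do not correspond to any particular debugger.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class BreakPoint:
    """Hypothetical break point record; not taken from any real debugger."""
    line: int                       # instruction (line) at which to halt
    variable: Optional[str] = None  # None means the break point is unconditional
    value: Any = None               # predetermined value for a conditional break point

    def should_halt(self, current_line: int, variables: Mapping[str, Any]) -> bool:
        # A break point can only fire at its own instruction.
        if current_line != self.line:
            return False
        # Unconditional: always halt once the break point is reached.
        if self.variable is None:
            return True
        # Conditional: halt only if the variable holds the predetermined value.
        return self.variable in variables and variables[self.variable] == self.value
```

For example, `BreakPoint(10)` would halt whenever line 10 is reached, whereas `BreakPoint(10, "count", 5)` would halt at line 10 only while `count` equals 5.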
Some operating systems, such as UNIX and Windows NT, allow multiple parts, or threads, of one or more processes to run simultaneously. These operating systems are referred to as multi-threaded. This type of parallel processing allows for faster execution of such processes on multi-processor machines, and can simplify software development.
Synchronization of multi-threaded computer programs is typically provided by semaphores. A semaphore is a token that is used in a multi-threaded operating system to coordinate access, by competing threads, to “protected” resources or operations. This coordination may be used to limit the number of threads that can execute a piece of code at the same time. The typical limit is one, creating a mutually exclusive lock. Semaphores are also used to impose the order in which a series of interdependent operations are performed. Thus, a semaphore acts as a key that a thread must acquire to continue execution. Any thread that can identify a particular semaphore can attempt to acquire it by passing to the semaphore function a system-wide number that is assigned when the semaphore is created. The function does not return until the semaphore is actually acquired. Alternatively, a semaphore may specify a time limit after which the semaphore is released.
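As a minimal sketch of a semaphore limiting the number of threads that can execute a piece of code at the same time, the following uses the semaphore provided by Python's standard `threading` module rather than the system-wide numbered semaphore described above. The function name `run_with_limit` and its parameters are assumptions made for this example.

```python
import threading
import time


def run_with_limit(num_threads: int, limit: int) -> int:
    """Run num_threads workers through a protected region guarded by a
    semaphore and return the peak number of threads seen inside it."""
    semaphore = threading.Semaphore(limit)
    state_lock = threading.Lock()  # protects the two counters below
    inside = 0
    peak = 0

    def worker() -> None:
        nonlocal inside, peak
        # Acquiring does not return until the semaphore is actually obtained.
        with semaphore:
            with state_lock:
                inside += 1
                peak = max(peak, inside)
            time.sleep(0.01)  # stand-in for work on the protected resource
            with state_lock:
                inside -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak
```

With a limit of one, the semaphore behaves as a mutually exclusive lock, so the reported peak never exceeds one regardless of how many threads compete.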
The term semaphore generally refers to a function used to halt execution of a section of code until a condition is satisfied releasing the semaphore function. The term mutex is often used interchangeably, although generally a mutex is a broader concept, encompassing the data structures used to track the semaphore function, the evaluation of the condition holding completion of the semaphore function, and scheduling the release of threads queued by the semaphore function.
Debugging multi-threaded computer programs is difficult because timing (i.e., synchronization) problems occur and may be very difficult to reproduce. For instance, the debugging environment may interfere with the synchronization designed into the computer program, such as the semaphores present. This interference may arise for many reasons, for example when the computer system upon which the computer program is being debugged differs in performance from the intended ultimate host for the computer program. Also, the debugging environment includes additional monitoring and control over program execution that may necessarily impede the intended execution of threads. Beyond the synchronization problems that debuggers introduce, the programmer may be specifically interested in evaluating particular synchronization situations that could possibly occur. However, the programmer does not want to alter the design of the program to force a specific synchronization situation, but rather wants a temporary synchronization debugging method. For example, the programmer may want to evaluate how the computer program would behave when three threads of interest are utilizing sections of code simultaneously.
Thread synchronization during debugging may also be necessary when an incomplete computer program is being debugged. Sections or modules of a computer program may merely be represented with “stubs” to exercise the interfaces. However, such stubs would not perform like the full computer program. For example, a tactical fighter aircraft mission avionics system integrates a large number of displays, navigational aids, communication systems, weapons delivery systems, crew comfort, and other systems. The amount of code required to control these many functions is large, so the software development approach is generally a top-down design with bottom-up testing. Thus, each subfunction required to perform these tasks is defined, including the interfaces to other subfunctions. A programmer would then write the lower tier sections of the computer program, such as a module. As these lower tier sections are completed, larger amounts of the computer program are debugged and tested. It would be impractical to wait for the entire computer program to be available in order to debug. Often, some functions may be delayed for years after a flyable computer program is required.
Unfortunately, synchronization situations expected in the full computer program may be difficult to replicate in the incomplete computer program. For example, a stub routine called by the computer program may return a constant value whereas the real routine, when available, would perform a number of calculations and processing steps that would delay such a response. The programmer would like to synchronize execution of a thread executing this stub routine to be more like the expected performance, but does not want to waste time developing elaborate code for the stub that would ultimately be discarded.
One manner of debugging synchronization problems is for a user to set a break point and allow one thread to hit it. The user then suspends the first thread and commands the program to continue executing until a second thread hits the break point. The user then releases the first thread, allowing both threads to execute. However, even after the synchronized thread condition of interest has been manually created, the suspected fault may not manifest itself until after many repetitions. Moreover, the threads suspected of causing the fault could be numerous, making this manual synchronization impractical.
Therefore, a significant need exists for a manner of controlling the synchronization of threads so that multi-threaded applications can be debugged more readily.
The invention addresses these and other problems associated with the prior art by providing an apparatus, program product, and method of debugging a multi-threaded computer program that utilize synchronization control points to synchronize thread execution. During execution of a portion of a multi-threaded computer program, a thread is conditionally suspended at a synchronization control point (“sync point”) until a synchronization condition is satisfied, such as another thread or threads hitting the same or a different synchronization control point. Moreover, a thread may be further conditionally suspended for a delay period in response to the thread reaching a synchronization control point.
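The behavior summarized above may be pictured with the following sketch, which is offered only as an analogy in ordinary application code and is not the debugger mechanism of the invention. The class `SyncPoint`, its `hit` method, and the `demo` function are hypothetical names. One thread is held at its sync point until a second thread reaches a different sync point, after which the first thread is further held for a short delay period.

```python
import threading
import time
from typing import List, Optional


class SyncPoint:
    """Hypothetical sync point: a thread hitting it is held until the
    sync point named in wait_for has been reached, then for a delay."""

    def __init__(self, name: str, delay: float = 0.0) -> None:
        self.name = name
        self.delay = delay                  # extra hold after the condition is met
        self.reached = threading.Event()    # set once any thread hits this point
        self.wait_for: Optional["SyncPoint"] = None

    def hit(self) -> None:
        self.reached.set()                  # announce arrival before waiting
        if self.wait_for is not None:
            self.wait_for.reached.wait()    # synchronization condition
        if self.delay > 0:
            time.sleep(self.delay)          # delay period


def demo() -> List[str]:
    log: List[str] = []
    point_a = SyncPoint("A", delay=0.01)
    point_b = SyncPoint("B")
    point_a.wait_for = point_b              # A is held until B is reached

    def thread_a() -> None:
        point_a.hit()
        log.append("a: past sync point")

    def thread_b() -> None:
        time.sleep(0.05)                    # thread B arrives later
        log.append("b: reached sync point")
        point_b.hit()

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Because each sync point announces its own arrival before waiting, two sync points may also be made to wait on each other without deadlock in this sketch.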
Control points are advantageously used during debugging of the computer program in order to temporarily restrict execution without altering the design of the computer program. Control points are included for reasons other than synchronization, especially break points which typically halt all threads either unconditionally or depending on the state of a variable. Utilizing control points for synchronization affords similar advantages in that the computer program is only temporarily altered and readily restored to its former condition. Moreover, existing hardware typically available in a computer system and software in the debugging environment may be utilized to handle synchronization control points.
Synchronization control points consistent with the invention may also be used to synchronize a number of threads without specifying which threads. The intent may be to burden certain resources of the computer program to detect timing faults; thus, a specified number of threads is forced to execute from certain synchronization control points simultaneously. Each thread that hits one or more associated synchronization control points is held until the requisite number are being held, after which all are released, either immediately or with a predetermined delay scheduled for each thread.
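A count-based hold of this kind may be approximated, purely as an illustrative sketch, with the `threading.Barrier` class of the Python standard library. The function name `run_counted_sync` and the `release_delays` parameter are assumptions made for this example; the barrier stands in for the sync point and is not the mechanism of the invention.

```python
import threading
import time
from typing import List, Sequence, Tuple


def run_counted_sync(num_threads: int,
                     release_delays: Sequence[float]) -> List[Tuple[str, int]]:
    """Hold each arriving thread until num_threads are held, then release
    them, each after its own predetermined delay. Returns the event log."""
    barrier = threading.Barrier(num_threads)   # requisite number of threads
    events: List[Tuple[str, int]] = []
    events_lock = threading.Lock()

    def worker(thread_id: int) -> None:
        with events_lock:
            events.append(("arrive", thread_id))
        # wait() blocks until num_threads threads are waiting; it returns a
        # distinct index in range(num_threads) for each released thread.
        slot = barrier.wait()
        time.sleep(release_delays[slot])       # delay scheduled for this thread
        with events_lock:
            events.append(("pass", thread_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

Passing all-zero delays corresponds to immediate release, while a list such as `[0.0, 0.01, 0.02]` staggers the released threads. No thread identities are specified; any three threads that arrive satisfy the condition.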
One advantage of the use of thread synchronization control points is that a program being debugged can be tested bottom up, without waiting for all of the code to be available. Normal operation can be emulated with proper synchronization conditions and delays added during debugging. Another advantage is that reproducing software bugs more frequently is made easier. Yet another advantage is that thread synchronization control points could be used to detect infrequent synchronization faults that might not be apparent each time that a condition is created, e.g., an error that occurs every hundredth time that three threads simultaneously execute within a region.
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the drawings, and to the accompanying descriptive matter, in which there are described various embodiments of the invention.