POSIX Standard 1003.1c defines a portable interface to threading packages across multiple operating systems. This standard, known as pthreads, is the most widely adopted binding of thread control functions to a programming language. Its adoption by the C and C++ programming community provides a common binding with which multithreaded applications can be created. Other languages, such as Java, have not adopted the pthreads binding for thread control functions, but do provide a semantically similar set of control primitives. Operating systems also diverge in the exact implementation of the threading control primitives. Microsoft, with its WIN32 programming API, implements Windows Threads, which are similar but not identical to the definitions of threads in other languages or on other platforms.
Despite the diversity of threading implementations, a common semantic model is emerging. A thread is defined to be an autonomous unit of execution control that shares a common address space with its host process. Threads are allowed to run using a portion of the CPU or CPUs on which they are executing. The exact details of the thread scheduling mechanism usually cannot be known in a portable manner. Users of threading packages usually just accept that the threads will eventually execute to completion.
In addition to the control mechanisms for creating and destroying threads, most threading implementations supply synchronization mechanisms. These mechanisms are used to communicate between threads. The communication may take the form of mutual exclusion (i.e., only one thread at a time is allowed to execute a section of code) or of signaling (i.e., one thread notifies other threads that some information is now available).
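The two forms of communication can be illustrated with a minimal sketch using Python's standard `threading` module, chosen here only for brevity. The names `add_many`, `producer`, `consumer`, and `run_demo` are hypothetical and do not come from any threading standard discussed above.

```python
import threading

counter = 0
counter_lock = threading.Lock()        # mutual exclusion
items = []
items_ready = threading.Condition()    # signaling

def add_many(n):
    """Increment the shared counter n times, one thread at a time."""
    global counter
    for _ in range(n):
        with counter_lock:             # only one thread runs this section
            counter += 1

def producer():
    """Publish one item and notify any waiting threads."""
    with items_ready:
        items.append("data")
        items_ready.notify_all()

def consumer(out):
    """Wait until an item is available, then copy it to out."""
    with items_ready:
        while not items:               # re-check the condition after waking
            items_ready.wait()
        out.append(items[0])

def run_demo():
    """Run four counting threads plus one producer/consumer pair."""
    global counter
    counter = 0
    items.clear()
    received = []
    threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
    threads.append(threading.Thread(target=consumer, args=(received,)))
    threads.append(threading.Thread(target=producer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter, received
```

Because every increment happens while `counter_lock` is held, the final count does not depend on how the scheduler interleaves the threads. The consumer rechecks `items` in a loop so that it behaves correctly whether it starts before or after the producer.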
The combination of threading control and synchronization mechanisms produces a semantic environment that is sufficient to control multithreaded algorithms, but that is also sufficient to introduce severe programming problems when used incorrectly.
Several classes of problems arise when threads are used incorrectly. This document focuses on two main classes: races and deadlocks. A race is defined as simultaneous access to a shared resource or location for which mutual exclusion has not been established. A deadlock is defined as a condition in the program in which a set of threads waits indefinitely to acquire a set of resources.
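The deadlock condition defined above can be reproduced in a controlled way. The following is a minimal sketch in Python, assuming two hypothetical locks taken in opposite order by two threads. Timeouts and barriers are used so that the demonstration terminates instead of hanging; a real deadlock would have no such timeout.

```python
import threading

def opposite_order_demo(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns a dict recording whether each thread obtained its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)     # both threads hold one lock
    both_tried_second = threading.Barrier(2)   # both attempts have finished
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # The other thread holds `second` and will not release it
            # until this attempt has ended, so the attempt times out.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            both_tried_second.wait()
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second
```

Each thread holds the resource the other one needs, so neither acquisition can succeed. Without the timeout, both threads would wait indefinitely.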
Two research projects are notable as prior art: Eraser and RecPlay. The purpose of both systems is to detect errors in threaded applications.
Eraser was developed at the University of Washington. It is based on the ATOM technology developed by Digital Equipment Corporation (DEC) for the instrumentation of ALPHA microprocessor executables. Eraser's mode of operation is to translate an executable program into an instrumented executable program. The new program is then executed and the errors are calculated during the execution. After the execution a report is generated indicating where the program could execute incorrectly.
Eraser is based on the notion of lock coverage sets. For each memory reference in the program, Eraser records the set of locks that are held during the access. It then calculates the intersection of these sets over all accesses to each memory location. If a memory location is accessed by more than one thread and the intersection of the lock sets held during those accesses is empty, then Eraser records that a potential error exists in the program for accesses to this memory location.
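The lock coverage calculation described above can be sketched as follows. This is a simplified illustration in Python, not Eraser's actual implementation; the function name `lockset_report` and the trace format of (thread, address, locks held) tuples are assumptions made for the example.

```python
from collections import defaultdict

def lockset_report(accesses):
    """Return addresses touched by several threads with no common lock.

    accesses: iterable of (thread_id, address, locks_held) tuples
    (a hypothetical trace format).
    """
    common = {}                     # address -> intersection of lock sets so far
    threads = defaultdict(set)      # address -> threads that accessed it
    for thread_id, address, locks_held in accesses:
        held = frozenset(locks_held)
        if address in common:
            common[address] &= held
        else:
            common[address] = held
        threads[address].add(thread_id)
    return sorted(
        address for address, locks in common.items()
        if len(threads[address]) > 1 and not locks
    )

trace = [
    ("t1", "x", {"L"}), ("t2", "x", {"L", "M"}),   # always protected by L
    ("t1", "y", {"L"}), ("t2", "y", set()),        # no common lock
    ("t1", "z", set()), ("t1", "z", set()),        # single thread only
]
flagged = lockset_report(trace)   # only "y" is flagged
```

Location `x` is not flagged because lock `L` is held during every access, and `z` is not flagged because only one thread touches it.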
The design of Eraser has several consequences. First, the algorithm used by Eraser is timing independent. The order of the memory accesses has no effect on the results of the error detection. Second, Eraser has problems with derived effects. For example, in the Bounded Buffer algorithm, locks are held during accesses to the buffer to get or to put an element, but no locks need to be held when accessing the fields of an element that was retrieved from the buffer. Eraser (without additional hints) incorrectly flags these accesses as potential errors. Eraser is also unable to deal with directed synchronization caused by the use of a condition variable or a thread join operation. Finally, Eraser cannot deal with the concept of a global or a local barrier, where the mode of operation in the program changes. For example, if in the first phase a variable is protected by the lock "A", and in the second phase the variable is protected by the lock "B", Eraser would report that the set of locks held during the accesses to the variable is empty. Eraser has the advantage that only a single execution of the program is needed to find the errors that might have occurred during that execution.
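Two of these consequences follow directly from the use of set intersection, as the following small Python illustration suggests. The two-phase access list is hypothetical, and `common_locks` is an illustrative helper, not part of Eraser.

```python
from functools import reduce
from itertools import permutations

def common_locks(lock_sets):
    """Intersect the lock sets held during a sequence of accesses."""
    return reduce(lambda acc, held: acc & held, lock_sets)

# Hypothetical program: lock "A" guards the variable in phase one,
# lock "B" guards it in phase two.
phase_one = [frozenset({"A"}), frozenset({"A"})]
phase_two = [frozenset({"B"}), frozenset({"B"})]
all_accesses = phase_one + phase_two

# Timing independence: every ordering of the accesses gives the same result.
results = {common_locks(order) for order in permutations(all_accesses)}
```

Every ordering yields the empty set, so the analysis is insensitive to timing, yet it reports a potential error even though each phase taken alone is consistently protected.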
The RecPlay system was developed at the Universiteit Gent, Belgium. RecPlay is based on the notion that it is possible to record the order in which synchronization events occur during an execution of a program, and then to replay that execution by delaying the synchronization events until they occur in the same order as was recorded for the original execution. The advantage of this scheme is that recording the order of the synchronization events is an inexpensive operation and thus causes minimal perturbation to the execution of the program. The assumption is that this recording would be permanently enabled, so that if an error exhibited itself it would be easy to replay the execution to determine the cause of the error. During the replay, the program is executed again in the same environment and with the same inputs as the original execution. This error detection phase instruments the executable code on the fly, replacing SPARC memory references with a trap instruction so that the memory trace of each thread can be determined. RecPlay is based on the notion of Lamport clocks. The clocks maintain a partial ordering of events across the threads. When a memory access is being checked, the previous access to the same memory location is compared with the current access to see whether a partial ordering exists between the two. If no ordering exists (based on the Lamport clocks), then a potential error is reported, because the accesses to this memory location are not synchronized with respect to each other.
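The ordering check can be sketched with vector clocks. Vector clocks are a common extension of Lamport clocks and are swapped in here because scalar timestamps alone cannot show that two events are unordered; this is not a description of RecPlay's actual data structures. The class `ThreadClock` and the functions below are hypothetical names.

```python
class ThreadClock:
    """Vector clock owned by one thread (illustrative only)."""

    def __init__(self, index, n_threads):
        self.index = index
        self.time = [0] * n_threads

    def tick(self):
        """Advance this thread's own component and return a timestamp."""
        self.time[self.index] += 1
        return tuple(self.time)

    def merge(self, other_time):
        """Absorb a timestamp received through a synchronization event."""
        self.time = [max(a, b) for a, b in zip(self.time, other_time)]

def happens_before(a, b):
    """True if timestamp a is ordered before timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))

def is_unordered(prev_access, cur_access):
    """True if neither access is ordered before the other."""
    return (not happens_before(prev_access, cur_access)
            and not happens_before(cur_access, prev_access))

def demo():
    # Synchronized case: t0 writes, releases a lock, t1 acquires and reads.
    t0, t1 = ThreadClock(0, 2), ThreadClock(1, 2)
    write = t0.tick()
    release = t0.tick()
    t1.merge(release)              # the acquire observes the release
    read = t1.tick()
    synchronized = is_unordered(write, read)

    # Unsynchronized case: two threads touch the location independently.
    u0, u1 = ThreadClock(0, 2), ThreadClock(1, 2)
    unsynchronized = is_unordered(u0.tick(), u1.tick())
    return synchronized, unsynchronized
```

In the first case the release and acquire order the write before the read, so nothing is reported. In the second case no synchronization event connects the two threads, so the accesses are unordered and would be reported as a potential error.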
The design of RecPlay has several consequences. First, the algorithm used by RecPlay is timing dependent. Since RecPlay uses the order of events to determine correctness, changing the order of events can change the output of the analysis. RecPlay attempts to overcome this problem by determining the order of events with minimal intrusiveness during the first recording phase, in which only the synchronization events are monitored. Second, RecPlay can correctly determine indirect synchronization effects. It can correctly determine that the Bounded Buffer algorithm protects the accesses to each element that is placed into the buffer. It can also correctly handle multiple phases of execution in which different locks are used in different phases. The greatest weakness of RecPlay is the requirement for replayed execution. For some programs, it is trivial to restart an execution and to reproduce exactly the environment and the inputs that caused the program to generate the recorded sequence of synchronization events. But this is not always possible. If the program makes destructive modifications to its environment, it may be very difficult to roll back these changes so that repeated executions are exactly identical. Another weakness of the RecPlay system is that it requires three executions of the program to report the error messages. The first execution records the synchronization order. The second execution calculates potentially unsynchronized memory accesses, and the third execution generates the report of which threads access the problem memory locations in an unsynchronized manner.
By way of introduction only, the present invention provides a mechanism for detecting defects in multithreaded computer programs. Defects are classified into two categories: races and deadlocks. Races occur during execution of the program where multiple threads may modify and access a shared variable without synchronization. Deadlocks are detected by server tasks which monitor a graph representation of thread state and detect cycles in these graphs. A further form of deadlock is detected where a thread cannot make forward progress for a predetermined period of time.
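Cycle detection of the kind mentioned above can be sketched as a depth-first search over a wait-for graph. This is a minimal illustration in Python under assumed conventions: the dictionary `waits_for`, mapping each thread to the threads that hold resources it is waiting for, is a hypothetical representation and not the one used by the invention.

```python
def find_cycle(waits_for):
    """Return one cycle in the wait-for graph as a list of threads, or None.

    waits_for: dict mapping a thread to an iterable of the threads it is
    waiting on (a hypothetical representation of thread state).
    """
    WHITE, GREY, BLACK = 0, 1, 2    # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:                  # back edge closes a cycle
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in waits_for:
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

deadlocked = find_cycle({"t1": ["t2"], "t2": ["t3"], "t3": ["t1"], "t4": ["t1"]})
healthy = find_cycle({"t1": ["t2"], "t2": []})
```

In the first graph, threads `t1`, `t2`, and `t3` wait on one another in a ring, so none of them can proceed. Thread `t4` is blocked behind the ring but is not itself part of the cycle.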
The mechanism is preferably implemented in software code for operation in conjunction with a general purpose computer. Particular applications for the mechanism include debugging programs written in Java or using pthreads. The mechanism includes an annotated address trace generator, an analysis mechanism for detecting defects in the annotated trace, and a report generator for communicating the defects to the user. In one embodiment, the report generator provides a graphical user interface for interactive identification and correction of detected defects.
The foregoing description of the present invention has been provided only by way of introduction. Nothing in this section should be taken as a limitation on the following claims, which define the scope of the invention.