The present invention relates generally to a system and method for statically detecting potential race conditions in multi-threaded computer programs, and more particularly to building and analyzing a synchronization graph, which represents certain computer program elements and execution paths, to detect potential race conditions on object data fields.
Most commercial operating systems, such as Microsoft Windows 95, and modern programming languages, such as C++ and Java, support the use of threads. Many popular software applications, such as Microsoft Word and Netscape Navigator, are multi-threaded. In a multi-threaded environment, a program may consist of one or more threads of control, each of which shares a common address space and most other program resources. Multi-threading is often used to exploit parallelism within the software application and the hardware, resulting in, among other things, improved networking performance and faster response to user input.
Programming with multiple threads introduces the potential for a timing-dependent error known as a race condition. In order for a race condition to occur, the following three conditions must be present: (a) two or more threads executing in parallel (hereinafter referred to as "parallel threads") access the same memory location at nearly the same time; (b) at least one of the threads modifies the data in that memory location; and (c) the threads use no explicit mechanism, or lock, to prevent the accesses from being simultaneous. This type of unsynchronized access to a memory location can produce unintended results. To illustrate the idea of a race condition, consider the program illustrated in Table 1, where two unsynchronized threads access and change the value of a shared resource. The program illustrated in Table 1 is written in the Java programming language, but the problem illustrated is applicable to all multithreaded programming environments.
This program has a potential race condition because access to the shared data field "d.f" in both methods, "main( )" and "run( )," is not protected by any lock. At runtime, the increment operation (d.f++) and the decrement operation (d.f--) may therefore be interleaved in an arbitrary manner, producing unpredictable results. If these two operations do not overlap, then the resulting value of d.f will correctly be zero. If the operations do overlap, then the result could instead be either -1 or +1, which presumably is not what the programmer intended.
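The kind of program described above can be sketched in Java as follows. This is a minimal illustration, not a reproduction of Table 1: the class names Data and RaceDemo and the helper runOnce are hypothetical, and only the unsynchronized d.f++ and d.f-- operations are taken from the description.

```java
// Hypothetical names: Data, RaceDemo, and runOnce are illustrative assumptions.
class Data {
    int f = 0; // shared field, not protected by any lock
}

class RaceDemo {
    // Starts one child thread that decrements d.f while the caller increments it.
    static int runOnce() throws InterruptedException {
        final Data d = new Data();
        Thread child = new Thread(() -> d.f--); // unsynchronized read-modify-write
        child.start();
        d.f++;                                  // unsynchronized, may interleave with d.f--
        child.join();
        // 0 if the two operations did not overlap; -1 or +1 if they did.
        return d.f;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("d.f = " + runOnce());
    }
}
```

Because the interleaving depends on thread scheduling, most runs of such a sketch are likely to yield zero, which is exactly why this class of error is hard to observe by testing.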
A few simple techniques are commonly used in modern programming languages to synchronize the activities of threads. Most of these techniques are based on the concept of monitors, or locks. It is not necessary to know the details of how these various techniques work to practice the present invention. However, a brief explanation of how locks work is provided as background information.
A lock is typically associated with a resource that multiple threads may need to access, but that should be accessed by only one thread at a time. If the resource is not being used, a thread can acquire its lock and access the resource. However, if another thread already has the lock to the resource, all other threads have to wait until the current thread finishes and releases the lock. Then another thread can acquire the lock and access the resource.
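The acquire, wait, and release behavior described above can be observed with a small Java sketch. It uses java.util.concurrent.locks.ReentrantLock purely for illustration, since that class exposes a non-blocking tryLock(); this choice, and the names LockDemo, otherThreadCanAcquire, and demo, are assumptions and are not drawn from the described system.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; names are illustrative assumptions.
class LockDemo {
    // Reports whether a second thread can acquire the lock without waiting.
    static boolean otherThreadCanAcquire(ReentrantLock lock) throws InterruptedException {
        final boolean[] acquired = new boolean[1];
        Thread other = new Thread(() -> {
            if (lock.tryLock()) {      // succeeds only if no other thread holds the lock
                acquired[0] = true;
                lock.unlock();
            }
        });
        other.start();
        other.join();                  // join makes the write to acquired[0] visible here
        return acquired[0];
    }

    // Returns { result while the caller holds the lock, result after release }.
    static boolean[] demo() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        boolean whileHeld;
        lock.lock();                   // the current thread acquires the lock
        try {
            whileHeld = otherThreadCanAcquire(lock);
        } finally {
            lock.unlock();             // release so that another thread can proceed
        }
        boolean afterRelease = otherThreadCanAcquire(lock);
        return new boolean[] { whileHeld, afterRelease };
    }
}
```

In this sketch the second thread fails to obtain the lock while the first thread holds it and succeeds once the lock has been released.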
The unsynchronized access to a shared resource illustrated by the program in Table 1 can be fixed by coordinating the activities of the threads, so that they do not collide in the same address space. The program in Table 2 illustrates parallel executing threads synchronizing access to a shared resource.
This program uses a locking mechanism (identified by the "synchronized" statement) to synchronize the two parallel executing threads. Thus, the increment operation (d.f++) and the decrement operation (d.f--) cannot overlap and the value of d.f will correctly be zero.
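A synchronized program of the kind described above might look like the following Java sketch. It is not a reproduction of Table 2: the names SharedData, SyncDemo, and runOnce are hypothetical, and locking on the shared object d itself is an assumption made for illustration.

```java
// Hypothetical names: SharedData, SyncDemo, and runOnce are illustrative assumptions.
class SharedData {
    int f = 0; // shared field, accessed only while holding the lock on its owner
}

class SyncDemo {
    static int runOnce() throws InterruptedException {
        final SharedData d = new SharedData();
        Thread child = new Thread(() -> {
            synchronized (d) {   // acquire the lock on d before touching d.f
                d.f--;
            }
        });
        child.start();
        synchronized (d) {       // same lock, so the two updates cannot overlap
            d.f++;
        }
        child.join();            // wait for the child before reading the result
        return d.f;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("d.f = " + runOnce());
    }
}
```

Both threads must use the same lock object; synchronizing on two different objects would leave the accesses unprotected with respect to each other.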
Even with the availability of locks, or monitors, it is easy to introduce race conditions into a computer program. A computer programmer, for example, may inadvertently overlook the need for a synchronization instruction, accidentally leave a synchronization instruction out, or implement the instructions in the wrong order.
In addition to being easy to introduce into a computer program, race conditions are generally very difficult to find. For one thing, a data race typically becomes apparent only if two threads access an improperly protected memory location at nearly the same time. A program could potentially run for a long time without showing any signs of a problem. Also, since threads may be time-sliced, which means they can run in arbitrary bursts as directed by the operating system, the symptoms may be different each time a race condition actually occurs.
Race detection schemes have been studied for decades, and a number of commonly used techniques for detecting potential race conditions have resulted. These techniques can be categorized as either dynamic or static. In general, each technique has problems. For instance, since dynamic detection schemes operate while the program is executing, they can significantly slow down an application due to the overhead incurred by making a procedure call at every load and store instruction. (See Savage et al., "Eraser: A Dynamic Data Race Detector for Multi-threaded Programs," 1997, page 32, first para.). Also, since dynamic detection schemes rely on testing, they may fail to detect certain race conditions because of insufficient test coverage.
An example of a static race detection scheme is WARLOCK (See Sterling, "WARLOCK A Static Data Race Analysis Tool," SunSoft, Inc., 1993). WARLOCK works "by tracing the execution of every path through the code." (See page 3, col. 2, para. 2). WARLOCK traces each execution path by analyzing a file output as the result of compiling the computer program. One problem with path-tracing static race detection systems such as WARLOCK is that they do not support dynamically dispatched method calls and will therefore not detect potential race conditions in source code written in an object-oriented language, such as C++ or Java. A second problem is that, in a worst-case scenario, the execution time of the algorithm increases exponentially with the size of the program being analyzed. Finally, systems such as WARLOCK do not infer information about which fields may be shared between multiple threads (a race condition can only occur on object data fields shared by multiple threads). The absence of this information can result in spurious alarms, or false alarms, indicating potential race conditions on unshared, or thread-local, object data fields. Often, in order to avoid such false alarms, a programmer must annotate the source code of the computer program with declarations about which data fields are not shared.
Other race detection schemes are described in Young and Taylor ("Combining Static Concurrence Analysis with Symbolic Execution," IEEE Transactions on Software Engineering, Vol. 14, No. 10, October 1989), Appelbe and McDowell ("Integrated Tools for Debugging and Developing Multitasking Programs," Ga. Institute of Tech. and Univ. of Santa Cruz, 1988), Callahan and Subhlok ("Static Analysis of Low-level Synchronization," Rice University, 1988), Emrath and Padua ("Automatic Detection of Nondeterminacy in Parallel Programs," Univ. of Il., 1988), and Cheng et al. ("Detecting Data Races in Cilk Programs that Use Locks," Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, 1998).
Model checking of concurrent programs for potential race condition detection is well known. See, for example, Chamillard et al. ("An Empirical Comparison of Static Concurrence Analysis Techniques," 1996), Corbett ("Evaluating Deadlock Detection Methods for Concurrent Software," IEEE Transactions on Software Engineering, Vol. 22, No. 3, 1996), and Fajstrup et al. ("Detecting Deadlocks in Concurrent Systems," Aalborg University, 1997). Although this approach has proven useful on finite state systems, it cannot be directly applied to non-finite state systems, including those that dynamically allocate data and objects through the use of programming languages such as C++ or Java.
In light of the above, it would be beneficial to have a system and methodology for detecting potential race conditions that: (a) works with programming languages that dynamically allocate objects and data; (b) reduces false alarms by inferring information about which data fields may be shared between multiple threads; (c) detects potential race conditions independent of test coverage; and (d) does not slow down program execution during a debugging process.
In summary, the present invention is a system and method for statically detecting potential race conditions in multi-threaded computer programs. The computer program is typically written in an object-oriented programming language having at least one class, and each class normally has at least one method or at least one data field.
The method begins by generating a synchronization graph representing method declarations, object field declarations and synchronization statements in a specific program. Each method declaration, object field declaration and synchronization statement in the specific program that is represented by the synchronization graph is represented by a respective node in the synchronization graph. Each node contains synchronization information indicating the locks acquired when the body of code corresponding to that node is entered. Edges between the nodes represent execution paths of the program and program accesses to the object data fields in the computer program.
Next, the synchronization graph is traversed and a synchronization value is generated for each node in the graph. Each node's synchronization value represents a union of first and second values, the first value corresponding to the synchronization information stored for the node, and the second value corresponding to an intersection of all locks applicable to each other node in the graph for which there is an edge pointing to the node from the other node.
The method reports at least a subset of the nodes representing object field declarations whose synchronization value is a predefined null value.
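The three steps summarized above (build the graph, compute a synchronization value per node, report unprotected fields) can be sketched in Java as follows. This is one possible reading under stated assumptions, not the claimed implementation: the class SyncGraph and its methods are hypothetical, locks are modeled as strings, the "predefined null value" is modeled as the empty set, nodes with no incoming edges are assumed to hold only their own locks, and cycles are handled by iterating to a fixed point starting from the set of all locks.

```java
import java.util.*;

// Hypothetical sketch; class, method, and node names are illustrative assumptions.
class SyncGraph {
    enum Kind { METHOD, FIELD, SYNC }

    private final Map<String, Kind> kinds = new LinkedHashMap<>();
    private final Map<String, Set<String>> ownLocks = new HashMap<>();
    private final Map<String, Set<String>> preds = new HashMap<>();

    // Adds a node together with the locks acquired when its body is entered.
    void addNode(String name, Kind kind, String... locks) {
        kinds.put(name, kind);
        ownLocks.put(name, new HashSet<>(Arrays.asList(locks)));
        preds.put(name, new HashSet<>());
    }

    // Adds an edge for an execution path or a field access from one node to another.
    void addEdge(String from, String to) {
        preds.get(to).add(from);
    }

    // value(n) = ownLocks(n) UNION (INTERSECTION over predecessors p of value(p)).
    Map<String, Set<String>> synchronizationValues() {
        Set<String> allLocks = new HashSet<>();
        for (Set<String> s : ownLocks.values()) allLocks.addAll(s);

        Map<String, Set<String>> value = new HashMap<>();
        for (String n : kinds.keySet()) {
            // Assumption: entry nodes (no incoming edges) start with only their own locks;
            // all other nodes start at "all locks" and are narrowed by iteration.
            boolean entry = preds.get(n).isEmpty();
            value.put(n, new HashSet<>(entry ? ownLocks.get(n) : allLocks));
        }
        boolean changed = true;
        while (changed) {                       // iterate until no value changes
            changed = false;
            for (String n : kinds.keySet()) {
                if (preds.get(n).isEmpty()) continue;
                Set<String> v = new HashSet<>(allLocks);
                for (String p : preds.get(n)) v.retainAll(value.get(p)); // intersection
                v.addAll(ownLocks.get(n));                                // union
                if (!v.equals(value.get(n))) {
                    value.put(n, v);
                    changed = true;
                }
            }
        }
        return value;
    }

    // Reports field nodes whose synchronization value is empty (no common lock).
    List<String> unprotectedFields() {
        Map<String, Set<String>> value = synchronizationValues();
        List<String> result = new ArrayList<>();
        for (String n : kinds.keySet()) {
            if (kinds.get(n) == Kind.FIELD && value.get(n).isEmpty()) result.add(n);
        }
        return result;
    }
}
```

For example, if a field node is reachable both from a synchronization node holding lock d and directly from a method node holding no locks, the intersection over its predecessors is empty and the field is reported; if every path to the field passes through a node holding d, the field is not reported.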