1. Field of the Invention
This invention generally relates to a static datarace detection method and apparatus for multithreaded applications, and more specifically to a static datarace detection method and apparatus for detecting definite dataraces and potential dataraces in multithreaded object-oriented applications written in languages such as Java™.
2. Description of the Related Art
Modern operating system platforms support concurrent multiple threads of execution. However, some of these operating system platforms support concurrent multiple threads without complete scheduling control by the user. For example, operating system platforms that support the Java™ (hereinafter “Java”) Virtual Machine Specification fall into this category. In such systems, each time a Java application runs, the number of instructions executed by a thread in a time interval allocated to the thread may change due to variations in the state of an external operating system. The apparently random number of instructions executed by threads in a time interval introduces non-deterministic program behavior. On a computer system with multiple processors (hereinafter “multiprocessor system”), interactions of different processors through the underlying hardware system cause unequal execution speed of threads running on different processors, which also causes non-deterministic program behavior. Other events such as windowing events, network events/messages and general input/output operations may also introduce non-deterministic program behavior.
One problem caused by non-deterministic behavior is a datarace. A datarace may occur when accesses to a shared resource by multiple threads cause one or more threads to compute incorrect results. Typically, a datarace occurs when (a) multiple threads access the same memory location without ordering constraints among the accesses, and (b) at least one of the accesses is a write access. (Under certain circumstances, condition (b) may not be necessary for exhibiting a datarace.)
More specifically, there are four necessary conditions for a datarace to exist between two statements: 1) the statements are executed by different threads; 2) the statements access the same memory location, i.e., the same field in the same object; 3) the synchronization objects used in controlling the execution of the first statement do not overlap with those of the second statement; and 4) there is no explicit ordering between the two statements imposed by explicit language constructs in the program such as thread creation and termination. A datarace may occur between execution instances of two statements or between different execution instances of the same statement by different threads.
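The four conditions above can be sketched as a predicate over two statements. This is only an illustrative sketch: the `Access` and `RaceConditions` classes, their fields, and the `orderedByThreadOps` flag are hypothetical names introduced here, and the sketch assumes the thread, memory location, and lock set of each statement are already known. Because the conditions are necessary rather than sufficient, a `true` result marks a candidate pair, not a definite datarace.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical summary of one statement's memory access (names are illustrative).
final class Access {
    final String thread;     // thread that executes the statement
    final String location;   // object and field accessed, e.g. "p.f"
    final Set<String> locks; // synchronization objects held at the statement

    Access(String thread, String location, Set<String> locks) {
        this.thread = thread;
        this.location = location;
        this.locks = locks;
    }
}

final class RaceConditions {
    // Returns true only if all four necessary conditions hold.
    // orderedByThreadOps is an assumed input: true when thread creation or
    // termination forces one statement to execute before the other.
    static boolean mayRace(Access a, Access b, boolean orderedByThreadOps) {
        boolean differentThreads = !a.thread.equals(b.thread);         // condition 1
        boolean sameLocation = a.location.equals(b.location);          // condition 2
        boolean noCommonLock = Collections.disjoint(a.locks, b.locks); // condition 3
        boolean unordered = !orderedByThreadOps;                       // condition 4
        return differentThreads && sameLocation && noCommonLock && unordered;
    }
}
```

For instance, a write to `p.f` by one thread while holding lock `p` and a read of `p.f` by another thread holding no lock satisfy conditions 1 through 3; the pair is a candidate unless thread creation or termination orders the two statements.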
By way of specific example, FIG. 6 depicts a timing diagram of two methods, named method foo( ) and method run( ). (Only the details of the methods relevant to the datarace example are shown in FIG. 6.) In the example, thread T1 executes method foo( ) (in the left column). When T1 executes statement S13, thread T2 begins executing method run( ) (in the right column). Although statement S15 in method foo( ) is inside a synchronized block starting at S14, statement S21 in method run( ) is not inside any synchronized block (or method). Therefore, there are no constraints on the order in which statements S15 and S21 can execute. Accordingly, whether S15 or S21 executes first may vary from run to run. However, the program can exhibit different behaviors depending on the order in which S15 and S21 execute. For example, when S21 executes after S15, the value that it reads from p.f is the value written by statement S15. Otherwise, it is the earlier value of p.f. Because thread-switching behavior of a program is non-deterministic, so is the manifestation of dataraces at run-time. Also, as dataraces may happen only infrequently, they may be particularly difficult to identify and may not even be noticed during testing and evaluation of a multithreaded application.
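A scenario of this kind might look as follows in Java. FIG. 6 is not reproduced here, so this is a hypothetical reconstruction: the classes `P` and `RaceDemo`, the value written, the `observed` field, and the final `join()` call are assumptions added for illustration. Only the statement labels S13, S14, S15, and S21 come from the description.

```java
// Hypothetical reconstruction of the foo()/run() scenario; names are assumed.
final class P {
    int f; // shared field, initially 0
}

final class RaceDemo implements Runnable {
    private final P p;
    int observed = -1; // value that S21 read; inspected after join()

    RaceDemo(P p) {
        this.p = p;
    }

    // Executed by thread T1.
    void foo() {
        Thread t2 = new Thread(this);
        t2.start();            // S13: T2 begins executing run()
        synchronized (p) {     // S14: T1 acquires the lock on p
            p.f = 1;           // S15: write to p.f while holding the lock
        }
        try {
            t2.join();         // added for the demo so the result can be read
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Executed by thread T2.
    @Override
    public void run() {
        observed = p.f;        // S21: read of p.f with no lock held
    }

    public static void main(String[] args) {
        RaceDemo demo = new RaceDemo(new P());
        demo.foo();
        // May print 0 or 1, depending on whether S21 ran before or after S15.
        System.out.println("S21 read p.f = " + demo.observed);
    }
}
```

Because S21 holds no lock, the synchronized block around S15 does nothing to order the two accesses, and repeated runs may print different values.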
Although manual inspection of instructions is a possible approach for identifying dataraces, it is time-consuming and error-prone. For example, a test program mtrt from the SPECjvm98 benchmark suite contains approximately 8,000 bytecode instructions. This program includes approximately 30 million distinct instruction pairs, but fewer than 600 of them are potential dataraces. Previous systems for identifying potential dataraces have relied upon either dynamic datarace detection or static type analysis. Specifically, as described in “Eraser: A Dynamic Data Race Detector For Multithreaded Programs,” by Stefan Savage, et al., ACM Transactions on Computer Systems, Vol. 15, No. 4, November 1997, Pages 391–411, a tool named Eraser has been developed to dynamically determine the existence of dataraces.
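The scale of the manual-inspection problem can be checked with a short calculation. The sketch below assumes that "distinct instruction pairs" means unordered pairs of different instructions, counted as n(n-1)/2; the source gives only approximate figures, so the result is a rough check rather than the exact count for mtrt. The class and method names are hypothetical.

```java
// Rough check of the pair count quoted above; names are illustrative.
final class PairCount {
    // Number of unordered pairs of distinct items among n items: n(n-1)/2.
    static long unorderedPairs(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long pairs = unorderedPairs(8_000);
        System.out.println(pairs); // prints 31996000
    }
}
```

Under this assumption, 8,000 instructions yield about 32 million pairs, which is consistent with the approximately 30 million quoted, and 600 potential dataraces amount to roughly 0.002 percent of those pairs.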
However, dynamic determination of dataraces is resource intensive. With complex programs, the amount of data that a dynamic datarace detection tool must monitor can lead to excessive space and/or time consumption. For example, to perform an exhaustive dynamic datarace detection, every resource must be “instrumented,” or watched, to determine whether a datarace is actually occurring during the entire course of an application's execution. Reducing the number of resources monitored during dynamic datarace detection can reduce the set of dataraces detected.
Further, dynamic datarace analysis detects only dataraces that actually occur during a particular execution of a multithreaded application. The result of dynamic datarace analysis is susceptible to instrumentation perturbation and to variation in input data. Accordingly, conventional systems using dynamic datarace detection fail to enable comprehensive and resource-efficient identification of dataraces in complex, multithreaded applications.
Pre-existing static datarace detection methods, such as those based on static type analysis, are quite inaccurate. Specifically, as described in “Type-based race detection for Java,” by C. Flanagan and S. N. Freund, Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2000, pages 219–232, a tool has been developed to statically determine the existence of dataraces based on static type analysis. A fundamental problem with this tool is that it identifies many “false positives,” i.e., it reports dataraces that may never be exhibited in any program execution. Further, programmer annotations are required to improve the effectiveness of the tool.