Data Races
Concurrent programs, also known as multithreaded programs, are found in a wide array of products and services, from software device management to distributed scientific computing. However, the fundamental nature of these programs, namely that they contain multiple concurrently executing threads, can cause inter-thread conflicts that produce errors or hangs during execution. These errors can be particularly difficult to discover because often more than one asynchronously running thread runs on a single processor. The instructions of the threads are interleaved, giving rise to a potentially large number of different executions. Because of this, an important and difficult part of debugging and analyzing a concurrent program is finding potential conflicts between threads.
One of these conflicts is known as a data race. Generally, a data race is a condition where there exists an execution of two or more threads such that the executing computer can arrive at a state for which a) there are two threads which can execute, b) both of these threads access a common variable, and c) at least one of the accesses is a write access.
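As a minimal sketch, the conditions above can be written as a predicate over a pair of pending accesses. The Access record and the conflicts function are hypothetical names introduced only for illustration. The sketch assumes the caller has already established that both threads are able to execute in the same state, which is the substance of condition a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: int     # id of the thread performing the access (hypothetical field)
    variable: str   # name of the variable being accessed
    is_write: bool  # True for a write access, False for a read access


def conflicts(x: Access, y: Access) -> bool:
    """Return True when two pending accesses form a potential data race."""
    return (
        x.thread != y.thread               # the accesses come from two threads
        and x.variable == y.variable       # b) both access a common variable
        and (x.is_write or y.is_write)     # c) at least one access is a write
    )
```

Under this sketch, two reads of the same variable never conflict, while a read paired with a write, or two writes, from different threads do.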
FIGS. 1a-1d illustrate two types of data race conditions which can lead to unpredictable results. Avoiding these unpredictable results is the goal of the program analysis discussed below. FIGS. 1a and 1b illustrate one type of data race, that of conflicting read and write instructions from two different threads. In both Figures, there are two concurrently-executing threads which access a common variable, referred to here as “a,” which starts with value 0. The Figures illustrate two different executions of the instructions of Threads 1 and 2. A data race occurs in this example when a computer executing these threads reaches a state from which either of the two illustrated executions could occur. Other than the differing orders, described below, the variable accesses in the Figures are the same.
In FIG. 1a, Thread 1, which contains the assignment instruction “q=a,” reads the value of a as 0 and then assigns that value to the variable q. Thread 2 then executes the instruction “a=1,” which assigns the value 1 to a. Thus, at the end of the execution of FIG. 1a, a has the value 1 and q has the value 0. In contrast, FIG. 1b illustrates a different execution in which Thread 2 writes to variable a before Thread 1 reads from it. In this case, because a is assigned a value by Thread 2 before Thread 1 is able to read a, q ends up with the value 1. Thus, the two executions illustrated in FIGS. 1a and 1b give two different results for q.
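The two executions of FIGS. 1a and 1b can be replayed deterministically with a small sketch. It does not use real threads; it simply applies the two instructions in each order. The function names and the dictionary used to hold the variables are assumptions made for illustration.

```python
def thread_1(state):
    # Thread 1: q = a (a read of the shared variable a)
    state["q"] = state["a"]


def thread_2(state):
    # Thread 2: a = 1 (a write to the shared variable a)
    state["a"] = 1


def replay(order):
    """Apply the single-instruction threads in the given order."""
    state = {"a": 0, "q": None}  # a starts with value 0, as in the Figures
    for step in order:
        step(state)
    return state


fig_1a = replay([thread_1, thread_2])  # read first, then write
fig_1b = replay([thread_2, thread_1])  # write first, then read
print(fig_1a["q"], fig_1b["q"])  # prints "0 1"
```

In both replays a ends with the value 1; only q differs, which is the unpredictable result the read/write race produces.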
FIGS. 1c and 1d illustrate another type of data race, that of conflicting write instructions. As in FIGS. 1a and 1b, FIGS. 1c and 1d illustrate different executions of instructions from two concurrently-executing threads. In FIG. 1c, Thread 1 executes the instruction “a=0” before Thread 2 executes “a=1,” which results in a having the final value of 1. In contrast, FIG. 1d illustrates the two write instructions executing in the opposite order, giving variable a a final value of 0.
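For the write/write case, one way to see the race is to enumerate every ordering of the writes and collect the final values of a. The sketch below assumes each thread contributes exactly one write, as in FIGS. 1c and 1d; final_values is a hypothetical helper name.

```python
from itertools import permutations


def final_values(writes, initial=None):
    """Return the set of final values of a over every ordering of the writes."""
    results = set()
    for order in permutations(writes):
        a = initial
        for value in order:
            a = value  # each write overwrites the previous value of a
        results.add(a)
    return results


# Thread 1 writes 0 and Thread 2 writes 1, as in FIGS. 1c and 1d.
outcomes = final_values([0, 1])
```

A set with more than one element indicates that the final value of a depends on the order in which the writes execute.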
The illustrated examples of FIGS. 1a-1d demonstrate that different executions of concurrently-executing threads can cause different values to be placed in certain variables, which can cause a program to behave unpredictably or to fail to execute. Oftentimes, these errors are solved by forcing the competing threads to execute synchronously, which means forcing the threads to operate under a common timing or locking mechanism. The use of synchronous threads allows a programmer to decide ahead of time that certain instructions cannot interfere with each other and to make allowances for that by modifying the program. However, before synchronization can be applied, data races such as those illustrated in FIGS. 1a-1d must first be located.
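A locking mechanism of the kind described above might look like the following sketch, which uses Python's standard threading.Lock. The counter workload is an assumption chosen for illustration and does not appear in the Figures. Each thread repeatedly reads and rewrites the shared variable a, and the lock ensures that one thread's read and write cannot be interleaved with the other's.

```python
import threading


def run_with_lock(increments=10_000):
    """Two threads update a shared variable; a lock guards each update."""
    lock = threading.Lock()
    shared = {"a": 0}

    def worker():
        for _ in range(increments):
            with lock:  # only one thread at a time may read and write a
                shared["a"] = shared["a"] + 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["a"]


total = run_with_lock()
```

Note that the lock only prevents the guarded updates from overlapping; it does not choose which thread runs first, so it is suited to cases where either order is acceptable.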
Locating Data Races
Because data races are so timing-dependent, and may occur only under certain precise conditions, searching for them in a program can be a difficult, time-consuming process. Some existing systems for data race detection, such as model checking, attempt to statically explore every possible execution of a concurrent program by considering every possible thread interleaving. Because this analysis is done statically, it can be done at compile time without requiring execution of the analyzed program. While these systems are sound, that is, they find every possible error, they may report false errors by identifying data races from interleavings of instructions that cannot or will not happen. By contrast, some existing systems analyze concurrent programs dynamically by executing the program and observing its operation. These dynamic systems, however, cannot guarantee that they will locate every data race, and they too may report false errors.
While traditional static data race analysis offers stronger soundness guarantees than dynamic analysis, it suffers from a number of disadvantages. Traditional static analysis can require the addition of programmer annotations, which increases debugging time. Additionally, because the execution time of such an analysis grows exponentially with the number of threads in the concurrent program, the time required to perform such an analysis can be prohibitive. In certain circumstances, such an analysis may never complete; it has been proven that the general problem of detecting data races in multithreaded programs is undecidable. That is, no program can exist which can correctly identify every data race in every concurrent program in a finite period of time.
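The growth in the number of interleavings can be made concrete with a short calculation. The sketch below assumes the simplest possible setting: every thread is a straight-line sequence of the same number of atomic instructions, with no synchronization to rule any ordering out. Under that assumption the number of interleavings is the multinomial coefficient (k·n)! / (n!)^k for k threads of n instructions each. The function name is hypothetical.

```python
from math import factorial


def interleavings(threads, instructions):
    """Count orderings of `threads` threads with `instructions` steps each."""
    total_steps = threads * instructions
    # Multinomial coefficient: (k*n)! divided by (n!) raised to the power k.
    return factorial(total_steps) // factorial(instructions) ** threads
```

For two threads of one instruction each this gives 2, matching the two executions shown in each pair of Figures. With two instructions per thread, the count rises from 6 for two threads to 90 for three threads and 2520 for four.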
In contrast, analysis of single-threaded, or sequential, programs has been shown to be decidable. Thus a number of existing products have been developed and optimized to perform static analysis on sequential programs. One such tool is the SLAM system, discussed in “Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages” (ACM Press 2002). Tools such as SLAM are widely available, useful to programmers, and have been tested and optimized to provide efficient analysis. As an example, many of these systems are optimized to efficiently check single variables and to ignore accesses in an analyzed program that are unrelated to a target variable. While these optimized tools would be useful for data race checking, they have traditionally not been helpful to programmers of concurrent systems because of the undecidability and time cost of analyzing concurrent programs. What is needed is a system that would allow developers of concurrent programs to take advantage of the efficiency of existing sequential analysis tools when searching for data races.