A data race is a problem that may occur in a multi-threaded program, or among multiple programs accessing the same data, and that may lead to anomalous behavior of the program(s). Data races may occur where a shared variable can be accessed by several threads/programs simultaneously. Threads/programs “race” to access a shared variable and, depending upon which access occurs first, program results may vary unpredictably. Conventional solutions to this problem attempt to detect data races before they occur. This is partially because data races are unpredictable and thus extremely difficult to reproduce during the debugging process. Indeed, any anomalous behavior caused by a data race depends on the precise timing of separate threads/programs accessing the same memory location and may thus disappear if that timing is altered during the debugging process.
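By way of illustration only, the dependence of results on access order can be sketched with a deterministic simulation in which two threads each load a shared value, add one, and store the result. The function `run` and the schedule lists are hypothetical names introduced for this sketch; no real threads are created, so the outcome of each schedule is reproducible.

```python
# Illustrative sketch only: simulate two threads that each perform
# "load shared value, add 1, store result" on one shared variable.

def run(schedule):
    """Execute load/store steps in the given order; return the final shared value."""
    shared = 0
    local = {}                      # per-thread copy of the value it loaded
    for thread, step in schedule:
        if step == "load":
            local[thread] = shared
        elif step == "store":
            shared = local[thread] + 1
    return shared

# Thread A finishes before thread B starts: both increments take effect.
serial = [("A", "load"), ("A", "store"), ("B", "load"), ("B", "store")]
# Both threads load before either stores: one increment is lost.
interleaved = [("A", "load"), ("B", "load"), ("A", "store"), ("B", "store")]

print(run(serial))       # 2
print(run(interleaved))  # 1
```

In an actual multi-threaded program the schedule is chosen by the run-time environment rather than by the programmer, which is why the second outcome may appear only intermittently.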
Conventional solutions for data race detection monitor lock acquisition and memory accesses, computing an access pattern for each memory location and memory access. These solutions then evaluate the access pattern to memory locations to detect suspicious access patterns that may indicate a potential data race. An access pattern is “suspicious” if a memory location is shared among multiple threads without a common lock that may be used by individual threads/programs to govern access to the memory locations. Locks may be used to prevent data races from occurring where suspicious activity is detected.
A lock is a software construct that enables at most one thread/program to access a shared variable at a given point in time. A locking discipline (i.e., a way of using a lock) may require that the lock for a shared variable be acquired before the shared variable is accessed. Once a thread/program has completed its access to the shared variable, the lock is released. Locks are thus “acquired and released,” enabling only one thread to access a particular shared variable at any given time. Locks and locking disciplines typically follow an access pattern.
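As a minimal sketch of such a locking discipline, the following example uses the `threading.Lock` class of the Python standard library. The names `counter`, `counter_lock`, and `worker` are assumptions made for this illustration. Every access to the shared variable occurs only while the lock is held.

```python
import threading

counter = 0                         # the shared variable
counter_lock = threading.Lock()     # hypothetical lock guarding `counter`

def worker(iterations):
    global counter
    for _ in range(iterations):
        with counter_lock:          # acquire before the access, release afterwards
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Because every increment is performed under the same lock, the final value does not depend on how the threads are scheduled.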
Current methods used for detecting potential data races in a multithreaded program include running the program while monitoring lock acquisition and memory accesses, computing an access pattern for each location in memory, and on each memory access evaluating the accessed location's access pattern to determine if it is suspicious. When a memory access to a location results in the discovery of a suspicious access pattern, the stack of the offending thread is dumped so that a potential data race can be diagnosed. Because a suspicious access pattern might be a false alarm, conventional techniques continue to run the program in order to make further discoveries. However, in order not to overwhelm the user with redundant information, conventional techniques suppress all stack dumps after the first for each location, thus limiting the developer's ability to understand the race condition and how to fix it.
Based on the idea of locksets, “Eraser” (described in, for example, Savage et al., “Eraser: A Dynamic Data Race Detector For Multithreaded Programs,” 15 ACM Trans. Comp. Sys. 391-411 (1997), incorporated herein by reference) was the first implementation of a method for detecting potential data races in a multithreaded program by running the program while monitoring lock acquisition and memory accesses, computing an access pattern for each location in memory, and on each memory access evaluating the accessed location's access pattern to determine if it is suspicious. An access pattern is suspicious when it indicates that (a) the location is shared among threads, (b) there is no common lock held by all accesses, and (c) at least one of the accesses is a write. This method has also been implemented for programs written in Java and for programs written using the Rotor CLI. The latter implementation is called “RaceTrack” and its authors include the present inventors.
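The three conditions (a) through (c) may be expressed, purely for illustration, as a predicate over a recorded list of accesses to one location. This is a simplified after-the-fact check and is not the on-line state-based method of Eraser or RaceTrack; the `Access` record and the function `is_suspicious` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    thread: str            # identifier of the accessing thread
    is_write: bool         # True for a write, False for a read
    locks_held: frozenset  # locks held by the thread at the time of the access

def is_suspicious(accesses):
    """Return True if the recorded accesses satisfy conditions (a), (b) and (c)."""
    shared = len({a.thread for a in accesses}) > 1           # (a) shared among threads
    common = None
    for a in accesses:                                       # intersect all held lock sets
        common = a.locks_held if common is None else common & a.locks_held
    no_common_lock = not common                              # (b) no lock held by all accesses
    any_write = any(a.is_write for a in accesses)            # (c) at least one write
    return shared and no_common_lock and any_write

guarded = [Access("T1", True, frozenset({"m"})), Access("T2", False, frozenset({"m"}))]
unguarded = [Access("T1", True, frozenset({"m"})), Access("T2", True, frozenset({"n"}))]
read_only = [Access("T1", False, frozenset()), Access("T2", False, frozenset())]

print(is_suspicious(guarded))    # False
print(is_suspicious(unguarded))  # True
print(is_suspicious(read_only))  # False
```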
The Eraser access pattern method adds one state word for each memory location that is potentially shared among threads. The state word encodes the state of the currently computed access pattern for its memory location. FIG. 1 illustrates a state diagram for the Eraser access pattern method. Each location starts out in “virgin” state 102, then moves to an “exclusive” state 103 when a thread first accesses the location, then to a “shared” state 116 when additional threads access the location. The “shared” state 116 is subdivided into “shared read” 108 and “shared modify” 110 depending on whether all shared accesses are reads or if any are writes.
In the “exclusive” state 103, the access pattern identifies the thread that is exercising exclusive access, in order to detect when a different thread accesses the location and thus changes its state to “shared” 116. In the “shared” state 116, the access pattern identifies the set of locks that all shared accesses have held in common. Because a set of locks could potentially be a large amount of information to describe, what Eraser actually stores in the state word is an index into a lockset table of an entry that describes the set of locks. Because the number of different locksets used is far fewer than the number of different locations accessed, the use of an index into a table may be a good storage compression technique.
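The lockset table described above can be sketched as follows. This is an assumed, simplified structure and not the actual Eraser data layout; the class `LocksetTable` and its methods `intern` and `lookup` are hypothetical names. Each distinct set of locks is stored once, and a state word would hold only the small integer index of the entry.

```python
class LocksetTable:
    """Stores each distinct lockset once and hands out a small integer index for it."""

    def __init__(self):
        self._entries = []   # index -> frozenset of lock identifiers
        self._index = {}     # frozenset of lock identifiers -> index

    def intern(self, locks):
        """Return the index of the entry describing `locks`, adding an entry if needed."""
        key = frozenset(locks)
        idx = self._index.get(key)
        if idx is None:
            idx = len(self._entries)
            self._entries.append(key)
            self._index[key] = idx
        return idx

    def lookup(self, idx):
        """Return the lockset described by entry `idx`."""
        return self._entries[idx]

table = LocksetTable()
a = table.intern({"m1", "m2"})
b = table.intern({"m2", "m1"})   # same set of locks, so the same entry is reused
c = table.intern(set())          # the empty lockset receives its own entry

print(a, b, c)                          # 0 0 1
print(sorted(table.lookup(a)))          # ['m1', 'm2']
```

Many memory locations guarded by the same locks thus share one table entry, which is the source of the storage saving.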
The set of locks identified by any particular “shared” state access pattern can only shrink over time, as further accesses occur. A “shared modify” access pattern with an empty lockset is suspicious. When an access causes an access pattern to first become suspicious, the stack of the offending access is dumped, and the location state is changed to a “warning” state 112. Once the location state is in the “warning” state 112, no further stack dumps are given. Although logically the “warning” state 112 is a separate state, Eraser and RaceTrack actually interpret a “shared modify” state with an empty lockset as the “warning” state instead of representing it explicitly.
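The shrinking of the lockset and the single stack dump can be sketched for the “shared” portion of the access pattern alone. The function `shared_phase` is a hypothetical name, and the input is assumed to contain only shared accesses, each given as a tuple of thread, write flag, and locks held. The lockset is initialized from the first shared access and intersected on each later one.

```python
def shared_phase(accesses):
    """Process shared accesses given as (thread, is_write, locks_held) tuples.

    Returns the lockset after each access and the index of the one access
    at which a stack dump would be given (None if there is none).
    """
    lockset = None
    modified = False        # becomes True once any shared access is a write
    reported_at = None
    history = []
    for i, (thread, is_write, held) in enumerate(accesses):
        held = frozenset(held)
        lockset = held if lockset is None else lockset & held   # can only shrink
        modified = modified or is_write
        history.append(lockset)
        if reported_at is None and modified and not lockset:
            reported_at = i     # first suspicious access; later ones are not reported
    return history, reported_at

accesses = [
    ("T2", False, {"a", "b"}),
    ("T3", False, {"a"}),
    ("T2", True,  {"a"}),
    ("T3", True,  {"b"}),     # intersection becomes empty here
    ("T2", True,  set()),     # still suspicious, but no further report
]
history, reported_at = shared_phase(accesses)
print([sorted(s) for s in history])  # [['a', 'b'], ['a'], ['a'], [], []]
print(reported_at)                   # 3
```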
FIG. 2 illustrates a state diagram which is used by RaceTrack as well as other prior art. FIG. 2 is similar to FIG. 1, except that the “exclusive” state 114 is subdivided into exclusive access by a first thread 104 and then exclusive access by a second thread 106. This modification is needed in order to prevent generating a false alarm for a common multi-threaded, object-oriented programming paradigm in which the first thread initializes an object and then hands it over to a second, newly-created thread with no sharing intended. Only when the assumption of exclusive access by the second thread is proven wrong is the state changed to “shared”.
In each state except “virgin”, some additional information must be stored: in an “exclusive” state it is the identity of the thread exercising exclusive access, in a “shared” state it is the set of common locks. In order to store the access pattern in one word, a few bits are used to encode the state and the remaining bits are used to store a thread identifier or a lockset index. FIG. 3 shows an example of how the RaceTrack access pattern states can be encoded into a 32-bit word.
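One possible packing of a state and its additional information into a 32-bit word is sketched below. The bit layout shown (three high-order bits for the state, twenty-nine low-order bits for a thread identifier or lockset index) is an assumption made for this example and is not taken from FIG. 3; the names `encode`, `decode`, and `STATES` are likewise hypothetical.

```python
STATE_BITS = 3                          # assumed width of the state field
PAYLOAD_BITS = 32 - STATE_BITS          # remaining bits: thread id or lockset index
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1

# Assumed numbering of the states; the position in the list is the encoded tag.
STATES = ["virgin", "exclusive1", "exclusive2", "shared_read", "shared_modify"]

def encode(state, payload=0):
    """Pack a state name and its payload into one 32-bit word."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in the remaining bits")
    return (STATES.index(state) << PAYLOAD_BITS) | payload

def decode(word):
    """Unpack a 32-bit word into (state name, payload)."""
    return STATES[word >> PAYLOAD_BITS], word & PAYLOAD_MASK

word = encode("exclusive1", 42)         # thread identifier 42 has exclusive access
print(hex(word))                        # 0x2000002a
print(decode(word))                     # ('exclusive1', 42)
print(decode(encode("shared_modify", 7)))  # ('shared_modify', 7)
```

Under this assumed layout, whether the payload is read as a thread identifier or as a lockset index is determined by the decoded state.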
More particularly, FIG. 2 illustrates a conventional access pattern state diagram. Here, a series of states 102-112 and “superstates” 114-116 are described to illustrate conventional techniques for detecting potential data races. “Exclusive” describes those states where only one thread/program may access a variable at any given time. “Shared” refers to variables that may be accessed simultaneously by multiple threads/programs, unless one of the threads/programs is performing a write operation, which indicates a suspicious pattern (i.e., a potential data race). Each of states 102-112 represents a particular state of an item during an access. Each item is initially in a “virgin” state 102, then moves to an exclusive first state 104 when a thread in a multi-threaded program (or a program) first accesses the item. When a second thread/program accesses the item (previously accessed by the first thread/program), the item moves to an exclusive second state 106. The separation of exclusive superstate 114 into an exclusive first state 104 and an exclusive second state 106 prevents generation of a false alarm. Without this separation, a false alarm indicating a potential data race may be generated where a program is designed to allow a first thread/program to initialize an object and then hand it over to a second thread/program without ever performing any simultaneous shared access.
When a different thread accesses an item in exclusive second state 106, the item moves to shared superstate 116. If the access is a read operation (“read”), then the item enters shared read state 108. In the event that the access is a write operation (“write”), the item enters shared modify state 110. This is an example of a “first shared” access. Subsequent accesses are also referred to as “shared” accesses. Also, if the shared access is a write and the item is in shared read state 108, the item moves to shared modify state 110. Entering a shared state (e.g., shared read state 108 or shared modify state 110) also initiates computation of a set of locks (“lockset”) that are common to shared accesses to an item. The first lockset is set to the set of locks held by the accessing thread when the first shared access occurs. On every subsequent shared access, the item's lockset is reduced to the intersection of its lockset and the set of locks held by the accessing thread.
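The transitions among states 102-112 described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions and not the RaceTrack implementation: the class `Item`, its `access` method, and the `warnings` counter (standing in for a stack dump) are hypothetical names, and the enumeration values merely reuse the reference numerals of FIG. 2 as labels.

```python
from enum import Enum

class State(Enum):
    VIRGIN = 102
    EXCLUSIVE_FIRST = 104
    EXCLUSIVE_SECOND = 106
    SHARED_READ = 108
    SHARED_MODIFY = 110
    WARNING = 112

class Item:
    def __init__(self):
        self.state = State.VIRGIN
        self.owner = None            # thread exercising exclusive access
        self.lockset = frozenset()   # locks common to all shared accesses so far
        self.warnings = 0            # stands in for the number of stack dumps

    def access(self, thread, is_write, locks_held=()):
        held = frozenset(locks_held)
        if self.state is State.WARNING:
            return                                    # no further reports
        if self.state is State.VIRGIN:
            self.state, self.owner = State.EXCLUSIVE_FIRST, thread
        elif self.state is State.EXCLUSIVE_FIRST:
            if thread != self.owner:                  # hand-over to a second thread
                self.state, self.owner = State.EXCLUSIVE_SECOND, thread
        elif self.state is State.EXCLUSIVE_SECOND:
            if thread != self.owner:                  # first shared access
                self.lockset = held
                self.state = State.SHARED_MODIFY if is_write else State.SHARED_READ
        else:                                         # subsequent shared access
            self.lockset &= held
            if is_write:
                self.state = State.SHARED_MODIFY
        if self.state is State.SHARED_MODIFY and not self.lockset:
            self.warnings += 1
            self.state = State.WARNING

item = Item()
item.access("T1", True)            # initialization by the first thread
item.access("T2", True)            # hand-over: exclusive second, no warning
item.access("T3", False, {"m"})    # first shared access: shared read, lockset {m}
item.access("T2", True, {"m"})     # shared modify, lockset still {m}
item.access("T3", True)            # no locks held: lockset empty, warning
item.access("T2", True)            # already in warning state: not reported again
print(item.state.name, item.warnings)  # WARNING 1
```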
An access pattern's lockset can only decrease over time, as subsequent accesses occur. A shared modify access pattern with an empty lockset indicates a suspicious pattern. When a suspicious access pattern is first detected, conventional implementations generate a warning of a potential data race, and the item enters “warning” state 112. Typically, when such a warning is generated, the stack of the thread associated with the suspicious pattern is dumped, enabling a user to diagnose whether a potential data race exists while still permitting the program to run.
FIG. 3 illustrates conventional encoding of access patterns. As an example, conventional techniques encode information relevant to access patterns using 32-bit words that include state information. In each state, except virgin state 102, information in addition to the state name must be stored. In an exclusive state (e.g., exclusive states 104-106), an identifier for the thread exercising exclusive access is stored. In a shared state (e.g., shared states 108-110), a set of common locks is stored. In order to store an access pattern in one word, typically a few bits (e.g., bits 202-210) are used to encode the state name, and the remaining bits (e.g., fields 212-220) are used to store a thread identifier or an index into a table of locksets.
Having a stack dump of one access is often sufficient to draw attention to the relevant source code, careful examination of which can reveal whether the suspicious access pattern represents a true race or just a false alarm. However, it would be useful to have stack dumps for other accesses to such a location, provided that the additional stack dumps were selected so as to be likely to contain significant additional information about the causes of the suspicious access pattern. The problem is how to select which other stack dumps to give. If the selection is too liberal, too many stack dumps will be given containing little additional information and the result will not be useful. If the selection is too conservative (as in the prior art, which gives no additional stack dumps at all), no additional information is revealed. It would be further desirable that this selection be made on-line as the program runs.
Thus, what are needed are systems and methods that overcome the limitations and drawbacks of conventional techniques.