1. Technical Field
The present invention relates to computer system verification and more particularly to systems and methods for debugging multi-threaded software.
2. Description of the Related Art
The widespread use of concurrent software in modern computing systems necessitates the development of effective debugging methodologies for multi-threaded software. Multi-threaded programs, however, are behaviorally complex: subtle interactions between threads make them hard to analyze manually. This motivates the use of automated formal methods to reason about such systems. Errors arising from data races are notoriously hard to catch.
A data race occurs when two different threads in a given program can simultaneously access a shared variable, with at least one of the accesses being a write operation. Checking for data races is often a critical first step in the debugging of concurrent programs. Indeed, the presence of data races in a program typically renders its behavior non-deterministic, thereby making it difficult to reason about more complex and interesting properties of the program.
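The non-determinism described above can be illustrated with a minimal sketch. It assumes a hypothetical program in which two threads each execute `tmp = x; x = tmp + 1` on a shared variable `x` with no lock held, and it enumerates every interleaving of the four steps. The function name and the two-step thread body are illustrative assumptions, not part of any particular tool.

```python
from itertools import combinations

def final_values():
    """Enumerate all interleavings of two threads that each run
    `tmp = x; x = tmp + 1` on a shared x (initially 0), and
    collect the final values of x that can result."""
    total_steps = 4  # two steps (read, write) per thread
    results = set()
    # Choose which 2 of the 4 time slots belong to thread 1;
    # the remaining slots belong to thread 2.
    for t1_slots in combinations(range(total_steps), 2):
        x = 0
        tmp = {1: None, 2: None}  # per-thread local copy of x
        pc = {1: 0, 2: 0}         # per-thread program counter
        for slot in range(total_steps):
            tid = 1 if slot in t1_slots else 2
            if pc[tid] == 0:
                tmp[tid] = x        # read of the shared variable
            else:
                x = tmp[tid] + 1    # write of the shared variable
            pc[tid] += 1
        results.add(x)
    return results

print(final_values())
```

Because the unguarded read and write of one thread can interleave with those of the other, the final value of `x` depends on the schedule, which is exactly what makes further reasoning about such a program difficult.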
The classical approach to data race detection involves three steps. The first and most critical step is the automatic discovery of shared variables, i.e., variables which can be accessed by two or more threads. Control locations where these shared variables are read or written determine potential locations for data races to arise. In fact, locking related data races arise if a common shared variable is accessed at simultaneously reachable program locations in two different threads where disjoint sets of locks are held. Since locks are typically accessed via pointers, a must-pointer alias analysis is carried out in the second step in order to determine these locksets at the program locations of interest. A main drawback of static analysis is that it can generate a large number of bogus data race warnings that do not correspond to true bugs. The last step, therefore, is to apply warning reduction and ranking techniques, either to filter out bogus warnings or to prioritize the warnings by degree of confidence.
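The lockset condition described above can be sketched as follows. This is a simplified illustration under stated assumptions: shared variable discovery and alias analysis are taken as already done, every pair of locations in different threads is assumed to be simultaneously reachable, and the `Access` record and `race_warnings` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Access:
    thread: str           # thread performing the access
    location: str         # program location (hypothetical label)
    var: str              # shared variable being accessed
    is_write: bool        # True for a write, False for a read
    lockset: frozenset    # locks held at this location

def race_warnings(accesses):
    """Report pairs of locations that access the same variable from
    different threads, with at least one write, while holding
    disjoint locksets. Reachability is NOT checked here (assumption)."""
    warnings = []
    for a, b in combinations(accesses, 2):
        if (a.thread != b.thread
                and a.var == b.var
                and (a.is_write or b.is_write)
                and a.lockset.isdisjoint(b.lockset)):
            warnings.append((a.location, b.location))
    return warnings

accesses = [
    Access("T1", "t1:10", "count", True, frozenset({"m"})),
    Access("T2", "t2:20", "count", True, frozenset({"m"})),
    Access("T2", "t2:35", "count", False, frozenset()),
]
print(race_warnings(accesses))
```

In this example only the pair (`t1:10`, `t2:35`) is reported: the two writes both hold lock `m`, while the read at `t2:35` holds no lock in common with the write at `t1:10`. A warning produced this way may still be bogus if the two locations cannot actually be reached at the same time, which is why the warning reduction step is needed.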
The challenge lies in carrying out race detection while satisfying the conflicting goals of scalability and accuracy, both of which depend on various factors. Key among these factors are (i) accuracy of shared variable discovery, and (ii) accuracy and scalability of the alias analyses for determining shared variables (may aliases) and locksets (must aliases). Incorrectly labeling a variable as shared renders all warnings generated for it bogus. On the other hand, if a shared variable is not reported as shared, no warnings are generated for genuine data races involving that variable.
Considerable research has been devoted to automatic shared variable discovery. However, most existing techniques are based on the underlying assumption that, when accessing shared variables, concurrent programs almost always follow a locking discipline by associating each shared variable v with a lock l_v, which needs to be acquired before any access to v.
Existing techniques focus on computing this association between locks and variables. Towards that end, various correlation based techniques, both statistical and constraint based, have been developed. An advantage of statistical techniques is that they are scalable and do not depend on an alias analysis, which can often be a bottleneck. However, the failure of correlation based techniques to detect the shared variables responsible for data races in, e.g., a suite of Linux drivers exposed their main weakness: this very reliance on the existence of a locking discipline.
Indeed, many data races arise precisely when the locking discipline is violated. Furthermore, it turns out that in most of the drivers that were considered, the original implementations correctly followed the locking discipline. Data race bugs were introduced only when the programs were later modified by adding new code, either for optimization purposes or in order to fix bugs. Typically, this newly added code was a “hack” that introduced lock-free accesses to shared variables that were not present in the original code. Since the only occurrences of these variables were in regions unguarded by locks, no meaningful correlations could be developed for them. This was a key reason why correlation-based techniques did not work.
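The weakness described above can be made concrete with a rough sketch of a statistical correlation. It is an illustrative stand-in, not any specific published technique: the function name, the input format (pairs of a variable name and the set of locks held at the access), and the 0.8 threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def infer_lock_association(accesses, threshold=0.8):
    """Associate a variable with the lock most often held when it is
    accessed, if that lock is held in at least `threshold` of the
    accesses. `accesses` is an iterable of (var, locks_held) pairs."""
    total = Counter()             # number of accesses per variable
    held = defaultdict(Counter)   # per variable: how often each lock is held
    for var, locks in accesses:
        total[var] += 1
        for lock in locks:
            held[var][lock] += 1
    association = {}
    for var, n in total.items():
        if held[var]:  # at least one access was guarded by some lock
            lock, count = held[var].most_common(1)[0]
            if count / n >= threshold:
                association[var] = lock
    return association

# "queue" is guarded in 5 of 6 accesses; "fast_flag" was added later
# and is only ever accessed without a lock (hypothetical data).
accesses = [("queue", {"q_lock"})] * 5 + [("queue", set())]
accesses += [("fast_flag", set())] * 3
print(infer_lock_association(accesses))
```

Here `queue` is associated with `q_lock`, so its single unguarded access can be flagged as a violation of the discipline. For `fast_flag`, every access is unguarded, so there is no lock to correlate with and the variable is never identified as shared, even though it may be the one involved in the race.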
Race detection is a well studied problem, and various techniques have been employed to attack it. 1) Run time data race detection: the key idea is to explore concrete executions of the given program in a systematic manner in order to find the data races present in the code. However, since the state space of a typical concurrent program is large and, in principle, even infinite, it is hard to get good coverage and provide guarantees. 2) Model checking: explores the entire state space of the given concurrent program. There is little hope of scaling this to handle large-scale real-life programs. 3) Static analysis: explores the control flow graph of a given program to extract lockset information. Advantages include scalability to large code bases. Disadvantages include too many bogus warnings.