1. Field of Invention
This invention relates generally to concurrent program analysis and more specifically to tools for detecting races such as data races in multithreaded programming.
2. Background of the Invention
Concurrent programming involves multiple concurrent flows of control to perform multiple tasks in parallel. Each time a task is invoked, a thread is used to implement the particular task. A xe2x80x9cthreadxe2x80x9d is a single sequential flow of control that can be abstractly thought of as an independent program control block. Each thread can execute its instructions independently, allowing a multithreaded program to perform numerous tasks concurrently. Having multiple threads in a program means that multiple threads can be executing in parallel, where each thread is running its own execution path. Namely, at any instance the multithreaded program has multiple points of execution, one in each of its threads. Additionally, a multithreaded program creates a new thread by some mechanism, such as calling xe2x80x9cForkxe2x80x9d with a given procedure and arguments. The thread starts execution by invoking the procedure with the given arguments. The thread terminates when the procedure returns.
As a simple example, consider the following program which executes the functions ƒ1(x), ƒ2(x), and ƒ3(x) in parallel. In this example, a thread is created by calling xe2x80x9cForkxe2x80x9d, and giving it a function and arguments.
In multithreaded programs a shared addressable resource is one that can be accessed by multiple threads (i.e., shared across multiple threads). Typically, large multithreaded programs have sections of code that operate on one or more program elements corresponding to shared addressable resources (hence, they are shared program elements). Global variables are an example of shared program elements. Addressable resources (or simply xe2x80x9cresourcesxe2x80x9d) include, for example, one or more of input/output (I/O) ports and address spaces in memory that are allocated for data objects such as arrays, records, fields, structures, classes, or objects. Synchronization mechanisms are needed in order to permit the threads to read from or write to the shared addressable resources without adversely impacting each other""s operations. What is more, threads interact with each other through access to the shared addressable resource (e.g., global variables).
Large multithreaded programs also have sections of code that operate on addressable resources that are not shared across multiple threads. Addressable resources that are used in this manner, i.e., addressable resources that have a thread-local property (e.g., used only by a single thread), do not require special means of access protection or exclusion. Multithreaded program analysis depends on the thread-local and thread-shared properties of addressable resources. To facilitate concurrent or multithreaded program analysis, programmers may indicate addressable resources as local or shared by annotating the addressable resources. Programmers annotate addressable resources in order to specify whether a particular data type is thread-local or thread-shared.
In multithreaded programming, access of thread-shared addressable resources requires special attention since multiple threads can operate in parallel to manipulate them and, as a result, errors may arise. For example, when two or more threads in a multithreaded program manipulate an addressable resource (e.g., data structure) simultaneously, without synchronization, addressable resource race conditions occur (hereafter xe2x80x9crace conditionsxe2x80x9d; race conditions affecting access to data are often termed data races or data race conditions). Race conditions often result in non-deterministic program behavior. Race conditions can be avoided by a careful programming discipline that implements and maintains mutual exclusion, including protecting an addressable resource with a xe2x80x9clockxe2x80x9d and acquiring the lock before manipulating the resource. Namely, only the thread that xe2x80x9choldsxe2x80x9d the lock can manipulate the addressable resource. A data type known as xe2x80x9cmutexxe2x80x9d and the construct known as xe2x80x9clockxe2x80x9d are implemented to achieve mutual exclusion (and can be expressed in the form:
Var x: mutex;
Lock x Do . . . statements . . . End;).
The xe2x80x9cLockxe2x80x9d clause imposes serialization of threads"" access and manipulation of shared addressable resources. Since at each instance one and only one thread can hold the lock, careful adherence to this lock-based synchronization discipline ensures a race-free program.
To facilitate mutual exclusion, various program analysis tools, such as escape analysis for Java, can infer which addressable resources have the thread-local property. Escape analysis for Java is described in more detail in an article by Jong-Dock Choi et al., entitled xe2x80x9cEscape Analysis for Javaxe2x80x9d, 1999, ACM. Escape analysis is further described in an article by John Whaley et al. entitled xe2x80x9cCompositional Pointer and Escape Analysis for Java Programsxe2x80x9d 1999, ACM. Both articles are incorporated herein by reference. Escape analysis works by performing an analysis of the entire program. Escape analysis is not capable of verifying thread-local annotations if only part of the program is provided. Inferring thread-local properties requires substantially more computational resources than just verifying them and therefore is not as efficient.
Conventional static analysis tools, such the Warlock, are not capable of verifying that certain data locations have the thread-local property. Warlock is described in an article by Nicolas Sterling entitled xe2x80x9cWarlock, A Static Data Race Analysis Toolxe2x80x9d Usenix, Winter Technical Conference, 1993, which is incorporated herein by reference. Warlock allows a programmer to annotate an addressable resource as being local to a thread. However, Warlock does not verify whether the thread-local annotation is correct.
Another tool, Eraser, is described in an article by Stefan Savage et al. entitled xe2x80x9cEraser: A Dynamic Data Race Detector for Multi-Threaded Programs,xe2x80x9d 1997, ACM. Eraser is a dynamic analysis tool that monitors lock acquisitions and data accesses while a concurrent program executes. Eraser determines the locks that protect each addressable resource (e.g., data location) accessed by the program. Eraser allows addressable resources that are not visible to multiple threads to be without locks. As with escape analysis, the information about thread-local type resources is inferred, rather than relying on annotations.
Accordingly, there exists a need for a more effective concurrent program analysis tool that is a better predictor of potential race conditions. The present invention addresses the foregoing and related problems.
The present invention provides a method and apparatus used in concurrent program analysis for detecting races, such as data races. A feature of the invention is verifying annotations of addressable resources in a program. The present invention verifies annotations by checking if thread-local addressable resources are indeed thread-local. Alternatively stated, the present invention checks the validity of the thread-local annotations, and it further checks the validity of the thread-shared annotations.
In one embodiment of the present invention, checks are run on elements of a computer program source code that are annotated as thread local in order to determine the validity of their respective thread-local annotations. Alternatively stated, checks are run on the elements of the source code that are annotated as thread-local to verify that such elements are indeed thread-local and not in fact thread-shared. Further examination can determine if elements of the source code that are annotated as thread-shared are in fact sharable. Additional requirements are needed to ensure the validity of the thread-local annotations when one addressable resource is derived from another as when one class is derived from another. Programs that satisfy these requirements are guaranteed not to share values of thread-local type between multiple threads. Therefore, the analysis enables static tools to properly check programmer annotations indicating which data types are thread-shared and which are thread-local.
Hence, in accordance with the purpose of the invention, as embodied and broadly described herein, the invention relates to a method used in concurrent program analysis for detecting potential race conditions, such as data races, in a computer program. The computer program can spawn a plurality of threads that are capable of being executed concurrently. The method includes receiving the source code of a computer program. The source code includes an element annotated as either thread-local or thread-shared. The method also includes determining if the element is annotated as thread-shared or thread-local; and verifying the validity of the thread-local annotation if the element is annotated in the computer program as thread-local. An invalid thread-local annotation may cause a race condition. The method further includes indicating a race condition warning or error if upon verifying the validity of the thread-local annotation it is determined that the element or a portion thereof is visible from more than one, rather than one and only one, of the plurality of threads.
In further accordance with the purpose of the invention, as embodied and broadly described herein, the invention relates to an apparatus for concurrent program analysis. The apparatus includes means for receiving source code of the computer program. The source code includes an element annotated as either thread-local or thread-shared. The apparatus also includes means for type checking the source code; and means for thread-local checking located either inside or in series with the type checking means. The means for thread-local checking includes means for verifying the validity of the thread-local annotation if the element is annotated in the computer program as thread-local, wherein invalid thread-local annotation may cause a race condition such as a data race. The apparatus can further include means for parsing the source code; and means for creating from the source code an abstract syntax tree.
Accordingly, the present invention beneficially provides for a more effective concurrent program analysis. The description that follows provides a more detailed description of the invention.