1. Field of the Invention
The present invention relates in general to the field of parallel program compilation, and more particularly to a system and method for compile-time non-concurrency analysis of parallel programs.
2. Description of the Related Art
Shared memory parallel programming provides a powerful tool for performing complex processing across multiple threads. For instance, the industry-defined OPENMP application program interface supports multi-platform, shared memory parallel programming in a number of sequential languages, such as Fortran and C++, across a number of different architectures, such as UNIX or WINDOWS NT platforms. OPENMP supports incremental writing of parallel programs by extending base sequential languages with a set of compiler directives, runtime library routines and environment variables. Its well-structured code blocks and well-defined semantics make programs more amenable to compiler analysis. However, writing correct and efficient parallel programs is difficult because the concurrent execution of parallel routines by the different threads of a team must be managed. Concurrency occurs where the relative execution order of different threads is not enforced, so synchronization must be used to control access to shared resources.
Static non-concurrency analysis serves as the basis for a number of techniques used to analyze or optimize parallel programs, such as programs that use OPENMP. For instance, static non-concurrency analysis is used for race detection, deadlock detection, unnecessary lock/barrier removal, synchronization optimization and debugger support. As a specific example, race detection identifies general races, which occur when the order of two accesses to the same memory location is not enforced by synchronization. General races are classified into data races, which occur when the accesses are not guarded by critical sections, and synchronization races, which occur when they are. A correct OPENMP program may contain synchronization races but is generally expected to be free of data races. If two accesses to the same memory location cannot be executed concurrently, then a general race is not possible. If two accesses can be executed concurrently and the accesses are guarded by critical sections, then a synchronization race is possible while a data race is not. If a race condition is possible for an OPENMP program, the behavior of the program is nondeterministic.
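The race classification rules described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the method of the present invention: `Access`, `RaceKind` and `classify` are hypothetical names; the `may_be_concurrent` flag is assumed to be supplied by a separate non-concurrency analysis; `in_critical` on both accesses is assumed to mean they are guarded by the same critical section; and, following the usual convention, two reads are assumed never to race.

```python
from dataclasses import dataclass
from enum import Enum


class RaceKind(Enum):
    NONE = "no general race"
    SYNCHRONIZATION = "synchronization race possible"
    DATA = "data race possible"


@dataclass(frozen=True)
class Access:
    # Hypothetical record describing one memory access in a parallel region.
    location: str       # name of the accessed memory location
    is_write: bool      # True for a store, False for a load
    in_critical: bool   # assumed: guarded by the same critical section


def classify(a: Access, b: Access, may_be_concurrent: bool) -> RaceKind:
    """Classify a pair of accesses using the rules in the text.

    `may_be_concurrent` is assumed to come from a separate
    non-concurrency analysis that is not modeled here.
    """
    # Different locations, or two reads (conventional assumption): no race.
    if a.location != b.location or not (a.is_write or b.is_write):
        return RaceKind.NONE
    # Accesses that cannot execute concurrently cannot form a general race.
    if not may_be_concurrent:
        return RaceKind.NONE
    # Concurrent but both guarded by a critical section: synchronization race.
    if a.in_critical and b.in_critical:
        return RaceKind.SYNCHRONIZATION
    # Concurrent and unguarded: a data race is possible.
    return RaceKind.DATA


x_write = Access("x", is_write=True, in_critical=False)
x_read = Access("x", is_write=False, in_critical=False)
print(classify(x_write, x_read, may_be_concurrent=True).value)  # data race possible
```

The sketch makes explicit that the accuracy of race classification depends entirely on the `may_be_concurrent` input: a conservative non-concurrency analysis that answers "may be concurrent" too often turns race-free pairs into reported races.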
For another example, in order to correctly compile a typical OPENMP program, users generally scope each variable used in a parallel region manually, declaring it either shared, meaning that all threads share a single copy of the variable, or private, meaning that each thread accesses only its own copy of the variable. Other scopes include firstprivate, lastprivate, reduction and threadprivate. Accurate scoping of variables is tedious and error prone, and typically requires at least some non-concurrency analysis to ensure that data races do not exist and to otherwise optimize program execution. If a data race is possible for a variable in a parallel region, the data race generally is eliminated by serializing the associated code and scoping the variable as shared.
Determining exact concurrency in a given OPENMP program is difficult, especially for complex programs, and is practically impossible to do on a real-time basis while a program is being compiled. A variety of proposals have been made for detecting race conditions and non-determinacy in parallel programs; however, available techniques generally use low-level event variable synchronization, such as post/wait and locks. Such techniques tend to be inefficient and complex. Another proposal for detecting race conditions and non-determinacy uses a compile-time non-concurrency analysis based on barriers, which divide a parallel program into a set of phases separated by the barriers. However, the known barrier-based analysis fails to detect non-concurrency within a phase. Another alternative is run-time detection of race conditions and other synchronization anomalies; however, run-time detection techniques generally have relatively large execution overhead that limits their use to small test cases.
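The barrier-based phase analysis and its limitation can be sketched as follows. This is a simplified illustration, not the known proposal's actual algorithm: `assign_phases`, `may_be_concurrent` and the `BARRIER` marker are hypothetical, and every thread is assumed to execute the same straight-line sequence of statements, so that all threads reach the same barriers in the same order.

```python
from typing import List, Tuple

BARRIER = "barrier"  # hypothetical marker for an OPENMP barrier


def assign_phases(stmts: List[str]) -> List[Tuple[str, int]]:
    """Label each non-barrier statement with the index of its phase.

    Assumes straight-line code executed identically by every thread,
    so the n-th barrier is the same synchronization point for all.
    """
    phase = 0
    labeled = []
    for s in stmts:
        if s == BARRIER:
            phase += 1  # a barrier closes the current phase
        else:
            labeled.append((s, phase))
    return labeled


def may_be_concurrent(phase_a: int, phase_b: int) -> bool:
    # Different phases: a barrier separates them, so they cannot overlap.
    # Same phase: barriers alone prove nothing, so the analysis must
    # conservatively report possible concurrency.
    return phase_a == phase_b


region = ["a = 1", "b = 2", BARRIER, "c = a + b"]
phases = dict(assign_phases(region))
print(may_be_concurrent(phases["a = 1"], phases["c = a + b"]))  # False
print(may_be_concurrent(phases["a = 1"], phases["b = 2"]))      # True
```

The second query shows the shortcoming described in the text: any two statements that fall in the same phase are reported as possibly concurrent, even when other constructs would in fact prevent them from overlapping.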