In a multi-threaded environment, race conditions related to shared memory access can result in incorrect values being computed or even in incorrect program execution. A data access hazard occurs when two or more accesses (e.g., reads and/or writes) to the same location in memory may occur without any guarantee of ordering between the accesses. A data race condition exists when one ordering of thread accesses to the memory location may provide a first result, whereas a different ordering of thread accesses may provide a different, second result.
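The definition above can be illustrated with a small deterministic sketch. The following is not taken from any particular tool; it assumes two hypothetical threads that each increment a shared counter as a separate read followed by a write, and it enumerates every ordering of those four accesses rather than running real threads. The names `run_schedule` and `all_schedules` are illustrative only.

```python
from itertools import combinations


def run_schedule(schedule):
    """Replay one ordering of accesses by two simulated threads.

    `schedule` is a sequence of thread ids (0 or 1). Each id appears twice:
    its first occurrence is that thread's read of the shared location and
    its second occurrence is that thread's write of (value read + 1).
    """
    shared = 0          # the shared memory location
    local = {}          # each thread's private copy of the value it read
    for tid in schedule:
        if tid not in local:
            local[tid] = shared          # read access
        else:
            shared = local[tid] + 1      # write access
    return shared


def all_schedules():
    """Yield every interleaving that preserves each thread's read-then-write order."""
    for slots in combinations(range(4), 2):
        # `slots` are the two positions in the schedule taken by thread 0.
        yield tuple(0 if i in slots else 1 for i in range(4))


# Map each ordering to the final value of the shared location.
results = {schedule: run_schedule(schedule) for schedule in all_schedules()}
distinct_results = sorted(set(results.values()))
```

Under these assumptions, orderings in which one thread finishes before the other starts leave the counter at 2, while orderings in which both threads read before either writes leave it at 1. The accesses have no guaranteed order, and the result depends on which order occurs, which is the situation the paragraph describes.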
In multi-threaded processing environments, the large number of simultaneously executing threads increases the possibility of creating such race conditions or errors. That is, a processor system may include an operating system that controls hardware resources that access a common memory location when executing a program. For instance, a general-purpose GPU (GPGPU) programming environment may include thousands of GPGPUs, each running tens of thousands of threads, processing the same code in order to reach a result, such as rendering a graphical image. These large numbers of threads are susceptible to race conditions that may be propagated throughout the computation, especially if all the GPGPUs are executing identical code.
Traditional race detection schemes rely on static analysis, using symbolic evaluation of all possible execution paths to detect potential hazards. However, not all such execution paths can be taken when the program is actually executed, so hazards may be reported on paths that never occur in practice. Another approach is simulation of programs. In such schemes, the processing unit is simulated in a software environment, and the program is executed in the simulation environment. However, neither static analysis nor simulation-based approaches for race detection are well suited to handle cases where thousands of threads could potentially be executing simultaneously. Additionally, since the simulated environment is not hardware based, it may not give a true analysis of race conditions when executing the program on the actual hardware.
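One contributing factor to this scaling problem can be illustrated with a short calculation. This is an illustration under stated assumptions, not a description of any specific analysis tool: it assumes each thread performs a fixed number of ordered steps and that any interleaving preserving each thread's own order is possible. The function name `interleaving_count` is hypothetical.

```python
from math import factorial


def interleaving_count(num_threads, steps_per_thread):
    """Count the orderings of all steps that keep each thread's steps in order.

    This is the multinomial coefficient (n*k)! / (k!)**n for n threads
    of k steps each.
    """
    total_steps = num_threads * steps_per_thread
    return factorial(total_steps) // factorial(steps_per_thread) ** num_threads


# Two memory accesses per thread, for a growing number of threads.
counts = {n: interleaving_count(n, 2) for n in (2, 3, 4)}
```

With two accesses per thread, the count grows from 6 orderings for two threads to 90 for three and 2520 for four, which suggests why approaches that reason about every possible ordering become impractical long before thread counts reach the thousands.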
Further, a common problem for tools that report data access hazards is a high rate of false positives (i.e., false reports of data access hazards that cause races). The problem arises when information about the hazard of interest to the user is hidden among other hazard reports. This is of increasing concern when a large number of concurrent threads are executing a program.