Field of the Invention
The invention relates generally to multi-threaded computing, and more specifically, to a system and method for managing divergent threads in a single-instruction, multiple-data (“SIMD”) architecture.
Description of the Related Art
Single instruction, multiple data (SIMD) is a parallel execution model adopted by some modern processors such as graphics processing units (GPUs), digital signal processors (DSPs), and central processing units (CPUs). Such a processor can execute a single instruction across multiple threads concurrently by utilizing its parallel data paths. Single-program multiple-data (SPMD) accelerator languages such as CUDA® and OpenCL® have been developed to enhance the computing performance of processors that have the SIMD architecture.
Processors with SIMD architectures are designed to maximize the amount of parallel processing in the pipeline. In a SIMD architecture, the various threads attempt to execute program instructions synchronously as often as possible. That is, it is desired that all threads follow a single flow of control, since computing efficiency is highest when every thread executes the same instruction at the same time.
A problem that decreases computing efficiency typically arises, however, when the program includes branches, and some threads take a branch while others do not. For example, to handle an if-else block where various threads of a processor follow different control-flow paths, the threads that follow the “else” path are disabled (waiting) while the threads that follow the “if” path are executed, and vice versa. Hence, only one control-flow path is executed at a time, and during each path the execution slots of the disabled threads are wasted.
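The serialization of an if-else block described above can be illustrated with a minimal software sketch. The function name `run_if_else_masked`, the per-lane active mask, and the arithmetic performed on each path are hypothetical and chosen only for illustration; the sketch assumes each path costs one execution slot per lane and does not represent any particular processor.

```python
def run_if_else_masked(values):
    """Simulate one group of SIMD lanes executing
    `if v is even: v // 2 else: 3 * v + 1` (illustrative operations)."""
    n = len(values)
    # Active mask: True for lanes that follow the "if" path.
    mask = [v % 2 == 0 for v in values]
    results = list(values)
    idle_slots = 0  # lane slots spent disabled (waiting)
    passes = 0      # number of control-flow paths actually executed

    # Path 1: the "if" path runs only if at least one lane follows it.
    if any(mask):
        passes += 1
        for lane in range(n):
            if mask[lane]:
                results[lane] = values[lane] // 2
            else:
                idle_slots += 1  # "else" lane is disabled during this path

    # Path 2: the "else" path runs only if at least one lane follows it.
    if not all(mask):
        passes += 1
        for lane in range(n):
            if not mask[lane]:
                results[lane] = 3 * values[lane] + 1
            else:
                idle_slots += 1  # "if" lane is disabled during this path

    return results, idle_slots, passes


divergent = run_if_else_masked([1, 2, 3, 4])  # lanes disagree on the branch
uniform = run_if_else_masked([2, 4, 6, 8])    # all lanes follow the "if" path
```

Under these assumptions, the divergent input requires two passes and leaves every lane idle for one of them, whereas the uniform input requires a single pass with no idle lanes.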
In some prior-art systems, all threads are dragged through each branch, regardless of whether the threads execute the instructions associated with that branch. Other prior-art systems simply disable all threads that do not execute a branch. Both designs are inefficient since hundreds of threads may be disabled while the branch is executed. A common multithreaded architecture allows the threads to be divided into several thread groups. When a branch in a program is encountered, each thread group is able to traverse the branch independently of the other thread groups. Thus, the thread groups that do not execute a branch do not have to be disabled while the branch is being executed.
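The benefit of dividing threads into thread groups, and its limit, can be sketched as follows. The function name `idle_lane_slots`, the group sizes, and the cost model (a group whose threads disagree executes both paths in turn, so each of its threads waits through one path) are assumptions made for illustration only.

```python
def idle_lane_slots(takes_branch, group_size):
    """Count thread slots spent disabled when threads are split into
    thread groups that each traverse an if-else branch independently.

    takes_branch: list of bools, one per thread (True = takes the branch).
    group_size:   number of threads per thread group (hypothetical value).
    """
    idle = 0
    for start in range(0, len(takes_branch), group_size):
        group = takes_branch[start:start + group_size]
        taken = sum(group)              # threads in this group taking the branch
        not_taken = len(group) - taken  # threads in this group skipping it
        if taken and not_taken:
            # Divergent group: both paths run in turn, so every thread
            # in the group is disabled during exactly one of them.
            idle += len(group)
        # Uniform group: only one path runs and no thread is disabled.
    return idle


# Eight threads: the first four take the branch, the last four do not.
aligned = [True] * 4 + [False] * 4
one_group = idle_lane_slots(aligned, group_size=8)   # a single group of eight
two_groups = idle_lane_slots(aligned, group_size=4)  # two groups of four

# The same group size does not help when threads inside a group disagree.
mixed = [True, False, True, True, False, False, False, False]
still_divergent = idle_lane_slots(mixed, group_size=4)
```

In this sketch, splitting the eight threads into two groups of four removes all idle slots when the branch outcome aligns with the group boundaries, but a group whose own threads disagree still incurs idle slots.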
Yet, it is common for threads within a thread group to “diverge” from one another so that one or more threads execute a branch, while others do not. Such divergence within a thread group may still reduce computing efficiency. Accordingly, it is desirable to devise an approach for managing thread divergences that may occur when a thread group encounters one or more branches in a program.