1. Field of the Invention
The present invention generally relates to parallel computing and, more specifically, to compiler-controlled region scheduling for SIMD (single-instruction multiple-data) execution of threads.
2. Description of the Related Art
A Single-Instruction-Multiple-Data (SIMD) processor is a processor that executes a set of instructions, with each instruction operating on multiple different data values simultaneously. Applications written for SIMD processors may be divided logically into “warps”, where each warp is a group of “threads” that execute cooperatively and simultaneously on a SIMD processor. Generally, each thread in a warp executes instructions on a different data value, but executes the same instruction as other threads in the warp.
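By way of illustration only, and not as part of any described technique, the lockstep behavior of a warp may be modeled in a few lines of Python. The sketch treats a warp as a list holding one data value per thread; the names `simd_execute`, `lanes`, and `program` are hypothetical and chosen for this example.

```python
def simd_execute(instruction, lanes):
    """Apply one instruction to every thread's data value in the same step."""
    return [instruction(value) for value in lanes]

# One data value per thread in a (hypothetical) four-thread warp.
lanes = [1, 2, 3, 4]

# Every thread executes the same instruction sequence on its own value.
program = [lambda x: x + 1, lambda x: x * 2]
for instruction in program:
    lanes = simd_execute(instruction, lanes)

print(lanes)
```

Each pass through the loop corresponds to a single instruction issued once for the whole warp, with only the data differing between threads.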
Execution of threads in a warp may diverge. If program instructions dictate that one or more threads in a warp take a first path while one or more other threads in the warp take a second path, then the threads in the warp diverge. Thread divergence may happen for a number of reasons. For example, a warp may encounter a conditional branch, where each thread may or may not branch based on the result of a branch condition. Because evaluation of the branch condition may be based on data values that differ for each thread, the threads evaluating the branch condition may reach different results and may diverge. Such divergent execution may be referred to as “divergent control flow” herein.
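As a minimal sketch of how a data-dependent branch condition may split a warp, the following Python fragment evaluates one condition per thread and groups thread indices by outcome. The helper names `split_on_branch` and `diverges`, and the sample data, are assumptions made for illustration only.

```python
def split_on_branch(data, condition):
    """Return (taken, not_taken) lists of thread indices for one branch."""
    taken = [tid for tid, value in enumerate(data) if condition(value)]
    not_taken = [tid for tid, value in enumerate(data) if not condition(value)]
    return taken, not_taken

def diverges(data, condition):
    """A warp diverges when both outcomes have at least one thread."""
    taken, not_taken = split_on_branch(data, condition)
    return bool(taken) and bool(not_taken)

# Hypothetical per-thread data values for a four-thread warp.
warp_data = [3, 8, 5, 10]
taken, not_taken = split_on_branch(warp_data, lambda v: v > 4)
print(taken, not_taken, diverges(warp_data, lambda v: v > 4))
```

In this sketch, a warp whose threads all reach the same branch result (for example, data values that all exceed 4) does not diverge, because one of the two groups is empty.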
Because all threads in a warp typically execute the same instruction, execution of a program with divergent control flow involves execution on every control flow path that any thread follows. Execution on all control flow paths in this manner may involve execution down multiple paths where some threads are “active” (currently executing) while other threads are “inactive” (waiting to execute). Execution down multiple paths may cause (and typically does cause) the execution time of the entire warp to be longer than the execution time spent on any single thread. Techniques exist for determining which divergent threads should execute at which time. However, some existing techniques may not be tied to prioritized flow, may not schedule threads efficiently, or may not ensure early re-convergence of threads.
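The cost of executing multiple paths may be illustrated with a simple model, offered as a sketch under stated assumptions rather than as any particular scheduling technique. The Python below serializes a two-way branch: each path is issued once for the whole warp, and a per-thread active flag determines which threads apply each instruction. The function `run_divergent_warp`, the two instruction lists, and the choice to run the taken path first are all hypothetical.

```python
def run_divergent_warp(data, condition, then_path, else_path):
    """Serialize both paths; return (results, warp_steps, longest_thread)."""
    results = list(data)
    # Branch outcome per thread, evaluated once on the incoming data.
    mask = [condition(value) for value in data]
    warp_steps = 0
    for path, wanted in ((then_path, True), (else_path, False)):
        active = [m == wanted for m in mask]
        if not any(active):
            continue  # no thread follows this path, so it is not issued
        for instruction in path:
            warp_steps += 1  # one issue slot for the whole warp
            for tid, is_active in enumerate(active):
                if is_active:  # inactive threads wait during this step
                    results[tid] = instruction(results[tid])
    # Steps each thread would need if it ran alone on its own path.
    thread_steps = [len(then_path) if m else len(else_path) for m in mask]
    return results, warp_steps, max(thread_steps)

then_path = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
else_path = [lambda x: x * 10, lambda x: x + 7]
results, warp_steps, longest_thread = run_divergent_warp(
    [3, 8, 5, 10], lambda v: v > 4, then_path, else_path)
print(results, warp_steps, longest_thread)
```

In this model the warp occupies five issue slots (three for one path plus two for the other), while no individual thread needs more than three, which mirrors the observation that the warp as a whole may take longer than any single thread. The order in which the two paths are issued is fixed arbitrarily here; choosing that order well is the scheduling question the paragraph above raises.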
As the foregoing illustrates, more effective techniques are needed for managing the execution of threads within a warp throughout different regions of a program.