Field of the Invention
Embodiments of the present invention relate generally to compute processing and, more specifically, to confluence analysis and loop fast-forwarding for improving SIMD execution efficiency.
Description of the Related Art
Under a single-instruction-multiple-data (SIMD) processing model, a processor processes a single instruction across multiple items of data. Multiple execution units typically exist in a SIMD processor, each of which executes a different thread associated with a different data item. During execution, the multiple threads may “diverge” when, for example, the threads encounter a conditional branch instruction. The condition of such a branch may be based on thread-specific data, in which case some threads may evaluate the branch condition in one way and other threads may evaluate the condition in a different way. Because a SIMD processor issues one instruction at a time to all of its execution units, the resulting control flow paths are typically executed at different times, one after another, with only a subset of the threads active on each path. When threads execute different control flow paths in this manner, the threads are said to be “divergent.” As a general matter, SIMD processors experience higher processing efficiency when the threads do not diverge, as more data is processed simultaneously in such instances.
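The cost of divergence described above can be illustrated with a minimal sketch. The function name `simd_issue_slots`, the comparison against a `threshold`, and the one-branch, one-instruction-per-path model are all assumptions made for illustration; they do not describe any particular processor.

```python
def simd_issue_slots(values, threshold):
    """Model one conditional branch executed by one group of SIMD threads.

    Hypothetical model: each thread owns one entry of `values`, and each side
    of the branch costs one issue slot whenever at least one thread takes it.
    Returns (issue_slots, utilization).
    """
    # Per-thread branch outcome, based on thread-specific data.
    taken = [v > threshold for v in values]
    not_taken = [not t for t in taken]

    slots = 0  # instructions issued to the whole group
    busy = 0   # execution units doing useful work, summed over slots
    for mask in (taken, not_taken):
        if any(mask):
            # The path is issued once; threads outside the mask sit idle.
            slots += 1
            busy += sum(mask)

    utilization = busy / (slots * len(values))
    return slots, utilization


# All four threads agree on the branch: one slot, every unit busy.
uniform = simd_issue_slots([1, 2, 3, 4], 0)
# Threads split two and two: both paths are issued, half the units idle.
divergent = simd_issue_slots([1, 2, 3, 4], 2)
```

In this model, the uniform case uses one issue slot at full utilization, while the divergent case needs two issue slots and keeps only half of the execution units busy in each.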
Several techniques exist for causing threads that have diverged to reconverge. One common technique is referred to as immediate-post-dominator reconvergence (“IPDOM reconvergence”). In this approach, threads that diverge between a dominator and an immediate-post-dominator are caused to reconverge when all threads arrive at the immediate-post-dominator. A first node is a dominator of a second node if every thread that executes the second node must first execute the first node. A first node is a post-dominator of a second node if every thread that executes the second node must subsequently execute the first node. A first node is an immediate-post-dominator of a second node if the first node is distinct from the second node, post-dominates the second node, and does not post-dominate any other post-dominator of the second node. Threads can be caused to reconverge in this situation because all threads that execute the dominator must also execute the immediate-post-dominator.
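The post-dominator relationships can be made concrete with a short sketch. It uses the standard graph-theoretic definition, under which a node post-dominates another node if every path from that other node to the exit passes through it, and computes the sets with a simple iterative algorithm. The function names, the dictionary representation of the control flow graph, and the node labels are hypothetical, and the sketch assumes a single exit node and at least one successor for every other node.

```python
def post_dominators(succ, exit_node):
    """Map each node to the set of nodes that post-dominate it (itself included).

    `succ` maps every node to a list of its successors. Assumes a single
    exit node and that every other node has at least one successor.
    """
    nodes = set(succ)
    # Start from the most optimistic answer and shrink it to a fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}

    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself plus whatever post-dominates
            # all of its successors.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(pdom, node):
    """Return the strict post-dominator of `node` that post-dominates no other
    strict post-dominator of `node`, or None if `node` has none."""
    strict = pdom[node] - {node}
    for candidate in strict:
        if all(candidate not in pdom[other] for other in strict - {candidate}):
            return candidate
    return None


# Hypothetical if/else diamond: A branches to B or C, and both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pdom = post_dominators(cfg, "D")
reconvergence_point = immediate_post_dominator(pdom, "A")
```

For this diamond, threads that diverge at A would be caused to reconverge at D, the immediate-post-dominator of A, since neither B nor C lies on every path from A to the exit.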
One drawback to IPDOM reconvergence is that the technique is usually applicable only when a dominator/immediate-post-dominator pair exists. In many divergent thread processing scenarios where thread reconvergence is desired, however, no such node pair exists.
As the foregoing illustrates, what is needed in the art is a more effective technique for causing divergent threads to reconverge in parallel execution environments.