Field of the Invention
The present invention generally relates to single-instruction multiple-thread (SIMT) execution and, more specifically, to optimization of temporal SIMT execution.
Description of the Related Art
Conventional SIMT multithreaded processors execute multiple threads in parallel by organizing the threads into groups and executing each thread of a group on a separate processing pipeline. An instruction to be executed by the threads in a group is dispatched in a single cycle. The processing pipeline control signals are generated such that all threads in a group perform a similar set of operations as the threads traverse the stages of the processing pipelines. For example, all the threads in a group read source operands from a register file, perform the specified arithmetic operation in processing units, and write results back to the register file.
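By way of illustration only, the lock-step behavior described above may be sketched in software as follows. The sketch is a simplified model rather than a description of any particular hardware: the function name `execute_group`, the register names `r0` through `r2`, and the representation of each thread's registers as a dictionary are assumptions introduced for this example, and the loop over lanes stands in for pipelines that would operate in parallel.

```python
from typing import Callable, Dict, List

# Hypothetical model: each thread in the group has its own slice of the
# register file, represented here as a mapping from register name to value.
RegisterFile = Dict[str, int]


def execute_group(
    register_files: List[RegisterFile],
    op: Callable[[int, int], int],
    dst: str,
    src_a: str,
    src_b: str,
) -> None:
    """Apply one instruction to every thread in a group.

    All threads perform the same three steps: read source operands,
    perform the arithmetic operation, and write the result back.
    """
    for regs in register_files:  # one iteration per lane (parallel in hardware)
        a = regs[src_a]          # read source operands from the register file
        b = regs[src_b]
        result = op(a, b)        # perform the specified arithmetic operation
        regs[dst] = result       # write the result back to the register file


# A group of four threads; r0 holds a per-thread value, r1 a shared constant.
group = [{"r0": i, "r1": 10, "r2": 0} for i in range(4)]
execute_group(group, lambda a, b: a + b, dst="r2", src_a="r0", src_b="r1")
results = [regs["r2"] for regs in group]
```

In this model a single call to `execute_group` corresponds to one dispatched instruction, and every lane does useful work because no thread in the group follows a different path.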
When the threads in a group are permitted to diverge, for example at a conditional branch, some of the parallel processing pipelines are idle while the threads that take the branch are executed, and the remaining parallel processing pipelines are idle while the threads that do not take the branch are executed. The utilization of the parallel processing pipelines may therefore be significantly reduced when execution of the threads in a group diverges. In the worst case, only a single thread is dispatched for execution on the parallel processing pipelines at a time.
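The loss of utilization under divergence may be estimated with a simple model, sketched below under stated assumptions. The function `simt_utilization` is hypothetical and introduced only for this example. It assumes that every divergent path is the same length, that each distinct path requires its own pass over the pipelines, and that a thread is active only during the pass for its own path. Real hardware may differ in each of these respects.

```python
from typing import List


def simt_utilization(path_of_thread: List[int]) -> float:
    """Estimate pipeline utilization for one thread group.

    path_of_thread[i] identifies the control-flow path taken by thread i.
    Assumption: all paths are equally long, so each distinct path costs
    one pass in which only the threads on that path are active.
    """
    lanes = len(path_of_thread)
    if lanes == 0:
        raise ValueError("a thread group must contain at least one thread")
    passes = len(set(path_of_thread))  # one pass per distinct path
    active_slots = lanes               # each thread is active in exactly one pass
    total_slots = lanes * passes       # pipeline slots available over all passes
    return active_slots / total_slots


# No divergence: all eight threads follow the same path.
uniform = simt_utilization([0] * 8)

# Two-way divergence: half the threads take the branch, half do not.
two_way = simt_utilization([0, 0, 0, 0, 1, 1, 1, 1])

# Worst case: every thread follows a different path, so only a single
# thread is dispatched in each pass.
worst = simt_utilization(list(range(8)))
```

Under these assumptions the utilization equals the reciprocal of the number of distinct paths, giving 1.0 for the uniform group, 0.5 for the two-way split, and 0.125 for the eight-way worst case.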
Accordingly, what is needed in the art is an improved system and method for utilizing processing resources in a multithreaded processing architecture when threads may diverge.