Current graphics data processing includes systems and methods developed to perform specific operations on graphics data such as, for example, linear interpolation, tessellation, rasterization, texture mapping, depth testing, etc. Traditionally, graphics processors used fixed function computational units to process graphics data; however, more recently, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process in parallel as much graphics data as possible throughout the different parts of the graphics pipeline. Graphics processors with single instruction, multiple data (SIMD) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In a SIMD architecture, the various threads attempt to execute program instructions synchronously as often as possible to increase processing efficiency.
A problem typically arises, however, when the program includes branches and some threads execute a given branch while others do not. In some prior art systems, all threads are dragged through each branch, regardless of whether the threads execute the instructions associated with that branch. Given that a system may execute upwards of 800 threads, such a design is quite inefficient, since hundreds of threads may be needlessly dragged through a branch. Other prior art systems disable all threads that do not execute a branch. Again, such a design is inefficient, since hundreds of threads may sit disabled while the branch is executed.
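The inefficiency described above can be illustrated with a toy cost model. The sketch below is not drawn from any particular prior art system; the function name `simd_branch_utilization`, the instruction counts, and the assumption that 100 of the 800 threads take the branch are all hypothetical, chosen only to make the arithmetic concrete. It assumes a group of threads that steps through both sides of a branch in lockstep, with each thread doing useful work only on its own side.

```python
def simd_branch_utilization(takes_branch, then_len, else_len):
    """Return (useful_slots, total_slots) for a lockstep SIMD group.

    Toy model (an assumption, not a description of real hardware):
    the group steps through both sides of the branch one after the
    other, and a thread does useful work only on its own side.
    """
    n_threads = len(takes_branch)
    n_taken = sum(1 for t in takes_branch if t)
    n_not_taken = n_threads - n_taken

    # Every thread occupies a slot for every instruction on both paths,
    # whether it is dragged through the path or sits disabled.
    total_slots = n_threads * (then_len + else_len)
    # Only threads on the matching path do useful work in a given slot.
    useful_slots = n_taken * then_len + n_not_taken * else_len
    return useful_slots, total_slots


# 800 threads (the figure cited above); hypothetically 100 take the branch.
flags = [i < 100 for i in range(800)]
useful, total = simd_branch_utilization(flags, then_len=10, else_len=0)
print(f"useful={useful} total={total} utilization={useful / total:.1%}")
# prints: useful=1000 total=8000 utilization=12.5%
```

Under these assumed numbers, 700 of the 800 threads contribute nothing while the 10-instruction branch body runs, so only one eighth of the available thread slots do useful work. In this simple model the cost is the same whether the idle threads are dragged through the branch or disabled.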
Accordingly, what is needed in the art is a more efficient branching algorithm for systems with SIMD architectures.