1. Field of the Invention
The present invention generally relates to graphics processing and, more specifically, to a system and method for enabling predication of synchronization commands.
2. Description of the Related Art
Current graphics data processing includes systems and methods developed to perform specific operations on graphics data such as, for example, linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed-function computational units to process graphics data; however, more recently, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data.
To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process in parallel as much graphics data as possible throughout the different parts of the graphics pipeline. Graphics processors with single-instruction, multiple-data (“SIMD”) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In a SIMD architecture, the same instruction is executed in parallel to process multiple data inputs. A single-instruction, multiple-thread (“SIMT”) architecture provides greater flexibility than a SIMD architecture since threads in a group of threads (also referred to as a “warp”) may follow different paths through a set of instructions to process multiple data inputs. A SIMD instruction specifies the execution and branching behavior of a single control thread controlling operations on a vector of multiple data inputs. In contrast, a SIMT instruction specifies the execution and branching behavior of one individual independent thread operating on its data inputs, and a SIMT architecture applies a SIMT instruction in parallel to multiple independent threads, each of which is free to execute and branch independently. Conditional break and return instructions, in which threads may branch independently, are used for advanced control flow in order to improve processing efficiency. In particular, threads that execute a break or return may complete processing earlier than threads that do not execute the break or return. Threads that have diverged during the execution of conditional control flow instructions are then synchronized so that those threads are again executed in parallel.
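By way of illustration only, the divergence and subsequent synchronization described above may be sketched as a small software simulation. The function name run_warp, the per-thread trip counts, and the active-mask representation are assumptions introduced for this sketch and do not describe any particular hardware implementation.

```python
def run_warp(trip_counts):
    """Simulate one warp executing a loop that contains a conditional break.

    trip_counts[t] is the iteration at which hypothetical thread t takes the
    break. Counts are assumed to be >= 1; a smaller value still costs one
    iteration, because the break is tested at the end of the loop body.
    """
    n = len(trip_counts)
    active = [True] * n      # active mask: threads still inside the loop
    iterations = [0] * n     # loop iterations performed by each thread
    done_at = [0] * n        # warp-level step at which each thread left the loop
    issued = 0               # loop-body steps issued for the warp as a whole

    while any(active):
        issued += 1          # one step is issued for all active threads
        for t in range(n):
            if not active[t]:
                continue     # a diverged thread sits idle during this step
            iterations[t] += 1
            if iterations[t] >= trip_counts[t]:
                active[t] = False    # conditional break taken by thread t
                done_at[t] = issued

    # Synchronization point: every thread of the warp becomes active again
    # so that subsequent instructions execute in parallel.
    active = [True] * n
    return done_at, issued, active
```

In this sketch, calling run_warp([1, 3, 2, 3]) shows the first thread leaving the loop after one step while the warp as a whole issues three steps, after which all four threads are marked active again.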
In current SIMT architectures, synchronization of divergent threads may be realized by appending a synchronization command to an instruction that each divergent thread is currently executing. For example, a first instruction denoted “instruction1” may specify a synchronization command by appending “.S” to the instruction, resulting in “instruction1.S.” In this example, the synchronizing operation is performed before instruction1 executes, and is performed by the graphics processor regardless of any predication specified by the executing instruction. Such synchronization operations are costly and therefore decrease the performance of the SIMT architecture.
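By way of illustration only, the behavior described above may be modeled in software as follows. The names new_warp and issue, the dictionary layout, and the treatment of the “.S” suffix as a string test are hypothetical conveniences of this sketch; only the ordering (synchronize first) and the disregard of the predicate are taken from the description above.

```python
def new_warp(active):
    """Build a hypothetical warp record; inactive threads are treated as diverged."""
    return {
        "active": list(active),
        "diverged": [not a for a in active],
        "syncs": 0,          # number of synchronizing operations performed
    }


def issue(opcode, predicate, warp):
    """Issue one instruction to a warp and return which threads executed it.

    opcode    -- hypothetical mnemonic; a trailing ".S" requests synchronization
    predicate -- per-thread booleans guarding execution of the instruction
    """
    if opcode.endswith(".S"):
        # The synchronizing operation is performed BEFORE the instruction
        # executes and does not consult the predicate at all.
        warp["active"] = [a or d for a, d in zip(warp["active"], warp["diverged"])]
        warp["diverged"] = [False] * len(warp["active"])
        warp["syncs"] += 1

    # Only active threads whose predicate is true execute the instruction.
    return [a and p for a, p in zip(warp["active"], predicate)]
```

In this model, issuing “instruction1.S” with a predicate that is false for every thread still increments the synchronization count, even though no thread executes instruction1, which is the cost the paragraph above identifies.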
Accordingly, what is needed in the art is a SIMT architecture that allows predication of synchronization commands and performs the synchronizing operation after the instruction finishes executing.