Many computer graphics images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene as seen from that viewpoint.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
Many graphics processing subsystems are programmable, enabling implementation of, among other things, complicated lighting and shading algorithms. To exploit this programmability, applications can include one or more graphics processing subsystem programs, which are executed by the graphics processing subsystem in parallel with a main program executed by the CPU. Although not confined to merely implementing shading and lighting algorithms, these graphics processing subsystem programs are often referred to as shader programs or shaders.
Graphics processing subsystems typically use a stream processing model, in which input elements are read and operated on successively by a chain of stream processing units. The output of one stream processing unit is the input to the next stream processing unit in the chain. Typically, data flows only one way (“downstream”) through the chain of stream processing units. Examples of stream processing units include vertex processors that process two-dimensional or three-dimensional vertices, rasterizer processors that process geometric primitives defined by sets of two-dimensional or three-dimensional vertices into groups of pixels or sub-pixels referred to as fragments, and fragment processors that process fragments to determine their color and other attributes.
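The one-way chain described above can be illustrated with a minimal software sketch, in which each stream processing unit is a Python generator that consumes the output of the previous unit. This is an illustration only and not a description of any particular hardware: the names `Vertex`, `Fragment`, `vertex_stage`, `rasterizer_stage`, `fragment_stage`, and `run_pipeline` are hypothetical, the uniform scale stands in for a real vertex transform, and the pixel-center coverage test is one assumed rasterization rule among many.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

Color = Tuple[int, int, int]


@dataclass(frozen=True)
class Vertex:
    x: float
    y: float


@dataclass(frozen=True)
class Fragment:
    px: int
    py: int
    color: Color = (0, 0, 0)


def vertex_stage(vertices: Iterable[Vertex], scale: float = 1.0) -> Iterator[Vertex]:
    """Vertex processor: a uniform scale stands in for a real transform (assumption)."""
    for v in vertices:
        yield Vertex(v.x * scale, v.y * scale)


def _edge(a: Vertex, b: Vertex, x: float, y: float) -> float:
    """Signed area test: which side of edge a->b the point (x, y) lies on."""
    return (b.x - a.x) * (y - a.y) - (b.y - a.y) * (x - a.x)


def _rasterize_triangle(a: Vertex, b: Vertex, c: Vertex) -> Iterator[Fragment]:
    """Emit one fragment per pixel whose center lies inside or on the triangle."""
    min_x, max_x = math.floor(min(a.x, b.x, c.x)), math.ceil(max(a.x, b.x, c.x))
    min_y, max_y = math.floor(min(a.y, b.y, c.y)), math.ceil(max(a.y, b.y, c.y))
    for py in range(min_y, max_y):
        for px in range(min_x, max_x):
            cx, cy = px + 0.5, py + 0.5
            w0, w1, w2 = _edge(a, b, cx, cy), _edge(b, c, cx, cy), _edge(c, a, cx, cy)
            # Accept either winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                yield Fragment(px, py)


def rasterizer_stage(vertices: Iterable[Vertex]) -> Iterator[Fragment]:
    """Rasterizer: group every three vertices into a triangle and emit its fragments.

    Trailing vertices that do not complete a triangle are dropped.
    """
    buf: List[Vertex] = []
    for v in vertices:
        buf.append(v)
        if len(buf) == 3:
            yield from _rasterize_triangle(*buf)
            buf.clear()


def fragment_stage(
    fragments: Iterable[Fragment], shader: Callable[[Fragment], Color]
) -> Iterator[Fragment]:
    """Fragment processor: run the shader once per fragment to determine its color."""
    for f in fragments:
        yield Fragment(f.px, f.py, shader(f))


def run_pipeline(
    vertices: Iterable[Vertex], shader: Callable[[Fragment], Color], scale: float = 1.0
) -> List[Fragment]:
    """Chain the stages so data flows only downstream: vertex -> rasterizer -> fragment."""
    return list(fragment_stage(rasterizer_stage(vertex_stage(vertices, scale)), shader))


if __name__ == "__main__":
    triangle = [Vertex(0.0, 0.0), Vertex(4.0, 0.0), Vertex(0.0, 4.0)]
    shaded = run_pipeline(triangle, lambda f: (255, 0, 0))
    print(len(shaded), "fragments shaded")
```

Because each stage only reads from the stage before it, no stage can send data back upstream, which mirrors the one-way flow of the stream processing model. Note also that `fragment_stage` calls the shader exactly once for every fragment the rasterizer emits.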
The programmable fragment processor is often the bottleneck in improving rendering performance. Typically, the programmable fragment processor must execute its shader program once for each fragment rendered. With fragment shader programs including hundreds or thousands of instructions and each rendered image generated from millions of fragments, the computational requirements of the fragment processor can be very large.
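To give a rough sense of scale, the workload can be estimated as the number of fragments multiplied by the number of shader instructions executed per fragment. The sketch below is a back-of-the-envelope calculation only: the function names are hypothetical, and the figures of 2 million fragments, 500 instructions, and 60 frames per second are assumed values chosen for illustration, not measurements of any system.

```python
def fragment_instructions_per_frame(
    fragments_per_frame: int, instructions_per_fragment: int
) -> int:
    """Total shader instructions for one image: one shader run per fragment."""
    return fragments_per_frame * instructions_per_fragment


def fragment_instructions_per_second(
    fragments_per_frame: int, instructions_per_fragment: int, frames_per_second: int
) -> int:
    """Total shader instructions per second at a given frame rate."""
    per_frame = fragment_instructions_per_frame(
        fragments_per_frame, instructions_per_fragment
    )
    return per_frame * frames_per_second


if __name__ == "__main__":
    # Assumed, illustrative figures (not from any measured system).
    fragments = 2_000_000
    instructions = 500
    fps = 60
    print(fragment_instructions_per_frame(fragments, instructions))        # 1000000000
    print(fragment_instructions_per_second(fragments, instructions, fps))  # 60000000000
```

Under these assumptions, a single image requires one billion fragment shader instructions, and sustaining a real-time frame rate multiplies that figure again by the number of frames per second.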