The present invention relates to the field of computer graphics. Many computer graphics images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint, and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
Graphics processing subsystems typically use a stream-processing model, in which input elements are read and operated on successively by a chain of stream processing units. The output of one stream processing unit is the input to the next stream processing unit in the chain. Typically, data flows only one way, “downstream,” through the chain of stream processing units. Examples of stream processing units include vertex processors, which process two- or three-dimensional vertices; rasterizer processors, which convert geometric primitives defined by sets of two- or three-dimensional vertices into sets of pixels or sub-pixels, referred to as fragments; and fragment processors, which process fragments to determine their color and other attributes.
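The stream-processing model described above can be illustrated with a minimal sketch, in which each stream processing unit is modeled as a Python generator that consumes the output of the previous unit. All names here (`vertex_stage`, `rasterizer_stage`, `fragment_stage`, `gradient_shader`) are hypothetical, and the choices of a uniform scale as the vertex operation, two-dimensional triangles as primitives, and pixel-center coverage testing are assumptions made for illustration only; the sketch does not describe any particular hardware implementation.

```python
import math
from typing import Callable, Iterable, Iterator, Tuple

Vertex = Tuple[float, float]
Fragment = Tuple[int, int]
Color = Tuple[float, float, float]


def vertex_stage(vertices: Iterable[Vertex], scale: float) -> Iterator[Vertex]:
    """Vertex processor: operate on each vertex (assumed here: a uniform scale)."""
    for x, y in vertices:
        yield (x * scale, y * scale)


def edge(a: Vertex, b: Vertex, p: Vertex) -> float:
    """Signed area test: which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterizer_stage(vertices: Iterable[Vertex]) -> Iterator[Fragment]:
    """Rasterizer: group vertices into triangles and emit covered pixels as fragments."""
    stream = iter(vertices)
    for a, b, c in zip(stream, stream, stream):  # three vertices per primitive
        xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
        for j in range(math.floor(min(ys)), math.ceil(max(ys))):
            for i in range(math.floor(min(xs)), math.ceil(max(xs))):
                center = (i + 0.5, j + 0.5)
                w = (edge(a, b, center), edge(b, c, center), edge(c, a, center))
                # Accept either winding order; pixel centers on an edge count as covered.
                if all(v >= 0 for v in w) or all(v <= 0 for v in w):
                    yield (i, j)


def fragment_stage(
    fragments: Iterable[Fragment], shader: Callable[[Fragment], Color]
) -> Iterator[Tuple[Fragment, Color]]:
    """Fragment processor: run the shading program once per fragment."""
    for fragment in fragments:
        yield fragment, shader(fragment)


def gradient_shader(fragment: Fragment) -> Color:
    """Hypothetical shading program: color depends only on fragment position."""
    x, y = fragment
    return (x / 4.0, y / 4.0, 0.0)


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
# Data flows one way: vertices -> fragments -> shaded fragments.
shaded = list(
    fragment_stage(rasterizer_stage(vertex_stage(triangle, 1.0)), gradient_shader)
)
print(len(shaded))  # 10 fragments are covered by this triangle
```

Because each stage only pulls from the stage before it, no stage can send data back upstream, which mirrors the one-way flow described above.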
Many graphics processing subsystems are highly programmable, enabling implementation of, among other things, complicated lighting and shading algorithms. In order to exploit this programmability, applications can include one or more graphics processing subsystem programs, which are executed by the graphics processing subsystem in parallel with a main program executed by the CPU. Although not confined to merely implementing shading and lighting algorithms, these graphics processing subsystem programs are often referred to as shading programs or shaders.
Each programmable stream processing unit can be adapted to execute its own separate shading program in parallel with shading programs executing on other stream processing units. Implementations of complicated algorithms often depend on separate shading programs tailored to each stream processing unit working together to achieve the desired result. In these implementations, outputs of shading programs for initial stream processing units in a chain may be linked with the inputs of shading programs for subsequent stream processing units in the chain.
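One way to picture the linking of shading program outputs to downstream inputs is the following minimal sketch. It assumes that outputs and inputs are matched by name, which is a common convention but is not stated in the text above; the function `link_stages` and the attribute names are hypothetical.

```python
from typing import Dict, List


def link_stages(upstream_outputs: List[str], downstream_inputs: List[str]) -> Dict[str, int]:
    """Map each downstream input to the slot of the upstream output with the same name.

    Assumption: linking is by name. Raises ValueError if an input has no producer.
    """
    slots = {name: index for index, name in enumerate(upstream_outputs)}
    missing = [name for name in downstream_inputs if name not in slots]
    if missing:
        raise ValueError(f"unlinked inputs: {missing}")
    return {name: slots[name] for name in downstream_inputs}


# Hypothetical vertex shading program outputs and fragment shading program inputs.
vertex_outputs = ["position", "normal", "uv"]
fragment_inputs = ["uv", "normal"]
linkage = link_stages(vertex_outputs, fragment_inputs)
print(linkage)  # {'uv': 2, 'normal': 1}
```

An upstream output that no downstream program reads (here `position`) is simply left unlinked, while a downstream input with no matching output is reported as an error.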
The programmable fragment processor is often the bottleneck in improving rendering performance. Typically, the programmable fragment processor must execute its shading program once for each fragment rendered. With fragment shading programs including hundreds or thousands of instructions and each rendered image composed of millions of fragments, the computational requirements of the fragment processor are enormous.
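The scale of this workload can be made concrete with a short calculation. The figures used below (2 million fragments per image, 500 instructions per fragment, 60 images per second) are illustrative assumptions chosen to fall within the ranges mentioned above; they are not measurements, and `fragment_workload` is a hypothetical helper.

```python
def fragment_workload(
    fragments_per_image: int,
    instructions_per_fragment: int,
    images_per_second: int = 1,
) -> int:
    """Instructions executed by the fragment processor, assuming the shading
    program runs once per fragment and every instruction is executed."""
    return fragments_per_image * instructions_per_fragment * images_per_second


per_image = fragment_workload(2_000_000, 500)
per_second = fragment_workload(2_000_000, 500, 60)
print(per_image)   # 1000000000 instructions for one image
print(per_second)  # 60000000000 instructions per second at 60 images per second
```

Under these assumed figures, a single image already requires on the order of a billion shading instructions, and a real-time frame rate multiplies that by the number of images rendered each second.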
It is therefore desirable for a graphics processing subsystem to have a programmable fragment processor with improved performance. It is further desirable that the programmable fragment processor be easily and efficiently scalable to meet different cost and performance targets.