The present invention relates to the field of computer graphics. Many computer graphics images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint, and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
Graphics processing subsystems typically use a stream-processing model, in which input elements are read and operated on successively by a chain of stream processing units. The output of one stream processing unit is the input to the next stream processing unit in the chain. Typically, data flows only one way, “downstream,” through the chain of stream processing units. Examples of stream processing units include vertex processors, which process two- or three-dimensional vertices; rasterizer processors, which convert geometric primitives defined by sets of two- or three-dimensional vertices into sets of pixels or sub-pixels, referred to as fragments; and fragment processors, which process fragments to determine their color and other attributes.
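By way of illustration only, the one-way chain of stream processing units described above can be sketched in software as a chain of Python generators, each consuming the output of the previous one. The function names (`vertex_stage`, `rasterizer_stage`, `fragment_stage`, `run_pipeline`), the two-dimensional vertex format, and the pixel-center coverage rule are assumptions made for this sketch and do not describe any particular hardware.

```python
import math

def vertex_stage(triangles, offset):
    """Hypothetical vertex processor: translate every vertex by `offset`."""
    dx, dy = offset
    for tri in triangles:
        yield tuple((x + dx, y + dy) for x, y in tri)

def _edge(a, b, p):
    """Signed area term telling on which side of edge a->b the point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterizer_stage(triangles):
    """Hypothetical rasterizer: emit one fragment (x, y) for each pixel
    whose center lies inside or on the boundary of the triangle."""
    for a, b, c in triangles:
        xs = (a[0], b[0], c[0])
        ys = (a[1], b[1], c[1])
        for y in range(math.floor(min(ys)), math.ceil(max(ys))):
            for x in range(math.floor(min(xs)), math.ceil(max(xs))):
                p = (x + 0.5, y + 0.5)  # pixel center
                w0, w1, w2 = _edge(a, b, p), _edge(b, c, p), _edge(c, a, p)
                all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
                all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
                if all_non_negative or all_non_positive:
                    yield (x, y)

def fragment_stage(fragments, color):
    """Hypothetical fragment processor: assign a flat color to each fragment."""
    for x, y in fragments:
        yield (x, y, color)

def run_pipeline(triangles, offset=(0.0, 0.0), color=(255, 0, 0)):
    """Chain the stages; data flows only downstream, stage to stage."""
    moved = vertex_stage(triangles, offset)
    fragments = rasterizer_stage(moved)
    return list(fragment_stage(fragments, color))
```

For example, `run_pipeline([((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))])` returns a list of colored fragments covering the triangle. No stage ever passes data back to an earlier stage.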
Typically, the rendering commands and data sent to the graphics processing subsystem define a set of geometric primitives that are potentially visible in the final rendered image. The set of potentially visible geometric primitives is typically much larger than the set of geometric primitives actually visible in the final rendered image. To improve performance, the graphics processing subsystem can perform one or more visibility tests to determine the potential visibility of geometric primitives. Using the results of these tests, the graphics processing subsystem can remove, or cull, geometric primitives that are not visible from the set of potentially visible geometric primitives, thereby reducing the number of geometric primitives to be rendered.
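As a minimal sketch of the culling described above, the following Python example applies two commonly used visibility tests to screen-space triangles: a back-face test based on the sign of the triangle's area, and a bounding-box test against the viewport. The choice of these two tests, the counter-clockwise front-face convention, and the names `signed_area`, `is_back_facing`, `is_outside_viewport`, and `cull` are assumptions made for illustration. The text above does not specify which visibility tests are performed.

```python
def signed_area(tri):
    """Signed area of a 2D triangle; positive when vertices run counter-clockwise."""
    (ax, ay), (bx, by), (cx, cy) = tri
    return 0.5 * ((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))

def is_back_facing(tri):
    """Assumed convention: counter-clockwise triangles face the viewer.
    Zero-area (degenerate) triangles are also treated as not visible."""
    return signed_area(tri) <= 0.0

def is_outside_viewport(tri, width, height):
    """True when the triangle's bounding box misses the viewport entirely."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height

def cull(triangles, width, height):
    """Keep only the triangles that pass both visibility tests."""
    return [
        tri for tri in triangles
        if not is_back_facing(tri) and not is_outside_viewport(tri, width, height)
    ]
```

Given one front-facing triangle inside the viewport, the same triangle with reversed winding, and a triangle far outside the viewport, `cull` keeps only the first. This reduces the number of geometric primitives passed on for rendering.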
Previously, visibility testing and culling of geometric primitives, referred to as culling operations, were performed in the setup and rasterization units of the graphics processing subsystem. As rendered scenes become more complex, they typically include a large number of small geometric primitives. The increasing number of geometric primitives tends to create processing bottlenecks in the setup unit. Additionally, the vertices associated with each geometric primitive can include a set of attributes used for rendering. The bandwidth required to communicate vertices and their associated attributes from the vertex processing unit to the setup unit creates further processing bottlenecks. This problem is exacerbated by the increasing number of attributes associated with vertices to perform complex rendering operations.
It is therefore desirable to perform culling operations as early as possible in the graphics processing subsystem to decrease wasteful rendering operations, to reduce the bandwidth requirements for communicating vertices and associated attributes, and to improve rendering performance. Additionally, it is desirable to reduce the bandwidth required for communicating attributes associated with the vertices of a primitive. It is further desirable to reduce processing bottlenecks in the setup unit without substantially increasing the complexity of other portions of the graphics processing subsystem.