The present invention relates in general to computer graphics, and in particular to culling of invisible primitives in a vertex processing unit.
Many computer generated images are created by mathematically modeling the interaction of light with a three-dimensional (3D) scene from a given viewpoint and projecting the result onto a two-dimensional (2D) “screen.” This process, called rendering, generates a 2D image of the scene from the given viewpoint and is analogous to taking a digital photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and a graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image (or images). For example, rendering commands and data can define scene geometry by reference to “primitives,” which are usually points, lines, triangles, or other simple polygons; complex objects are defined as groups of primitives. A primitive is typically defined as a group of vertices, with each vertex having attributes such as color, world space coordinates, texture-map coordinates, and the like, and the same vertex may be part of multiple primitives. Rendering commands and data can also define other parameters for a scene, such as lighting, shading, textures, motion, and/or camera position. From the set of rendering commands and data, the graphics processing subsystem creates one or more rendered images.
Graphics processing subsystems typically use a stream, or pipeline, processing model, in which input elements are read and operated on successively by a chain of processing units. The output of one processing unit is the input to the next processing unit in the chain. A typical pipeline includes vertex processors, which generate attribute values for the 2D or 3D vertices; setup processors, which create parameterized attribute equations for all points in each primitive; rasterizers, which determine which particular pixels or sub-pixels (also referred to herein as fragments) are covered by a given primitive; and fragment processors, which determine the color and other attributes of each fragment based in part on the parameterized attribute equations created by the setup processor. Typically, data flows only one way, “downstream,” through the chain of units, although some processing units may be operable in a “multi-pass” mode, in which data that has already been processed by a given processing unit can be returned to that unit for additional processing.
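By way of illustration only, the one-way stream model described above can be sketched in software as a chain of generator functions, each consuming the output of the previous one. The stage bodies below are hypothetical stand-ins chosen for brevity (a fixed scale for vertex processing, a bounding box in place of true pixel coverage) and are not taken from any particular graphics processing subsystem; the names `vertex_stage`, `setup_stage`, `raster_stage`, and `run_pipeline` are likewise assumptions.

```python
def vertex_stage(vertices):
    """Stand-in vertex processor: scales each 2D position by 2."""
    for x, y in vertices:
        yield (2 * x, 2 * y)


def setup_stage(vertices):
    """Stand-in setup processor: groups every three vertices into a triangle."""
    triangle = []
    for vertex in vertices:
        triangle.append(vertex)
        if len(triangle) == 3:
            yield tuple(triangle)
            triangle = []


def raster_stage(triangles):
    """Stand-in rasterizer: emits each triangle's bounding box.

    A real rasterizer would determine the covered fragments; the bounding
    box is a simplification used only to keep the sketch short.
    """
    for triangle in triangles:
        xs = [x for x, _ in triangle]
        ys = [y for _, y in triangle]
        yield (min(xs), min(ys), max(xs), max(ys))


def run_pipeline(stages, elements):
    """Feed elements downstream through each stage in order."""
    stream = iter(elements)
    for stage in stages:
        stream = stage(stream)  # output of one stage is input to the next
    return list(stream)


result = run_pipeline(
    [vertex_stage, setup_stage, raster_stage],
    [(0, 0), (1, 0), (0, 1)],
)
```

Because each stage sees only the stream produced by its predecessor, any element that reaches a downstream stage has already consumed work in every upstream stage, which is why the position of a culling step within the chain matters.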
Typically, the rendering commands and data sent to the graphics processing subsystem define a set of primitives that might or might not be visible in the final rendered image. To improve performance, the graphics processing subsystem or the CPU can perform one or more visibility tests to determine the potential visibility of primitives. For instance, primitives that are behind the viewpoint, too small, too distant, or oriented away from the viewpoint are generally identified as invisible using well-known tests. The graphics processing subsystem or the CPU can remove, or cull, primitives that fail any of the visibility tests from the set of potentially visible geometric primitives, thereby reducing the number of primitives to be rendered.
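As a minimal sketch of one such test, the following example culls primitives that lie entirely behind the viewpoint. It assumes, for illustration only, that each vertex is an (x, y, z) tuple in a view space where the viewpoint is at the origin and positive z is in front of it; the function names `is_behind_viewpoint` and `cull` are hypothetical.

```python
def is_behind_viewpoint(primitive):
    """Return True if every vertex is at or behind the viewpoint (z <= 0).

    Assumes view-space coordinates with positive z in front of the viewer.
    """
    return all(z <= 0.0 for (_, _, z) in primitive)


def cull(primitives, invisibility_tests):
    """Keep only primitives that no test identifies as invisible."""
    return [
        primitive
        for primitive in primitives
        if not any(test(primitive) for test in invisibility_tests)
    ]


front = ((0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 1.5))
behind = ((0.0, 0.0, -1.0), (1.0, 0.0, -2.0), (0.0, 1.0, -0.5))
straddling = ((0.0, 0.0, -1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 1.0))

visible = cull([front, behind, straddling], [is_behind_viewpoint])
```

In this sketch a primitive that straddles the viewpoint plane is kept, since part of it may still be visible; only primitives that fail a test outright are removed from the potentially visible set.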
Visibility testing and culling of primitives, referred to as culling operations, are performed in the setup processors and/or rasterizers of conventional graphics processing subsystems. This approach, however, is inadequate in at least some situations. As rendered scenes become more complex, they typically include a larger number of primitives and therefore a larger number of vertices. Processing bottlenecks can occur, for instance, if the graphics subsystem does not provide sufficient bandwidth to communicate all of the vertices and their associated attributes from the vertex processing unit to the setup unit. In addition, as rendering techniques become increasingly sophisticated, the number of attributes associated with each vertex tends to increase, as does the complexity of the computations used to generate attribute values for each vertex. Thus, processing bottlenecks can also occur within a vertex processing unit, where considerable processing power can be spent computing attributes for vertices that are not part of any visible primitive and therefore have no effect on the final image.
In some computer systems, these bottlenecks have been partially alleviated by moving some of the culling operations from the graphics subsystem into the CPU. For instance, the CPU can perform “backface” culling, which involves identifying and culling primitives that face away from the viewpoint and are therefore not visible. Such culling can be performed by a graphics driver or other suitable program executing on the CPU. When the CPU culls a primitive, an instruction to process that primitive is not sent to the graphics subsystem, reducing the burden on the graphics subsystem. In instances where the processing bottlenecks in the graphics subsystem result in idle time in the CPU, culling in the CPU can improve overall system performance.
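Backface culling of the kind described above can be sketched as a signed-area test on a projected triangle. The example assumes, purely for illustration, that front-facing triangles have counter-clockwise winding in a 2D screen space with the y axis pointing up; actual systems may use the opposite convention. The names `signed_area`, `is_backfacing`, and `submit_front_facing` are hypothetical and do not correspond to any real driver interface.

```python
def signed_area(triangle):
    """Signed area of a 2D triangle; positive for counter-clockwise winding
    when the y axis points up."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def is_backfacing(triangle):
    """Assumes counter-clockwise triangles face the viewpoint, so a
    negative signed area marks a triangle that faces away."""
    return signed_area(triangle) < 0.0


def submit_front_facing(triangles):
    """Stand-in for CPU-side culling: return only the triangles that would
    be sent on to the graphics subsystem."""
    return [t for t in triangles if not is_backfacing(t)]


ccw = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))   # front-facing under the assumption
cw = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))    # same triangle, reversed winding

submitted = submit_front_facing([ccw, cw])
```

Here the culled triangle never generates a rendering command, which corresponds to the reduction in graphics subsystem workload noted above, at the cost of performing the area computation on the CPU.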
Nonetheless, CPU culling is at best a partial solution. While some types of culling can be handled by the CPU, many culling operations are handled more efficiently within the graphics subsystem. Further, culling in the CPU can divert CPU cycles from high-level rendering operations, processing of user input, and the like, which can impair overall system performance.
It is therefore desirable to perform culling operations in the graphics subsystem rather than in the CPU. It is further desirable to cull as early as possible in the graphics pipeline, in order to decrease wasteful rendering operations, reduce the bandwidth requirements for communicating vertices and associated attributes, and improve rendering performance.