1. Field of the Invention
The present invention relates generally to the field of graphics processing and, more specifically, to culling before vertex attribute fetch and vertex lighting.
2. Description of the Related Art
A graphics processing pipeline of a graphics processing unit (GPU) accepts a representation of a three-dimensional (3D) scene as an input and processes that input to produce a 2D display image of the scene as an output. As is well known, the 3D graphics scene is typically represented by a collection of primitives having vertices. Indices associated with the vertices are stored in index arrays, and vertex data associated with those vertices is stored in vertex arrays. The primitives are individually processed by the GPU based on the index arrays and the vertex data when generating the 2D display image of the scene.
FIG. 1 is a conceptual diagram different stages in a graphics processing pipeline 100 of a GPU through which the primitives associated with a graphics scene are processed when generating the 2D display image of the graphics scene. The graphics processing pipeline 100 includes a host unit 102, a front end unit 104, an index fetch unit 106, a vertex fetch unit 108, a vertex shader 110 and a geometry shader 112. The graphics processing pipeline 100 also includes a viewport cull (VPC) unit 114, a rasterizer 116, a pixel shader 118, a raster operations unit (ROP) 120 and a frame buffer 122.
The host unit 102 transmits the vertex data and the index arrays associated with the vertices of the various primitives making up the 3D graphics scene to an L2 cache within the GPU or the frame buffer 122 for storage. The host unit 102 also transmits graphics commands for processing the vertex data associated with those vertices to the front end unit 104, which, in turn, distributes graphics processing commands to the index fetch unit 106. For a given set of vertices being processed, the index fetch unit 106 retrieves the index arrays associated with those vertices and creates a batch of vertices selected for processing. The index fetch unit 106 then transmits the batch of unique vertices to the vertex fetch unit 108.
Upon receiving a batch of vertices, the vertex fetch unit 108 fetches the vertex attributes included in the vertex data associated with each vertex in the batch of vertices from the frame buffer 102. The vertex fetch unit 108 transmits the vertex attributes to the vertex shader 110 for further processing. The vertex shader 110 is a programmable execution unit that is configured to execute vertex shader programs for lighting and transforming vertices included in the batch of vertices. The vertex shader 110 transmits the processed batch of vertices to the tessellation control shader 111.
The tessellation control shader (TCS) 111 operates on a patch of control vertices and computes vertex attributes for each control vertex of the patch. The TCS 111 also produces a set of level of details (LODs) associated with the patch that can be used to generate a tessellated surface. The tessellation evaluation shader (TES) 112 operates on the vertices of the tessellated surface and computes vertex attributes for each vertex of the tessellated surface. The vertices are then processed by the geometry shader 113. The geometry shader 113 is a programmable execution unit that is configured to execute geometry shader programs for generating graphics primitives and calculate parameters that are used to rasterize the graphics primitives. The geometry shader 113 then transmits the generated primitives to the viewport cull unit 114.
The viewport cull unit 114 performs clipping, culling, viewport transform, and attribute perspective correction operations on the primitives. The viewport cull unit 114 can perform different culling and clipping techniques to remove primitives within the 3D graphics scene that are not visible in the view frustum. The remaining primitives (those that are not culled) are transmitted by the viewport cull unit 114 to the rasterizer 116. The rasterizer 116 rasterizes the remaining primitives into pixels in 2D screen space and then the pixel shader 118 shades the pixels. The ROP 120 is a processing unit that performs raster operations, such as stencil, z test, blending, and the like, and outputs pixel data as processed graphics data for storage in graphics memory. The processed graphics data may be stored in graphics memory, e.g., the frame buffer 122 for further processing and/or display.
One drawback of conventional graphics processing pipelines, like graphics processing pipeline 100, is that all the vertex attributes associated with a batch of vertices being processed are retrieved from the frame buffer 122, even if the primitives associated with the batch of vertices are later culled downstream of the vertex fetch unit 108 by the viewport cull unit 114. In such a scenario, memory bandwidth is wasted unnecessarily to retrieve vertex attributes for vertices that are discarded at a later stage in the graphics processing pipeline. Similarly, the vertex shader 110 and the geometry shader 112 process vertex data associated with all the vertices in a batch even if the primitives associated with that batch of vertices are later culled by the viewport cull unit 114, thereby wasting the processing resources of the GPU.
As the foregoing illustrates, what is needed in the art is a mechanism for identifying vertices that are eventually culled in a later stage of the graphics processing pipeline and filtering those vertices at an earlier stage in the graphics processing pipeline before those vertices are processed.