1. Field of the Invention
The present invention relates generally to optimizing shading operations and memory traffic in a graphics processing unit.
2. Background
A graphics processing unit (GPU) is a complex integrated circuit that is specially designed to perform graphics processing tasks. A GPU can, for example, execute graphics processing tasks required by an end-user application, such as a video game. Some currently available GPUs use a technique known as pipelining to execute graphics instructions. Pipelining enables a GPU to work on different steps of different instructions at the same time, thus taking advantage of the parallelism that exists among the steps required to execute those instructions. As a result, pipelining streamlines graphics processing by enabling the GPU to execute more instructions in a shorter period of time.
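As a rough illustration of why overlapping steps helps, the following Python sketch simulates an idealized pipeline in which every stage takes one cycle, a new instruction may enter each cycle, and there are no stalls or hazards. These conditions are simplifying assumptions, and `simulate_pipeline` and `unpipelined_cycles` are hypothetical helpers rather than a description of any particular GPU.

```python
from typing import List, Optional

def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Each instruction runs through every stage before the next one starts."""
    return num_instructions * num_stages

def simulate_pipeline(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when one instruction may enter stage 0 every cycle
    (ideal pipeline: one cycle per stage, no stalls or hazards)."""
    stages: List[Optional[int]] = [None] * num_stages  # stages[s] = instruction in stage s
    issued = finished = cycles = 0
    while finished < num_instructions:
        cycles += 1
        new = issued if issued < num_instructions else None
        if new is not None:
            issued += 1
        stages = [new] + stages[:-1]  # every in-flight instruction advances one stage
        if stages[-1] is not None:    # the instruction in the last stage completes this cycle
            finished += 1
    return cycles

print(unpipelined_cycles(100, 5), simulate_pipeline(100, 5))
```

Under these assumptions, n instructions passing through k stages finish in k + n - 1 cycles rather than n * k; for example, 100 instructions through 5 stages take 104 cycles instead of 500.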
Modern graphics pipelines accept input in a variety of formats, but the most widely used representation for geometry is based on vertex and index buffers. The vertex buffer provides three-dimensional coordinates and attributes for a set of vertices. The index buffer defines a set of geometric primitives by referencing vertices in the vertex buffer by their indices. A commonly used geometric primitive is a triangle, whose three vertices are identified by entries in the index buffer and whose vertex information is stored in the vertex buffer.
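A minimal sketch of this layout, assuming a triangle-list topology in which each consecutive group of three indices names one triangle. The `Vertex` type, its single color attribute, and `assemble_triangles` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Vertex:
    """One vertex-buffer entry: a 3-D position plus an example attribute (RGB color)."""
    position: Tuple[float, float, float]
    color: Tuple[float, float, float]

def assemble_triangles(vertex_buffer: List[Vertex],
                       index_buffer: List[int]) -> List[Tuple[Vertex, Vertex, Vertex]]:
    """Group the index buffer into triples and look up each referenced vertex."""
    if len(index_buffer) % 3 != 0:
        raise ValueError("triangle-list index buffer length must be a multiple of 3")
    triangles = []
    for i in range(0, len(index_buffer), 3):
        a, b, c = index_buffer[i:i + 3]
        triangles.append((vertex_buffer[a], vertex_buffer[b], vertex_buffer[c]))
    return triangles

# A unit quad stored as four shared vertices and two triangles (six indices).
quad_vertices = [
    Vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    Vertex((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
    Vertex((1.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    Vertex((0.0, 1.0, 0.0), (1.0, 1.0, 1.0)),
]
quad_indices = [0, 1, 2, 0, 2, 3]
tris = assemble_triangles(quad_vertices, quad_indices)
print(len(tris))
```

Because vertices 0 and 2 are shared by both triangles, four vertex-buffer entries and six indices describe the quad, instead of six full copies of vertex data.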
As each triangle is processed by the GPU for rendering, its vertices are processed by a vertex shader. This operation can be computationally expensive due to the combination of the arithmetic logic unit (ALU) instructions required to process each vertex (e.g., transform and lighting) and the memory bandwidth required to write the processed vertex data (e.g., position, color, and texture coordinates) to memory. Oftentimes, the processed vertex data is written to a memory device.
Another potential bottleneck in the graphics pipeline exists during geometry shading. Geometry shading occurs after vertex shading and enables the GPU to add geometric detail to an existing primitive by generating (or “emitting”) new primitives. Alternatively, the geometry shader can emit zero primitives. The primitives emitted from the geometry shader are then rasterized, and the fragments (or pixels) produced by rasterization are ultimately passed to a pixel shader for further processing.
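The emit behavior can be sketched as follows. The subdivision rule (split a triangle into four at its edge midpoints) and the degenerate-triangle test are hypothetical examples chosen only to show one invocation emitting several primitives and another emitting none; an actual geometry shader runs on the GPU in a shading language, not in Python.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]

def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

def signed_area(t: Triangle) -> float:
    """Half the 2-D cross product of two edges; zero for collinear vertices."""
    (x0, y0), (x1, y1), (x2, y2) = t
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def example_geometry_shader(t: Triangle) -> List[Triangle]:
    """Hypothetical shader: emits zero primitives for a degenerate triangle,
    otherwise emits four by midpoint subdivision (adding geometric detail)."""
    if abs(signed_area(t)) < 1e-12:
        return []
    a, b, c = t
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def run_geometry_stage(triangles: List[Triangle]) -> List[Triangle]:
    """Concatenate every invocation's output; this stream would go on to rasterization."""
    emitted: List[Triangle] = []
    for t in triangles:
        emitted.extend(example_geometry_shader(t))
    return emitted

inputs = [
    ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0)),  # ordinary triangle: four emits
    ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)),  # collinear (degenerate): zero emits
]
output = run_geometry_stage(inputs)
print(len(output))
```

The first input yields four smaller triangles covering the same area, while the collinear input yields none, so the stream passed on toward rasterization contains four primitives.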
Rendering costs during geometry shading are attributable to the ALU instructions required to process each primitive, the memory bandwidth associated with looking up vertices in the vertex shader's memory, and the memory bandwidth required to write the processed data to memory. Oftentimes, as with the vertex shader, data processed by the geometry shader is written to a memory device.
Rendering costs attributed to the vertex and geometry shaders can dominate the total rendering cost of the graphics pipeline. For some applications, these costs may be unavoidable due to design constraints such as the performance and circuit area of the GPU. For instance, when a high volume of primitive data enters the graphics pipeline, the geometry shader may emit a large number of primitives for each vertex processed by the vertex shader. In this case, off-chip memory may be required so that latency periods between the vertex shader and geometry shader operations can be hidden. Alternatively, potential latency issues can be resolved by adding on-chip memory to the GPU, which increases overall circuit area. However, any benefit of a larger circuit area is offset by the higher manufacturing cost of fabricating a larger graphics chip. Thus, in light of performance and cost factors, off-chip memory may be a feasible solution for many GPU designs.
In situations where a low volume of primitive data enters the graphics pipeline and the geometry shader does not emit a large number of primitives for each vertex processed by the vertex shader, writing processed data to, and retrieving it from, off-chip memory can be inefficient. Here, the latency period between the vertex shader and geometry shader operations can grow relative to processing time, because only a small number of primitives is emitted from the geometry shader. In other words, since the geometry shader emits few primitives, the time spent retrieving data from the vertex shader's memory and writing data processed by the geometry shader to off-chip memory exceeds the time the geometry shader spends emitting primitives. As the geometry shader processes more vertex data with few emits per input, the accumulated latency of its operation increases, thus increasing the overall rendering cost of the graphics pipeline.
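The latency argument above can be made concrete with a toy cost model. The cycle counts are illustrative assumptions, not measurements, and the model further assumes that off-chip traffic (one fetch of vertex-shader output plus one write per emitted primitive) overlaps fully with the geometry shader's ALU work, so whichever is slower sets the time per input. `CostModel` and the helper functions are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CostModel:
    """Per-input costs in cycles; the default values are illustrative assumptions."""
    fetch_latency: int = 400   # off-chip read of the vertex shader's output
    write_per_emit: int = 2    # off-chip write for each emitted primitive
    alu_per_emit: int = 10     # geometry-shader ALU work for each emitted primitive

def cycles_per_input(m: CostModel, emits: int) -> int:
    """Memory traffic overlaps ALU work, so the slower of the two sets the pace."""
    alu_cycles = emits * m.alu_per_emit
    memory_cycles = m.fetch_latency + emits * m.write_per_emit
    return max(alu_cycles, memory_cycles)

def alu_utilization(m: CostModel, emits: int) -> float:
    """Fraction of each input's cycles spent on useful emit work."""
    return emits * m.alu_per_emit / cycles_per_input(m, emits)

def total_cycles(m: CostModel, num_inputs: int, emits: int) -> int:
    """Cost of a batch of inputs processed one after another (a further simplification)."""
    return num_inputs * cycles_per_input(m, emits)

model = CostModel()
for emits in (2, 8, 64):
    print(emits, cycles_per_input(model, emits), round(alu_utilization(model, emits), 3))
```

With these numbers, an input that emits 64 primitives is ALU-bound (640 ALU cycles against 528 memory cycles), whereas an input that emits only 2 primitives spends 404 cycles on memory traffic to do 20 cycles of useful work; multiplying by the number of inputs shows the exposed latency accumulating as more low-emit data is processed.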
Accordingly, what is needed is an improved method to reduce the rendering costs attributed to the vertex and geometry shaders by eliminating the need to access memory when processing vertex information.