A graphics processing unit (GPU) is a dedicated graphics rendering device used to generate computerized graphics for display on a display device. GPUs are built with a highly parallel structure that provides more efficient processing than typical general-purpose central processing units (CPUs) for a range of complex algorithms. For example, the complex algorithms may correspond to representations of three-dimensional computerized graphics. In such a case, a GPU can implement a number of primitive graphics operations to create three-dimensional images for display on the display device more quickly than a CPU could draw the same images.
A typical GPU receives an image geometry and uses a pipeline approach to generate graphics which can be output, for example, for display on a display device. A typical graphics pipeline includes a number of stages which operate in parallel, with the output from one stage possibly being used at another stage in the pipeline. For example, a typical graphics pipeline comprises vertex shader, primitive assembly, perspective projection and viewport transformation, primitive setup, rasterization, hidden primitive and pixel rejection, attribute setup, attribute interpolation and fragment shader stages.
A vertex shader is applied to the image geometry for an image and generates vertex coordinates and attributes of vertices within the image geometry. Vertex attributes include, for example, color, normal, and texture coordinates associated with a vertex. Primitive assembly forms primitives, e.g., point, line, and triangle primitives, from the vertices based on the image geometry. Formed primitives can be transformed from one space to another using a perspective projection and viewport transformation, which transforms primitives from a normalized device space to a screen space. Primitive setup can be used to determine a primitive's area and edge coefficients, and to perform occlusion culling (e.g., backface culling) and 3-D clipping operations.
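By way of illustration only, the viewport transformation and primitive setup operations described above can be sketched as follows. The function names, the [-1, 1] range assumed for the normalized device space, the downward-growing screen Y axis, and the convention that a non-positive signed area indicates a back-facing triangle are all assumptions made for this sketch and are not taken from any particular GPU.

```python
def viewport_transform(ndc_x, ndc_y, width, height):
    """Map normalized device coordinates in [-1, 1] to screen-space pixels."""
    screen_x = (ndc_x + 1.0) * 0.5 * width
    # Assumption: screen Y grows downward, so NDC +1 maps to the top row.
    screen_y = (1.0 - ndc_y) * 0.5 * height
    return screen_x, screen_y


def signed_area(v0, v1, v2):
    """Signed area of a 2-D triangle; the sign encodes the winding order."""
    return 0.5 * ((v1[0] - v0[0]) * (v2[1] - v0[1])
                  - (v2[0] - v0[0]) * (v1[1] - v0[1]))


def edge_coefficients(p, q):
    """Coefficients (A, B, C) of the edge function E(x, y) = A*x + B*y + C,
    which evaluates to zero for every point on the line through p and q."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - p[1] * q[0]
    return a, b, c


def is_backfacing(v0, v1, v2):
    """Assumed convention: a non-positive signed area means back-facing."""
    return signed_area(v0, v1, v2) <= 0.0
```

In this sketch the same signed area serves both as the primitive's area and as the backface test, which is one reason the two operations are commonly grouped in a single setup stage.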
Rasterization converts primitives into pixels based on the XY coordinates of vertices within the primitives and the number of pixels included in the primitives. Hidden primitive and pixel rejection uses the z coordinate of the primitives and/or pixels to determine and reject those primitives and pixels determined to be hidden (e.g., a primitive or pixel located behind another primitive or pixel in the image frame). Attribute setup determines attribute gradients for attributes associated with pixels within a primitive, e.g., a difference between the attribute value at a first pixel and the attribute value at a second pixel within the primitive, moving in either a horizontal (X) direction or a vertical (Y) direction. Attribute interpolation interpolates the attributes over the pixels within a primitive based on the determined attribute gradient values. Interpolated attribute values are sent to the fragment shader for pixel rendering. Results of the fragment shader can be output to a post-processing block and a frame buffer for presentation of the processed image on the display device.
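As a minimal sketch of the attribute setup and attribute interpolation stages described above, the following computes per-pixel gradients of a single scalar attribute over a triangle and then evaluates the attribute at a pixel position. The function names are hypothetical, and the sketch assumes simple screen-space (affine) interpolation; a practical pipeline may additionally apply perspective correction, which is not shown here.

```python
def attribute_gradients(v0, v1, v2, a0, a1, a2):
    """Return (dA/dx, dA/dy): the change in the attribute per one-pixel
    step in X and in Y, for a triangle with screen-space vertices v0..v2
    carrying attribute values a0..a2."""
    dx1, dy1 = v1[0] - v0[0], v1[1] - v0[1]
    dx2, dy2 = v2[0] - v0[0], v2[1] - v0[1]
    denom = dx1 * dy2 - dx2 * dy1  # twice the signed area of the triangle
    if denom == 0:
        raise ValueError("degenerate triangle: gradients are undefined")
    da1, da2 = a1 - a0, a2 - a0
    dadx = (da1 * dy2 - da2 * dy1) / denom
    dady = (da2 * dx1 - da1 * dx2) / denom
    return dadx, dady


def interpolate_attribute(x, y, v0, a0, dadx, dady):
    """Evaluate the attribute at pixel (x, y) from the value at v0 and
    the gradients computed during attribute setup."""
    return a0 + dadx * (x - v0[0]) + dady * (y - v0[1])
```

For example, with vertices (0, 0), (4, 0), and (0, 4) carrying attribute values 0, 8, and 4, the gradients are 2 per pixel in X and 1 per pixel in Y, so the interpolated value at pixel (1, 1) is 3.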
Coordinates and attributes of vertices are used at multiple processing stages along the GPU pipeline. For example, coordinates associated with each primitive are used at the primitive assembly, perspective projection and viewport transformation, primitive setup, rasterization, and hidden primitive and pixel rejection stages. FIG. 1 provides an example of triangle primitives, and the vertices of each. Triangle 1 of FIG. 1 has vertices 0, 1, and 2, triangle 2 has vertices 3, 4, and 5, and triangle N has vertices 3*N−3, 3*N−2, and 3*N−1. In the example, the number of vertices that are operated on in the GPU pipeline can be as many as three times the number of triangles, with each vertex having multiple associated coordinates and attributes. In such a case, the “coordinates-processing” stages operate on multiple coordinates for each of the three vertices for a given triangle. Similarly, in a case in which a triangle is used as the primitive, the “attribute-processing” stages operate on multiple attributes for each vertex for a given triangle. The GPU pipeline therefore processes a large amount of data at a given stage, and a large amount of data is moved from one stage to another.
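The vertex numbering of FIG. 1 can be expressed as a short sketch. The function names are hypothetical, and the sketch assumes, as in the example above, that triangles are numbered from 1 and that no vertices are shared between triangles.

```python
def triangle_vertex_indices(n):
    """Vertex indices of triangle n (numbered from 1), as in FIG. 1."""
    if n < 1:
        raise ValueError("triangles are numbered starting at 1")
    return (3 * n - 3, 3 * n - 2, 3 * n - 1)


def total_vertices(num_triangles):
    """Upper bound on vertices processed when no vertex is shared."""
    return 3 * num_triangles
```

Under these assumptions, a list of 1,000 triangles presents up to 3,000 vertices to the pipeline, each carrying its own coordinates and attributes.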