A graphics processing unit (GPU) is a dedicated graphics rendering device used to manipulate and display computerized graphics on a display device. GPUs are built with a highly parallel structure that processes a range of complex algorithms more efficiently than typical general-purpose central processing units (CPUs). For example, the complex algorithms may correspond to representations of three-dimensional computerized graphics. A GPU may implement a number of primitive graphics operations, such as forming points, lines, and triangles, to create complex, three-dimensional images on a display device more quickly than a CPU could draw the images directly to the display device.
Vertex shading and fragment (pixel) shading are often utilized in the video gaming industry to determine final surface properties of a computerized image, such as light absorption and diffusion, texture mapping, light reflection and refraction, shadowing, surface displacement, and post-processing effects. A typical shader-based graphics core in a GPU includes at least three major pipeline stages: a vertex shader stage, a primitive setup and interpolation stage, and a fragment shader stage. The vertex shader and the fragment shader each maintain dedicated register file space. The shaders typically comprise Single Instruction, Multiple Data (SIMD) processors that receive inputs one by one as threads. A thread may be a group of vertices, primitives, or pixels. The shaders execute multiple threads in an interleaved manner to compensate for latency.
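The effect of interleaving threads to compensate for latency can be illustrated with a toy model. The following is a minimal sketch, not a description of any particular shader core: it assumes each instruction occupies one issue cycle, that a thread cannot issue again until a fixed latency has elapsed, and that a round-robin scheduler picks the next ready thread. The function name and all parameters are hypothetical.

```python
def issue_utilization(num_threads, ops_per_thread, latency):
    """Fraction of cycles in which the toy shader core issues an instruction.

    Assumed model: one instruction issues per cycle at most; after issuing,
    a thread is stalled for `latency` further cycles before it is ready again.
    """
    remaining = [ops_per_thread] * num_threads  # instructions left per thread
    ready_at = [0] * num_threads                # first cycle each thread may issue
    total_ops = num_threads * ops_per_thread
    issued = 0
    cycle = 0
    next_thread = 0                             # round-robin starting point

    while issued < total_ops:
        # Look for the next ready thread, starting after the last one issued.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if remaining[t] > 0 and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + 1 + latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, this cycle is simply idle.
        cycle += 1

    return issued / cycle


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "thread(s):", round(issue_utilization(n, 4, 3), 3))
```

Under these assumptions, a single thread leaves most issue cycles idle, while interleaving enough threads to cover the latency keeps the processor busy on every cycle.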
A vertex shader is applied to an image geometry for an image and generates vertex coordinates and attributes of vertices within the image geometry. Vertex attributes include, for example, color, normal, and texture coordinates associated with a vertex. Based on the vertices within the image geometry, a primitive setup and rejection module forms primitives, such as points, lines, or triangles, and rejects invisible primitives. An attribute setup module computes gradients of attributes within the primitives for the image geometry. Once the attribute gradient values are computed, primitives for the image geometry may be converted into pixels, and hidden primitive and pixel rejection may be performed. An attribute interpolator then interpolates the attributes over pixels within the primitives for the image geometry based on the attribute gradient values, and sends the interpolated attribute values to the fragment shader for pixel rendering. Results of the fragment shader are output to a post-processing block and a frame buffer for presentation of the processed image on the display device.
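Attribute gradient setup and interpolation can be sketched for a single scalar attribute on one triangle. The example below is illustrative only and rests on simplifying assumptions: the attribute varies linearly in screen space (no perspective correction), and the names `Vertex`, `setup_gradient`, and `interpolate` are hypothetical rather than taken from any particular GPU design.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    x: float     # screen-space x coordinate
    y: float     # screen-space y coordinate
    attr: float  # one scalar attribute, e.g. a single color channel


def setup_gradient(v0, v1, v2):
    """Attribute setup: compute (d attr / dx, d attr / dy) over the triangle."""
    dx1, dy1 = v1.x - v0.x, v1.y - v0.y
    dx2, dy2 = v2.x - v0.x, v2.y - v0.y
    det = dx1 * dy2 - dx2 * dy1  # twice the signed triangle area
    if det == 0:
        raise ValueError("degenerate triangle: gradient is undefined")
    da1 = v1.attr - v0.attr
    da2 = v2.attr - v0.attr
    dadx = (da1 * dy2 - da2 * dy1) / det
    dady = (da2 * dx1 - da1 * dx2) / det
    return dadx, dady


def interpolate(v0, gradient, x, y):
    """Attribute interpolation: evaluate the attribute at pixel (x, y)."""
    dadx, dady = gradient
    return v0.attr + dadx * (x - v0.x) + dady * (y - v0.y)


if __name__ == "__main__":
    v0, v1, v2 = Vertex(0, 0, 0.0), Vertex(4, 0, 1.0), Vertex(0, 4, 2.0)
    grad = setup_gradient(v0, v1, v2)
    print("gradient:", grad)
    print("value at pixel (1, 1):", interpolate(v0, grad, 1, 1))
```

In this sketch the gradient is computed once per primitive and per attribute, and the per-pixel work reduces to two multiplies and two adds, which is why the setup step is performed before the primitive is converted into pixels.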
Attributes of vertices within the image geometry are passed through each processing stage along the GPU pipeline. Therefore, the GPU pipeline must move a large amount of data and requires a wide internal data bus to meet the required data throughput. Moving the large amount of data through each of the processing stages in the GPU pipeline may create a bottleneck for primitives that include large numbers of attributes. Additionally, attribute gradient setup is computationally intensive and may slow down image processing within the GPU pipeline.
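A rough calculation shows how the data volume grows with the number of attributes. Every figure below is an assumption chosen for illustration and does not describe any specific GPU: four components per attribute, four bytes per component, three vertices per primitive, and a 32-byte-wide internal bus. The function names are hypothetical.

```python
def bytes_per_primitive(num_attributes,
                        components_per_attribute=4,  # assumed, e.g. RGBA or XYZW
                        bytes_per_component=4,       # assumed 32-bit values
                        vertices_per_primitive=3):   # assumed triangles
    """Attribute data that one primitive carries through a pipeline stage."""
    return (vertices_per_primitive * num_attributes
            * components_per_attribute * bytes_per_component)


def transfer_cycles(num_bytes, bus_width_bytes):
    """Bus cycles needed to move num_bytes, rounded up to whole cycles."""
    return -(-num_bytes // bus_width_bytes)  # ceiling division


if __name__ == "__main__":
    bus_width = 32  # bytes per cycle, assumed for illustration
    for attrs in (4, 16):
        size = bytes_per_primitive(attrs)
        print(attrs, "attributes:", size, "bytes,",
              transfer_cycles(size, bus_width), "cycles per stage")
```

Under these assumptions the cost scales linearly with the attribute count and is paid again at each stage the attributes pass through, which is the source of the bottleneck for attribute-heavy primitives.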