In processing data to generate an image, the performance and power consumption of a graphics processor unit (GPU) are directly related to the choice of input topology used to model the objects presented in the image. Present-day graphics processors typically render images using triangles as primitives, organized into meshes in which the triangles are indexed. Triangles or other primitives are subjected to a series of operations in a graphics pipeline, such as vertex shading, clipping, setup, and rasterization.
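As an illustration of the indexed organization described above, the following minimal sketch stores a quad as two indexed triangles. The names `positions`, `indices`, and `expand` are hypothetical and chosen only for this example; real meshes carry further per-vertex attributes.

```python
# Hypothetical example: a unit quad modeled as two indexed triangles.
# Each unique vertex is stored once; triangles refer to vertices by index.
positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
indices = [0, 1, 2,   # first triangle
           0, 2, 3]   # second triangle, sharing vertices 0 and 2


def expand(positions, indices):
    """Return the non-indexed vertex list that the indices refer to."""
    return [positions[i] for i in indices]


non_indexed = expand(positions, indices)
print(len(positions), "unique vertices,", len(non_indexed), "vertex references")
```

The shared indices are what allow a later pipeline stage to recognize that a vertex has already been processed.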
At the vertex shading stage, graphics processing circuitry such as graphics processor units (GPUs) takes advantage of the indexed organization by employing a hardware cache, often referred to as a vertex cache or a vertex shader cache, which stores the results of vertex shading. A vertex shader is used to transform the attributes of the vertices of a triangle, such as color, texture, position, and direction, from the original color space to the display space. The vertex shader may reshape or distort original objects in a desired manner.
When a vertex with the same index as a previously processed vertex is to be shaded again, e.g., because it appears in another triangle or primitive, the vertex cache is interrogated by a look-up operation. If the vertex is still present in the vertex cache, shading of that vertex is skipped altogether. This process is transparent to the user and saves execution resources and power. The vertex cache is often organized as a finite-sized first-in-first-out (FIFO) buffer, such that vertex shading can be skipped so long as the shaded vertex is still in the vertex cache. The hit rate of the vertex cache increases with increased cache capacity because a given shaded vertex may be preserved longer within a larger FIFO-type cache before being evicted. Larger caches may therefore be desired in order to perform vertex processing with a higher hit rate. However, a larger cache requires more circuit area, for example, a larger static random access memory (SRAM) array, in which each bit of storage requires multiple transistors, such as six transistors per bit in a common layout.
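The look-up behavior and the capacity tradeoff described above can be sketched in software. The following is a minimal model, not a description of any particular hardware: the function names, the grid test mesh, and the 512-bit entry size are assumptions made for illustration, and the transistor estimate counts only the storage cells at six transistors per bit.

```python
from collections import deque


def count_shader_invocations(indices, capacity):
    """Simulate a FIFO vertex cache; return (misses, hits).

    A miss stands for one vertex shader invocation. A hit does not
    reorder the FIFO, so entries are evicted strictly in arrival order.
    """
    fifo = deque()      # arrival order of cached vertex indices
    resident = set()    # same contents, for fast look-up
    hits = misses = 0
    for idx in indices:
        if idx in resident:
            hits += 1   # shading skipped
            continue
        misses += 1     # vertex must be shaded
        if capacity > 0:
            if len(fifo) == capacity:
                resident.discard(fifo.popleft())  # evict oldest entry
            fifo.append(idx)
            resident.add(idx)
    return misses, hits


def grid_indices(cols, rows):
    """Hypothetical test mesh: a cols x rows grid, two triangles per cell."""
    stride = cols + 1
    out = []
    for r in range(rows):
        for c in range(cols):
            v0 = r * stride + c
            v1, v2, v3 = v0 + 1, v0 + stride, v0 + stride + 1
            out += [v0, v1, v2, v1, v3, v2]
    return out


def sram_cell_transistors(entries, bits_per_entry, transistors_per_bit=6):
    """Storage-cell transistor count only (assumed 6T cells by default)."""
    return entries * bits_per_entry * transistors_per_bit


if __name__ == "__main__":
    mesh = grid_indices(8, 8)
    for capacity in (4, 16, 81):
        misses, hits = count_shader_invocations(mesh, capacity)
        cost = sram_cell_transistors(capacity, 512)  # 512 bits/entry is assumed
        print(capacity, misses, hits, cost)
```

Running the loop for several capacities shows the tension the passage describes: fewer shader invocations as capacity grows on this mesh, against a storage cost that scales linearly with the number of entries.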
Given the tradeoffs described above, there may be a need for improved techniques and apparatus to solve these and other problems.