In processing data to generate an image, graphics processor (GPU) performance and power consumption are directly related to the choice of input topology used to model the objects presented in the image. Today, graphics processors typically render images using triangles as primitives. A major factor in the efficiency of image generation is the number of vertices required to describe the scene, e.g., the average number of vertices required to define a triangle. Several factors contribute to this, but the primary reason is that GPUs typically transmit one vertex per clock along a fixed-function geometry pipeline. Passing multiple vertices down the geometry pipeline in one clock is generally prohibitive in terms of semiconductor die area and power consumption, owing to the amount of information associated with each vertex, the considerable length of the fixed-function geometry pipeline, and the complexity of processing input topologies into packets of a fixed number of vertices. In addition, the ability to process multiple vertices per clock is not always required, so providing it may needlessly consume power and die area.
Because of these considerations, the use of triangle strips to render images provides distinct advantages. Triangle strips are used to accelerate the rendering of objects represented as triangle meshes. A triangle strip (tristrip) is a series of connected triangles in which consecutive triangles share two vertices, so that, except for the first two vertices, each new vertex defines a complete triangle. This contrasts with the so-called triangle list (trilist) topology, which specifies each triangle separately and therefore requires three vertices per triangle. Thus, in principle, the number of vertices sent to the GPU to define n triangles in a mesh may be reduced from 3n to n+2 in the best case.
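The vertex-count relationship above can be illustrated with a short Python sketch. It assumes a strip is given as a flat list of vertex indices and follows a common convention of swapping the first two vertices of every odd triangle so that all triangles keep the same winding order. The function names `strip_to_triangles` and `vertex_counts` are hypothetical and are not taken from any particular graphics API.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]


def strip_to_triangles(strip: List[int]) -> List[Triangle]:
    """Expand a triangle strip (vertex indices) into individual triangles.

    Every vertex after the first two closes a triangle with the two
    vertices that precede it. On odd triangles the first two vertices are
    swapped so all triangles share one winding order (an assumed, though
    common, convention).
    """
    triangles: List[Triangle] = []
    for i in range(2, len(strip)):
        a, b, c = strip[i - 2], strip[i - 1], strip[i]
        if (i - 2) % 2 == 1:  # odd triangle: restore consistent winding
            a, b = b, a
        triangles.append((a, b, c))
    return triangles


def vertex_counts(num_triangles: int) -> Tuple[int, int]:
    """Vertices sent for n triangles as a trilist (3n) vs. one strip (n + 2)."""
    return 3 * num_triangles, num_triangles + 2


strip = [0, 1, 2, 3, 4, 5]
tris = strip_to_triangles(strip)
print(tris)                      # [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
print(vertex_counts(len(tris)))  # (12, 6)
```

Here six strip vertices describe four triangles, whereas the equivalent trilist would send twelve vertices.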
Nonetheless, current graphics processing applications, including 3D graphics, almost exclusively use trilist input topologies, even though there is generally a high level of connectivity between the triangles in the trilists. There are several reasons for the persistence of the trilist topology, including but not limited to the following: 1) tristrip topologies only allow subsequent triangles to connect to the strip at the last submitted edge, and this restrictive adjacency limits their usefulness in modeling complex shapes; 2) tristrips tend to be rather short (few vertices) when used to directly model objects, which may yield poor performance because there is little opportunity to amortize driver and GPU draw-call overhead over the few triangles each strip typically generates; and 3) historically, 3D application programming interfaces (APIs) did not directly support packing multiple variable-length tristrips into a single draw call.
On the other hand, the use of trilist topologies typically leads to performance and power consumption issues, including but not limited to the following: 1) most GPUs incorporate a vertex shader (VS) cache to limit redundant vertex shading, and although vertices in a trilist topology with significant spatial coherency benefit from this VS cache, each triangle requires three VS cache lookups, which consume power; 2) vertices that hit in the VS cache still need to be buffered within the vertex shader stage until the shading of all preceding “miss” vertices has completed, and this buffering consumes die area and power; 3) vertices that hit in the VS cache also require updates to the corresponding vertex reference counts to account for the additional vertex references sent down the pipeline, which further consumes power; 4) each vertex passed down the pipeline consumes some amount of dynamic power due to buffering within and between stages; 5) following the vertex shading stage of the pipeline, complete triangles need to be assembled for per-triangle operations such as clip testing, cull testing, and triangle setup; and 6) finally, as noted above, because vertices arrive at a rate of one vertex per clock, trilist topologies limit the maximum processing rate to one triangle per three clocks.
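The lookup and throughput costs in items 1 and 6 can be sketched by submitting the same connected band of triangles both as a strip and as a list and counting vertex cache traffic. This Python sketch assumes a hypothetical 16-entry FIFO vertex shader cache and a pipeline that delivers one vertex per clock; real GPU cache organizations and sizes vary, and the names `fifo_cache_traffic` and `strip_mesh` are illustrative only.

```python
from collections import deque
from typing import Deque, List, Tuple


def fifo_cache_traffic(indices: List[int], cache_size: int = 16) -> Tuple[int, int]:
    """Count (lookups, hits) against a hypothetical FIFO vertex shader cache.

    Every index sent down the pipeline costs one lookup. On a miss the
    vertex would be shaded and inserted, evicting the oldest entry once
    the cache is full.
    """
    cache: Deque[int] = deque()
    lookups = hits = 0
    for idx in indices:
        lookups += 1
        if idx in cache:
            hits += 1
        else:
            cache.append(idx)
            if len(cache) > cache_size:
                cache.popleft()
    return lookups, hits


def strip_mesh(num_triangles: int) -> Tuple[List[int], List[int]]:
    """Describe one connected band of triangles as a strip and as a list.

    Winding order is ignored because it does not affect cache traffic.
    """
    strip = list(range(num_triangles + 2))
    trilist: List[int] = []
    for i in range(num_triangles):
        trilist.extend((i, i + 1, i + 2))
    return strip, trilist


n = 100
strip, trilist = strip_mesh(n)
for name, idx in (("tristrip", strip), ("trilist", trilist)):
    lookups, hits = fifo_cache_traffic(idx)
    # At one vertex per clock, clocks spent equal the vertices sent.
    print(f"{name}: {lookups} lookups, {hits} hits, {n} triangles in {len(idx)} clocks")
```

With n = 100, the strip sends 102 vertices (102 lookups, no hits), while the list sends 300 vertices (300 lookups, 198 of them hits). Even though most trilist vertices hit in the cache, each hit still costs a lookup and a clock, so the list approaches one triangle per three clocks while the strip approaches one triangle per clock.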
Given the tradeoffs mentioned above, there may be a need for improved techniques and apparatus to solve these and other problems.