Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for dynamic polygon or primitive sorting for improved culling.
Description of the Related Art
Reducing power dissipation and increasing energy efficiency in graphics architectures are important goals today and will remain so for the foreseeable future. To that end, it is vital to save power at most, if not all, stages of the graphics pipeline, including the fixed-function units and execution units.
One of the most efficient techniques for increasing performance and energy efficiency in a graphics pipeline is the use of culling, where unnecessary work is discarded at an early stage. For instance, an object that is hidden behind other objects, or outside the view frustum, would not contribute to the final image, and would therefore not have to be fed through the graphics pipeline. The earlier the culling is performed in the pipeline the better, since more work can be avoided.
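As an illustration of culling against the view frustum, the following minimal sketch rejects a triangle when all of its vertices lie outside the same frustum plane. It assumes clip-space vertices given as (x, y, z, w) tuples and the OpenGL-style convention that a visible point satisfies -w <= x, y, z <= w. The function name `is_trivially_culled` is hypothetical and is used only for this example.

```python
def is_trivially_culled(clip_vertices):
    """Return True if every vertex lies outside the same frustum plane.

    Assumes OpenGL-style clip space, where a visible point satisfies
    -w <= x <= w, -w <= y <= w and -w <= z <= w.
    """
    outside_tests = (
        lambda x, y, z, w: x < -w,  # left plane
        lambda x, y, z, w: x > w,   # right plane
        lambda x, y, z, w: y < -w,  # bottom plane
        lambda x, y, z, w: y > w,   # top plane
        lambda x, y, z, w: z < -w,  # near plane
        lambda x, y, z, w: z > w,   # far plane
    )
    # The triangle can be discarded only if one single plane
    # has all of the vertices on its outer side.
    return any(
        all(outside(*vertex) for vertex in clip_vertices)
        for outside in outside_tests
    )


inside = [(0.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
right_of_view = [(2.0, 0.0, 0.0, 1.0), (3.0, 0.0, 0.0, 1.0), (2.0, 1.0, 0.0, 1.0)]
spanning = [(-2.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 1.0), (0.0, 2.0, 0.0, 1.0)]
```

This test is conservative: a triangle such as `spanning`, whose vertices lie outside different planes, is kept even though each of its vertices is individually outside the frustum, because part of the triangle may still be visible.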
Today, models and geometry in the form of polygons (e.g., triangles) are generally rendered by the hardware in the same order in which they are submitted. Additionally, graphics application programming interfaces (APIs) such as Direct3D and OpenGL enforce strict ordering semantics: it must appear as if graphics commands are executed in the order in which they are submitted. However, the rendering order may be changed as long as the change does not violate these semantics.
The most efficient way of rendering on modern graphics architectures is to submit triangles in front-to-back order, in which the triangle closest to the viewer is submitted first, followed by the second closest, and so on. If this is done, the z-buffer algorithm efficiently avoids drawing triangles that are completely hidden by other triangles, in turn reducing the shading, texturing, and pixel operations performed for the final image. However, manually ensuring a front-to-back ordering of triangles is sometimes a complicated procedure, and it requires sorting all triangles prior to submission to the GPU. This is difficult because the geometry undergoes different transformations before ending up in its sortable position. As a result, the overall benefit of performing this sorting on the CPU might not exceed the cost of transforming and sorting.
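The effect of submission order on the z-buffer algorithm can be illustrated with a minimal sketch. It is a simplification made for this example only: each primitive is modeled as a screen-aligned rectangle (x0, y0, x1, y1, z) with a single depth value, a smaller z is assumed to be closer to the viewer, and the names `count_shaded_fragments` and `sort_front_to_back` are hypothetical.

```python
def count_shaded_fragments(primitives, width, height):
    """Rasterize rectangles with a depth test and count shaded fragments.

    Each primitive is (x0, y0, x1, y1, z), covering pixels x0..x1-1 and
    y0..y1-1 at constant depth z. A smaller z is closer to the viewer.
    """
    depth_buffer = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x0, y0, x1, y1, z in primitives:
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Depth test: only fragments closer than the stored
                # depth are shaded and written.
                if z < depth_buffer[y][x]:
                    depth_buffer[y][x] = z
                    shaded += 1
    return shaded


def sort_front_to_back(primitives):
    """Order primitives by increasing depth (closest first)."""
    return sorted(primitives, key=lambda p: p[4])


# Three fully overlapping 4x4 rectangles, submitted back to front.
scene = [(0, 0, 4, 4, 3.0), (0, 0, 4, 4, 2.0), (0, 0, 4, 4, 1.0)]

back_to_front_cost = count_shaded_fragments(scene, 4, 4)                      # 48
front_to_back_cost = count_shaded_fragments(sort_front_to_back(scene), 4, 4)  # 16
```

In this toy scene, the back-to-front submission shades every fragment of all three rectangles, while the front-to-back submission shades only the 16 fragments of the closest rectangle. The sorting step here operates on depths that are already known, which sidesteps the transformation cost described above.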