1. Field of the Invention
Embodiments of the present invention relate generally to graphics processing and, more specifically, to primitive re-ordering between world-space and screen-space pipelines with buffer limited processing.
2. Description of the Related Art
Some graphics subsystems for rendering graphics images implement a tiling architecture, where one or more render targets, such as a frame buffer, are divided into screen space partitions referred to as tiles. In such a tiling architecture, the graphics subsystem rearranges work such that the work associated with any particular tile remains in an on-chip cache for a longer time than with an architecture that does not rearrange work in this manner. This rearrangement helps to reduce memory bandwidth consumption as compared with a non-tiling architecture.
Typically, the set of render targets changes over time as the rendering of the image progresses. For example, a first pass could use a first configuration of render targets to partially render the image. A second pass could use a second configuration of render targets to further render the image. A third pass could use a third configuration of render targets to complete the final rendering of the image. During the rendering process, the computer graphics subsystem could use any number of different render target configurations to render the final image.
For each render target configuration, graphics objects are first processed in a world space pipeline. The world space pipeline creates graphics primitives associated with the graphics objects. The graphics primitives are created and transmitted by the world space pipeline without regard to the position of the graphics primitives in the screen surface represented by the render targets. The graphics subsystem rearranges the graphics primitives into tile order, where each tile represents a portion of the screen surface. The rearranged graphics primitives are then processed by the screen space pipeline while maintaining application programming interface (API) order.
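As an illustration only, the rearrangement into tile order described above may be sketched as follows. The sketch assumes that each graphics primitive is reduced to an integer screen-space bounding box, that tiles are 16 by 16 pixels, and that tiles are visited in row-major order; the names bin_primitives and tile_order are hypothetical and do not correspond to any particular hardware unit or API.

```python
from collections import defaultdict

# Hypothetical tile dimensions in pixels (an assumption for this sketch).
TILE_W, TILE_H = 16, 16


def bin_primitives(primitives, tile_w=TILE_W, tile_h=TILE_H):
    """Group primitives by the tiles that their bounding boxes overlap.

    `primitives` is a list of (x_min, y_min, x_max, y_max) boxes with
    inclusive integer pixel coordinates, listed in API order.
    Returns a dict mapping (tile_x, tile_y) to a list of primitive indices.
    """
    bins = defaultdict(list)
    # Walking the input in order means each per-tile list stays in API order.
    for index, (x0, y0, x1, y1) in enumerate(primitives):
        for ty in range(y0 // tile_h, y1 // tile_h + 1):
            for tx in range(x0 // tile_w, x1 // tile_w + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


def tile_order(bins):
    """Return (tile, primitive indices) pairs in row-major tile order."""
    return [(tile, bins[tile]) for tile in sorted(bins, key=lambda t: (t[1], t[0]))]


if __name__ == "__main__":
    # Three primitives emitted without regard to screen position.
    prims = [(0, 0, 5, 5), (10, 10, 20, 12), (3, 3, 4, 4)]
    for tile, indices in tile_order(bin_primitives(prims)):
        print(tile, indices)
```

In this sketch, a primitive that overlaps more than one tile appears in the list of each overlapped tile, and within any one tile the primitives retain the order in which they were emitted.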
Although memory allocated for storing tiles is generally designed to hold all the needed graphics primitives for a given render target configuration, certain conditions may cause this tile memory to run out of space. For example, a particular tile could include a large number of very small primitives, such as when one or more graphics objects are finely tessellated. In such cases, the tiling memory could fill with graphics primitives before the entire image is processed in the world space pipeline. In addition, other data associated with the graphics primitives, such as vertex attribute data, may be stored in a general purpose cache. In certain cases, the cache may fill with vertex attribute data or other data associated with the graphics primitives, causing the graphics primitives to be evicted from the tiling memory and the vertex data or other data associated with the graphics primitives to be evicted from the cache. Such evicted data may be written to frame buffer memory and later retrieved.
One drawback to the above approach is that the frame buffer memory is generally off-chip, whereas the tiling memory and cache memory are generally on-chip. Off-chip memory accesses typically consume more power and take longer to complete than on-chip memory accesses. Such increased power consumption may result in shorter battery life, particularly for graphics subsystems placed in mobile devices. In addition, as off-chip accesses to frame buffer memory increase, rendering time increases, resulting in lower graphics performance and a degraded visual experience.
As the foregoing illustrates, what is needed in the art is a technique for reducing off-chip memory accesses in a graphics subsystem that employs a tiling architecture.