The present invention relates to the field of computer graphics. Many computer graphics images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint, and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
A typical graphics processing subsystem includes a graphics processing unit having one or more execution units and one or more texture units. Among other tasks, execution units are responsible for processing the geometry and lighting information. Texture units perform texture mapping of scene geometry by retrieving texel data from texture maps stored in memory. Texel data is combined with pixel data produced by the execution units to determine a color value of pixels of the rendered image.
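As an illustration of how texel data and pixel data can be combined, the following minimal sketch retrieves a texel from a small texture map and combines it with a lit color value. The text above does not state how the combination is performed; the nearest-neighbor lookup, the multiplicative combination, and the names `fetch_texel` and `shade_pixel` are assumptions made only for this example.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]  # (red, green, blue), each in [0, 1]


def fetch_texel(texture: List[List[Color]], u: float, v: float) -> Color:
    """Nearest-neighbor texel lookup (assumed filter); u and v lie in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    # Scale the normalized coordinates to texel indices and clamp to the edge.
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return texture[y][x]


def shade_pixel(lit_color: Color, texel: Color) -> Color:
    """Combine pixel data with texel data by per-channel multiplication (assumed)."""
    return tuple(l * t for l, t in zip(lit_color, texel))


# Hypothetical 2x2 checkerboard texture map held in memory.
WHITE = (1.0, 1.0, 1.0)
BLACK = (0.0, 0.0, 0.0)
checker = [[WHITE, BLACK],
           [BLACK, WHITE]]

# Hypothetical lit color, standing in for the output of an execution unit.
lit = (0.5, 0.25, 1.0)

on_white = shade_pixel(lit, fetch_texel(checker, 0.1, 0.1))
on_black = shade_pixel(lit, fetch_texel(checker, 0.9, 0.1))
```

In this sketch, each call to `fetch_texel` stands in for a memory access performed by a texture unit, while `shade_pixel` stands in for the final combination step.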
Execution units and texture units typically face different obstacles to maximizing performance. Execution units typically have a deep, largely fixed processing pipeline, which makes pipeline stalls very expensive in terms of performance. To reduce pipeline stalls, rendering applications are often divided into numerous independent execution threads to maximize the utilization of execution units.
In contrast, the primary performance bottleneck for texture units is memory latency arising from retrieving texel data. This bottleneck is exacerbated by the tendency of the execution units to issue batches of texture commands together, rather than distributing texture commands over time. With multiple threads running on multiple execution units, the irregular timing of texture commands can seriously degrade texture unit performance.
To even out the bursts of texture commands, a buffer, for example a First-In-First-Out buffer (FIFO), can be used to queue texture commands sent to texture units. However, texture commands often include a large amount of associated data. For example, a typical texture command and its associated data may be well over 100 bits of data. A FIFO of this width consumes a large amount of circuit area in a graphics processing unit, decreasing the amount of area available for other features.
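The storage cost of such a FIFO can be estimated with a short simulation. The sketch below is illustrative only: the arrival patterns, the service rate of one texture command per cycle, and the 128-bit command width are assumed values (the text above states only that a command may be well over 100 bits), and `peak_fifo_depth` is a hypothetical helper.

```python
from collections import deque


def peak_fifo_depth(arrivals_per_cycle, service_per_cycle=1):
    """Return the largest number of commands queued at any one time.

    arrivals_per_cycle lists how many texture commands arrive in each cycle;
    the texture unit removes up to service_per_cycle commands per cycle.
    """
    fifo = deque()
    peak = 0
    next_id = 0
    for arrivals in arrivals_per_cycle:
        # Execution units enqueue their texture commands for this cycle.
        for _ in range(arrivals):
            fifo.append(next_id)
            next_id += 1
        peak = max(peak, len(fifo))
        # The texture unit dequeues the oldest commands first.
        for _ in range(min(service_per_cycle, len(fifo))):
            fifo.popleft()
    return peak


COMMAND_BITS = 128  # assumed width of one texture command and its data

# Both patterns issue 16 commands over 16 cycles (hypothetical traffic).
bursty = [8, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0]
even = [1] * 16

bursty_depth = peak_fifo_depth(bursty)
even_depth = peak_fifo_depth(even)

bursty_bits = bursty_depth * COMMAND_BITS
even_bits = even_depth * COMMAND_BITS
```

Under these assumptions, the bursty pattern requires a FIFO eight entries deep (1024 bits of storage), whereas the evenly distributed pattern requires a single entry (128 bits), even though both patterns deliver the same number of commands over the same interval.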
It is therefore desirable for a graphics processing system to queue texture commands efficiently and without using large FIFOs. It is further desirable for the texture queuing mechanism to scale efficiently when used with multiple execution units.