Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (GPUs) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions and data, where the instructions and data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives (e.g., “points”, “lines”, and “triangles”) and produce real-time rendered 3-D images.
A problem exists, however, with the ability of prior art 3-D rendering architectures to scale to handle the increasingly complex 3-D scenes of today's applications. Computer screens now commonly have screen resolutions of 1920×1200 pixels or larger. Traditional methods of increasing 3-D rendering performance, such as increasing clock speed, have negative side effects such as increasing power consumption and increasing the heat produced by the GPU integrated circuit die.
One traditional method for increasing 3-D rendering performance involves compression schemes that reduce the bandwidth required between graphics memory and the GPU. For example, a significant compression of the data that must be written to and read from local graphics memory yields a correspondingly significant increase in the effective data transfer bandwidth between the GPU and its graphics memory.
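As a rough illustration of this arithmetic, the sketch below assumes a simple model in which the effective bandwidth is the physical bandwidth divided by the fraction of the original data size that is actually transferred. The function name, the parameter names, and the sample numbers are hypothetical and are not taken from any particular GPU.

```python
def effective_bandwidth(physical_gb_per_s: float, compressed_fraction: float) -> float:
    """Return the effective bandwidth under a simple, assumed model.

    compressed_fraction is (compressed size / uncompressed size), in (0, 1].
    A value of 1.0 means the data is not compressed at all.
    """
    if not 0.0 < compressed_fraction <= 1.0:
        raise ValueError("compressed_fraction must be in (0, 1]")
    # Moving fewer bytes per tile lets the same link carry more tiles per second.
    return physical_gb_per_s / compressed_fraction


if __name__ == "__main__":
    # Hypothetical numbers: a 100 GB/s memory interface, data compressed to 1/4 size.
    print(effective_bandwidth(100.0, 0.25))  # 400.0
```

The model ignores the cost of the compression and decompression work itself, which is the overhead discussed in the following paragraphs.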
For example, some conventional GPUs compress depth values, or z values, prior to writing them into the z buffer and decompress the z values after reading them from the z buffer. The compression and decompression operations require additional overhead computations, beyond the typical z data processing (e.g., to perform hidden surface removal, etc.). The additional computations may require dedicated logic used only for that purpose, or they may be performed using general-purpose logic. In either case, the performance or efficiency of the GPU, and the ability of the GPU architecture to scale as graphics applications require, is negatively impacted.
More particularly, in a case where a given computer system permits compression of tiles (e.g., groups of pixels) containing multiple primitives, when a new primitive is received, it can be compressed in a multi-primitive format. The multi-primitive compression is expensive in terms of processing cycles. If the primitives are small, then as more of them are rendered into a tile, the tile eventually can no longer be compressed and must be stored in memory uncompressed. Each update of a compressed, partially covered tile requires a read of the previously compressed data (⅛ to ¼ of the full, uncompressed tile size) plus a write of the newly compressed data, which is more expensive than just writing the uncompressed data. Consequently, if the tile ends up uncompressed in the end, it would have been more efficient to decompress the tile as soon as possible rather than wait until the tile exceeds the limit of the compressed representation.
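The cost comparison above can be sketched with a small traffic model. The sketch assumes that an update of a compressed tile reads and then rewrites the whole compressed representation, and that an update of an uncompressed tile writes only the bytes of the covered pixels. The function names, the 256-byte tile size, and the 1/16 coverage figure are hypothetical values chosen for illustration.

```python
def compressed_update_bytes(tile_bytes: int, compressed_fraction: float) -> float:
    """Bytes moved by one update of a compressed tile.

    The previously compressed data is read, then the recompressed data is
    written, so the compressed size is transferred twice.
    """
    return 2.0 * compressed_fraction * tile_bytes


def uncompressed_update_bytes(tile_bytes: int, covered_fraction: float) -> float:
    """Bytes moved by writing only the covered pixels of an uncompressed tile.

    This is an assumption of the sketch: no read-back is modeled here.
    """
    return covered_fraction * tile_bytes


if __name__ == "__main__":
    tile_bytes = 256  # hypothetical: 8x8 pixels with 4 bytes of z per pixel
    for fraction in (0.125, 0.25):  # the 1/8 to 1/4 range given in the text
        print(fraction, compressed_update_bytes(tile_bytes, fraction))
    # A small primitive covering 1/16 of the tile, written uncompressed:
    print(uncompressed_update_bytes(tile_bytes, 0.0625))
```

Under these assumptions, a compressed update moves between ¼ and ½ of the full tile size, so it costs more than an uncompressed write whenever the primitive covers less than that fraction of the tile.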
Thus, a need exists for compression and decompression methods that reduce the bandwidth used by a graphics processor accessing graphics memory and minimize the number of additional overhead computations needed to support compression/decompression.