Three-dimensional (3D) graphics processing systems typically utilize several memory buffers during the rendering process, including, for example, color buffers and depth buffers (often called z-buffers). These buffers are often stored in random-access memory (RAM) external to the graphics processing unit (GPU), which may have only relatively small on-chip cache memories. Because the buffered data may be retrieved and re-written several times during the rendering process, the available memory bandwidth (the capacity for writing data to memory and reading data from memory) must often be quite high, especially to support real-time graphics processing applications such as real-time games. On a desktop personal computer, the available memory bandwidth might be very high, perhaps several gigabytes per second. In a mobile phone, on the other hand, only several hundred megabytes per second of data transfer might be available.
Even when the available memory bandwidth is high, the performance of a GPU may nonetheless be constrained by memory bandwidth in some applications. Reducing the amount of data retrieved from and written to the external RAM is thus generally advantageous. The advantages of reducing memory transactions are particularly pronounced in mobile platforms, such as a mobile telephone, since the increased clock rates and wider data buses necessary to support very high memory bandwidths also result in increased power consumption, draining batteries more quickly. By reducing the required memory bandwidth, lower clock rates and/or narrower data buses may be used, thereby reducing the power consumption.
To reduce the burden on the memory system, several different types of compression algorithms are used in conventional graphics processing systems. In a typical GPU, compression and decompression algorithms may be employed at several different places. For instance, textures may be compressed with a texture compression algorithm, such as the Ericsson Texture Compression (ETC) algorithm, which is described in “iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones” (Jacob Ström & Tomas Akenine-Möller, Graphics Hardware, pp. 63-70, 2005) and which is part of the OpenGL ES standard—the main open standard for graphics on mobile devices. As those skilled in the art will appreciate, texture compression algorithms are almost always lossy. ETC in particular compresses standard 24-bit-per-pixel (RGB888) textures down to a fixed rate of 4 bits per pixel. Those skilled in the art will also appreciate that texture compression may often be performed offline as a pre-process, so that the graphics hardware performs only texture decompression of pre-compressed texture files. In such a system, then, access to pre-compressed textures is typically read-only.
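As a rough illustration of the fixed-rate figures above, the following Python sketch compares the storage needed for an uncompressed RGB888 texture with the storage needed for the same texture at 4 bits per pixel. The function name and the 512 x 512 texture size are assumptions chosen only for illustration; only the 24 and 4 bits-per-pixel rates come from the description.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Return the storage size in bytes of a width x height texture."""
    total_bits = width * height * bits_per_pixel
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


RGB888_BPP = 24  # uncompressed rate stated in the text
ETC_BPP = 4      # fixed ETC rate stated in the text

# Hypothetical 512 x 512 texture, chosen only for illustration.
uncompressed = texture_bytes(512, 512, RGB888_BPP)
compressed = texture_bytes(512, 512, ETC_BPP)
ratio = uncompressed / compressed

print(uncompressed, compressed, ratio)
```

Because the rate is fixed, the ratio between the two sizes does not depend on the image content or on the texture dimensions chosen here.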
The various buffers used in 3D graphics processing, such as the color buffer, depth buffer, or stencil buffer, may also be compressed, using various algorithms. One approach to color buffer compression is described in U.S. patent application Ser. No. 11/953,339, titled “Frame Buffer Compression and Decompression Method for Graphics Rendering,” and filed Dec. 10, 2007, by Rasmusson et al., the entire contents of which are incorporated herein by reference. Several depth buffer compression algorithms are described in “Efficient Depth Buffer Compression” (Jon Hasselgren & Tomas Akenine-Möller, Graphics Hardware, September 2006). Buffer compression and decompression are very different from texture compression and decompression, since the contents of the buffer may be accessed, modified, and stored repeatedly during the rendering process. Compression is performed before shuffling data from an internal cache out to external memory. If the data can be compressed, it is sent over the bus in compressed form, thereby saving memory bandwidth. Correspondingly, when data is retrieved from the external memory, it is sent in compressed form over the bus, decompressed, and put into the internal cache. Those skilled in the art will appreciate that this process occurs at least once each time a triangle is being rendered, and may thus occur numerous times during the rendering of a single image.
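The write-back and fetch path described above might be sketched as follows. This is a minimal illustration, not the method of either cited reference: `ExternalMemory`, `evict_tile`, and `load_tile` are hypothetical names, the 8 x 8 tile size is an assumption, and Python's `zlib` merely stands in for a lossless buffer compression algorithm. A tile that does not shrink is sent uncompressed, matching the condition that data is sent in compressed form only if it can be compressed.

```python
import zlib


class ExternalMemory:
    """Hypothetical external RAM that counts bytes moved over the bus."""

    def __init__(self):
        self.tiles = {}      # tile_id -> (is_compressed, payload)
        self.bus_bytes = 0   # total bytes transferred in either direction

    def write(self, tile_id, is_compressed, payload):
        self.bus_bytes += len(payload)
        self.tiles[tile_id] = (is_compressed, payload)

    def read(self, tile_id):
        is_compressed, payload = self.tiles[tile_id]
        self.bus_bytes += len(payload)
        return is_compressed, payload


def evict_tile(memory, tile_id, raw):
    """Compress a cached tile if that saves space, then write it out."""
    packed = zlib.compress(raw)  # stand-in for a lossless buffer codec
    if len(packed) < len(raw):
        memory.write(tile_id, True, packed)
    else:
        memory.write(tile_id, False, raw)  # incompressible: send raw data


def load_tile(memory, tile_id):
    """Read a tile over the bus and return it in uncompressed cache form."""
    is_compressed, payload = memory.read(tile_id)
    return zlib.decompress(payload) if is_compressed else payload


memory = ExternalMemory()
tile = bytes(8 * 8 * 4)  # hypothetical 8 x 8 tile, 4 bytes per pixel, all zero
evict_tile(memory, 0, tile)
restored = load_tile(memory, 0)
print(memory.bus_bytes, "bytes moved for a", len(tile), "byte tile")
```

The per-tile flag stored alongside the payload is one simple way for the fetch path to know whether decompression is needed; real hardware may track this differently.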
Buffer compression algorithms are often exact (i.e., non-lossy), but there are exceptions to this, such as the color buffer compression algorithm described in the Rasmusson application cited above. Buffer compression algorithms are often symmetric, so that the compression and decompression operations take about the same amount of time to perform. This is important, since buffer compression is used during rendering, and a given portion of the buffer may be decompressed and compressed several times if rendering operations are performed on that portion more than once. Texture compression algorithms, on the other hand, are typically asymmetric—compressing a texture usually requires much more processing power than decompressing the resulting file. As a result, textures are conventionally pre-compressed (offline) for later use by the GPU.
In recent graphics applications, it has become more common to use a final rendered image as a texture when rendering a later image. As an example, consider the use of so-called cube maps. A cube map stores a 360-degree image of the scene as seen from some point in the center of the scene. The cube map may be used to generate simulated reflections during real-time rendering. For a static scene, the cube map can be created as a pre-process, and stored in compressed form as a conventional texture. However, if the scene is dynamic (i.e., with moving objects, such as moving clouds), the cube map generally must be recreated for each of several displayed frames by rendering into a buffer.
In existing graphics architectures, buffer compression and texture compression operations are conceptually and physically separated. This separation makes it harder to reduce memory bandwidth usage. For example, conventional hardware can handle the dynamic cube map rendering problem described above in two known ways. The first approach is to disable compression of the cube map buffer during rendering of the cube map image. The cube map image may then be stored as an uncompressed image for subsequent use as a texture in a later rendering pass. However, this approach uses a great deal of memory bandwidth, since buffer compression is not used when rendering the cube map and texture compression is not exploited when retrieving the stored cube map image for use as a texture in the later rendering pass. A second approach is to enable buffer compression during the rendering of the cube map image, and, when the cube map image is completed, to retrieve the compressed cube map image, decompress it, and store the decompressed cube map image in memory for later use as an uncompressed texture. This approach may make slightly more effective use of memory bandwidth, since buffer compression is used during the cube map image rendering. However, it adds an extra decompression pass over the entire cube map and provides no compression benefit at all when the cube map is used as a texture in the later rendering pass.
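The trade-off between the two conventional approaches can be made concrete with a simplified traffic model. The sketch below is only an assumption-laden illustration: the function names, the idea of a single average buffer compression ratio, and every number in the example are hypothetical and are not taken from any measurement.

```python
def approach_one(render_traffic, texture_traffic):
    """Buffer compression disabled: all traffic is uncompressed."""
    return render_traffic + texture_traffic


def approach_two(render_traffic, image_size, texture_traffic, buffer_ratio):
    """Buffer compression enabled, then a full decompression pass.

    buffer_ratio is the assumed compressed/uncompressed size fraction.
    """
    rendering = render_traffic * buffer_ratio  # compressed buffer traffic
    read_back = image_size * buffer_ratio      # fetch the compressed image
    write_back = image_size                    # store it uncompressed
    return rendering + read_back + write_back + texture_traffic


# Hypothetical traffic figures in arbitrary units, for illustration only.
one = approach_one(render_traffic=1000, texture_traffic=600)
two = approach_two(render_traffic=1000, image_size=300,
                   texture_traffic=600, buffer_ratio=0.5)
print(one, two)
```

Under these assumed numbers the second approach moves only slightly less data than the first, because the savings during rendering are partly offset by the extra decompression pass, and the texture reads in the later pass remain uncompressed in both cases.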