The demand for processing three-dimensional graphics on mobile communication devices is increasing. Real-time rendering of three-dimensional graphics has a number of appealing applications, such as video games, man-machine interfaces, messaging and m-commerce. Users of communication devices prefer to visualize the objects that they interact with, and thus three-dimensional graphics has become a desired feature for such communication devices. However, three-dimensional rendering is a computationally expensive task because it requires advanced processing and extensive use of memory. Thus, dedicated hardware is often necessary to reach sufficient performance for rendering the three-dimensional graphics with desired speed and quality. Communication devices, specifically mobile communication devices such as mobile phones, personal digital assistants, and video and still cameras, are known to be limited in terms of hardware capabilities, i.e., processor power and memory space. In addition, there is a trend to miniaturize the communication devices, which makes the advanced processing and memory requirements of three-dimensional rendering even harder to meet. One of the main bottlenecks for these communication devices, especially for mobile phones and game consoles, is memory bandwidth.
A technique for reducing memory bandwidth usage is depth buffer compression. Depth buffer compression is related to the management of image depth coordinates in three-dimensional (3-D) graphics, which is performed in hardware, software or a combination thereof. The depth buffer is one of the solutions to the visibility problem, which is the problem of deciding which elements of a rendered scene are visible, and which are hidden. Depth buffering is also known as Z-buffering. When an object is rendered, the depth of a generated pixel (z coordinate) is stored in a buffer (the depth buffer). This buffer is usually arranged as a two-dimensional array (x-y) with one element for each screen pixel. If another object of the scene is later rendered in the same pixel, the depth of the new object is compared to the one stored in the z-buffer. The new object is only rendered to the pixel, and its depth value is only written to the z-buffer if the new object is closer, i.e., if its depth in the pixel is smaller than the stored value. In the end, the depth buffer allows correct reproduction of usual depth perception: a close object hides a farther one. The depth buffer may be stored on-chip or at an off-chip location, for example an external memory.
In more detail, the rendering of a 3-D image is based on primitives, for example triangles, which are drawn in a non-sorted order. There is a need to prevent triangles further back from being drawn on top of triangles in front of them that have been drawn earlier, and this is why the depth buffer is introduced. The depth buffer holds, for each pixel, the depth (distance to the eye) for that particular pixel. Before writing a new pixel, the corresponding depth is first read from the depth buffer. The new pixel is only written to the image if the new depth is smaller than the previously written depth stored in the depth buffer. If the image was updated, the new depth value is then written to the depth buffer. The reading and writing of depth values generate numerous memory accesses. Since the depth buffer is often too big to fit on-chip, these memory accesses will be external (off-chip) accesses. Such memory accesses are often slow, which means that depth buffer accesses can reduce the overall performance of the device. Off-chip memory accesses are also costly in terms of energy consumption, which means that depth buffer accesses also have the potential to drain the battery of hand-held devices.
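The read, compare and conditional write steps described above can be sketched as follows. This is a minimal software illustration only; the function names and the use of nested lists for the buffers are assumptions made for the example and do not describe any particular hardware.

```python
def make_depth_buffer(width, height, far=1.0):
    """Create a depth buffer with every pixel initialised to the far value."""
    return [[far] * width for _ in range(height)]


def write_fragment(depth_buffer, color_buffer, x, y, depth, color):
    """Apply the depth test for one pixel; return True if the pixel was updated."""
    stored = depth_buffer[y][x]      # read the previously written depth
    if depth < stored:               # a smaller depth is closer to the eye
        color_buffer[y][x] = color   # update the image
        depth_buffer[y][x] = depth   # write the new depth back
        return True
    return False                     # hidden: neither buffer is modified


# Two primitives drawn in non-sorted order into the same pixel.
depth = make_depth_buffer(2, 2)
color = [[None] * 2 for _ in range(2)]
write_fragment(depth, color, 0, 0, 0.5, "red")    # closer than the far value
write_fragment(depth, color, 0, 0, 0.8, "blue")   # farther than the stored 0.5
```

Every call performs one depth read, and every passing test performs one depth write, which is the source of the memory traffic discussed above.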
Most graphics systems, irrespective of whether they use depth buffer compression, divide the depth buffer into tiles. Rasterization of a primitive is then often performed on a per-tile basis, meaning that all pixels in one tile are processed before processing any pixel in another tile. This means that all depth values for a certain tile can be read in at once using a fast “burst-type” memory access or another fast access, as would be appreciated by one skilled in the art. During processing of the tile, its depth values are typically stored in an on-chip cache, where they can be accessed without the penalty of an off-chip access. After the tile has been processed, the depth values can be written back from the cache to the external memory, again using a fast burst memory access. However, even though memory accesses are now faster due to bursting, they may still be too slow to reach sufficient performance. Also note that tiling and bursting do not decrease the amount of data that is transmitted over the bus between the GPU and the external memory.
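As an illustration of per-tile processing, the sketch below maps a pixel to its tile and groups pixels so that each tile needs to be fetched only once. The 4×4 tile size, the row-major tile numbering and the function names are assumptions chosen for the example.

```python
TILE = 4  # assumed tile edge length, in pixels


def tile_of(x, y, width_in_tiles, tile=TILE):
    """Return (tile index, index within the tile) for pixel (x, y)."""
    tx, ty = x // tile, y // tile
    return ty * width_in_tiles + tx, (y % tile) * tile + (x % tile)


def group_by_tile(pixels, width_in_tiles, tile=TILE):
    """Group pixel coordinates by tile so that a tile is read in once,
    processed in the cache, and written back once."""
    groups = {}
    for x, y in pixels:
        index, _ = tile_of(x, y, width_in_tiles, tile)
        groups.setdefault(index, []).append((x, y))
    return groups


# Three pixels of a primitive on a screen that is 4 tiles wide.
groups = group_by_tile([(0, 0), (5, 2), (1, 1)], width_in_tiles=4)
```

Grouping changes when the depth values cross the bus, not how many bytes cross it, which matches the observation above that tiling and bursting alone do not reduce the amount of transmitted data.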
Depth buffer compression addresses this problem by compressing the data in the depth buffer tile before it is written to the external memory. The information is then sent from the chip over the bus, in compressed form, and is also stored in compressed form in the external memory. When the depth values in a tile are needed again at a later stage, the compressed version of the tile is read from the external memory and sent over the bus to the chip for processing. After this, the data may be decompressed before being stored in the cache in uncompressed form, and the processing can then take place in the chip. After processing this data, the tile is yet again compressed in the chip before being sent from the cache to the external memory. Because this cycle of decompression and compression may happen several times for a particular tile, it is desired that the compression be lossless, i.e., non-destructive. The downside is that a lossless scheme cannot guarantee any compression at all: some tiles will not become smaller. Thus, it is necessary to also have the capability to store the tile in a non-compressed form.
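The compress, store, read and decompress cycle with a non-compressed fallback can be sketched as follows. The sketch uses the general-purpose `zlib` codec from the Python standard library purely as a stand-in for a lossless compressor; it is not a depth buffer codec, and the function names and the 32-bit depth format are assumptions.

```python
import struct
import zlib


def pack_tile(depths):
    """Losslessly compress a tile of 32-bit depth values.

    Returns (is_compressed, payload). If compression does not make the
    tile smaller, the raw bytes are returned with the flag cleared.
    """
    raw = struct.pack(f"<{len(depths)}I", *depths)
    packed = zlib.compress(raw)  # stand-in codec, assumed for illustration
    if len(packed) < len(raw):
        return True, packed
    return False, raw


def unpack_tile(is_compressed, payload):
    """Recover the exact depth values written by pack_tile."""
    raw = zlib.decompress(payload) if is_compressed else payload
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


# A tile with constant depth compresses; the round trip is exact.
flat_tile = [1000] * 16
flag, payload = pack_tile(flat_tile)
restored = unpack_tile(flag, payload)
```

Because the round trip is exact, the cycle can be repeated any number of times for the same tile without accumulating error.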
Another desired feature when rendering three-dimensional graphics is random access to needed tiles. One way to solve this is to reserve a large enough number of bits for each tile so that the tile can be stored non-compressed. In this situation, if a particular tile can be compressed to 50% of the original size, only half of the reserved storage is used for that particular tile. A flag stored on-chip may be used to indicate whether the tile is compressed or non-compressed in the depth buffer. Alternatively, this flag may be stored in the external memory but cached on-chip.
In terms of storage, the features and techniques described above amount to data expansion rather than compression, because the same number of bits is reserved for the compressed mode as for the non-compressed mode and, in addition, more bits are necessary for the flags. However, when transporting a tile between the external memory and the on-chip cache, the smaller, compressed size can be used. Thus, although memory storage is not reduced, memory bandwidth usage is.
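The difference between reserved storage and bus traffic can be made concrete with a short calculation. The helper names, the one-bit flag and the example sizes below are assumptions chosen for illustration.

```python
def slot_address(tile_index, tile_bytes):
    """Fixed-size slots give random access: the address depends only on the index."""
    return tile_index * tile_bytes


def memory_footprint(num_tiles, tile_bytes, flag_bits=1):
    """Bytes reserved: one full-size slot per tile plus the per-tile flags."""
    return num_tiles * tile_bytes + (num_tiles * flag_bits + 7) // 8


def bus_traffic(compressed_sizes, tile_bytes):
    """Bytes moved when each tile crosses the bus once.

    An entry of None marks a tile that is stored non-compressed.
    """
    return sum(tile_bytes if size is None else size for size in compressed_sizes)


# Four 64-byte tiles: two compress (to 32 and 16 bytes), two do not.
sizes = [32, None, 16, None]
reserved = memory_footprint(len(sizes), 64)   # larger than 4 * 64 because of the flags
moved = bus_traffic(sizes, 64)                # smaller than 4 * 64
```

In this example the reserved storage exceeds the uncompressed size of 256 bytes by the flag byte, while the traffic over the bus drops to 176 bytes.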
In the past, many methods have been used for depth buffer compression, including plane encoding and depth offset encoding. Plane encoding assumes that a tile is completely covered by one or two triangles. Since depth values emanating from the same triangle will be co-planar, such depth values can efficiently be encoded as planes. For tiles where this works, high compression is possible, but not all tiles can be compressed in this manner. Depth offset encoding instead encodes one or more base values and stores per-pixel offsets from these values. The compression ratios achieved by this method are not as good as those of plane encoding, but more tiles can be compressed. These two methods are described in more detail in J. Hasselgren and T. Akenine-Möller, “Efficient Depth Buffer Compression,” in Proc. Graphics Hardware (2006), pp. 103-110, the entire content of which is incorporated herein by reference.
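The two ideas can be illustrated with the simplified sketch below. It is not the scheme of the cited reference: it assumes integer depth values, a tile covered by a single plane, and a single base value (the tile minimum) for the offsets, and all function names are hypothetical.

```python
def encode_plane(tile, size):
    """Try to represent a size x size tile as z = z0 + x*dx + y*dy.

    Returns (z0, dx, dy), or None if any depth value lies off the plane.
    """
    z0 = tile[0]
    dx = tile[1] - z0        # depth step along x
    dy = tile[size] - z0     # depth step along y
    for y in range(size):
        for x in range(size):
            if tile[y * size + x] != z0 + x * dx + y * dy:
                return None
    return z0, dx, dy


def decode_plane(params, size):
    """Rebuild the tile from the three plane parameters."""
    z0, dx, dy = params
    return [z0 + x * dx + y * dy for y in range(size) for x in range(size)]


def encode_offsets(tile, offset_bits):
    """Store one base value plus per-pixel offsets from it.

    Returns (base, offsets), or None if an offset needs more than offset_bits.
    """
    base = min(tile)
    offsets = [z - base for z in tile]
    if max(offsets) >= 1 << offset_bits:
        return None
    return base, offsets


def decode_offsets(encoded):
    """Rebuild the tile from the base value and the offsets."""
    base, offsets = encoded
    return [base + offset for offset in offsets]
```

In this sketch a 4×4 tile that lies on a plane reduces to three numbers, whereas the offset form still stores sixteen small values. The offset form, however, accepts any tile whose depth range fits in the offset width, which mirrors the trade-off described above.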
These compression schemes work well for tiles of 4×4 depth values, especially when the architecture allows plane encoding and depth offset encoding to be combined, which results in some compression for most tiles and high compression for some tiles. The problem with this approach is that for larger tile sizes, such as 8×8 depth values, the average compression ratio decreases. This decrease occurs despite the fact that a larger tile size should allow for better utilization of spatial redundancy, and indeed, the 8×8 plane encoding methods achieve better compression ratios for the tiles which they can compress. A reason for the decrease in performance is the simple plane assumptions made: 8×8 tiles are seldom covered by a single triangle. More complex methods suitable for multiple triangles may be too costly to implement. In other words, an architecture designed to efficiently compress 4×4 tiles is not as efficient when compressing 8×8 tiles. Additionally, the tile size affects far more than just depth buffer compression, and so it is not simply a matter of selecting the tile size which leads to the best compression ratios.
Accordingly, it would be desirable to provide devices, systems and methods for controlling a compression of the depth buffer that avoid the afore-described problems and drawbacks.