One task that consumes considerable memory bandwidth in a computer graphics system is updating buffers, in particular the color buffer. The color buffer contains the data that will finally be displayed, i.e. the output pixels. In a traditional architecture, the color buffer is updated triangle by triangle. The first triangle is rasterized and the corresponding pixels of the color buffer are updated. Then the second triangle is rasterized and its pixels are written to the color buffer, possibly overlapping those of the first triangle. Each pixel in the color buffer can therefore be written several times. A typical application overwrites every pixel in the color buffer roughly three to ten times on average, which in the technical field is referred to as an overdraw of 3 to 10. Consequently, the write bandwidth for the color buffer is three to ten times larger than it would be if each pixel were written only once.
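To make the overdraw figures concrete, the following sketch estimates the color buffer write bandwidth as resolution times bytes per pixel times overdraw times frame rate. The function name `color_write_bandwidth` and the example parameters (1920x1080 pixels, 4 bytes per pixel, 60 frames per second) are illustrative assumptions, not values taken from the text.

```python
def color_write_bandwidth(width, height, bytes_per_pixel, overdraw, fps):
    """Bytes per second written to the color buffer when every pixel is
    written `overdraw` times per frame on average (hypothetical helper)."""
    return width * height * bytes_per_pixel * overdraw * fps


# Assumed example: 1080p, 32-bit RGBA pixels, 60 frames per second.
for overdraw in (1, 3, 10):
    bw = color_write_bandwidth(1920, 1080, 4, overdraw, 60)
    print(f"overdraw {overdraw:2d}: {bw / 1e9:.2f} GB/s")
# overdraw  1: 0.50 GB/s
# overdraw  3: 1.49 GB/s
# overdraw 10: 4.98 GB/s
```

The bandwidth scales linearly with overdraw, which is why an overdraw of 3 to 10 translates directly into three to ten times the single-write bandwidth.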
One way to decrease the bandwidth requirements of the color buffer is to use a so-called tiled architecture. Instead of rasterizing the scene triangle by triangle, the color buffer is divided into non-overlapping tiles and the scene is rendered tile by tile. For the first tile, only the triangles overlapping that tile are rasterized. The tile size is chosen small enough that the entire tile fits on-chip in the graphics processing unit (GPU), so no external memory accesses to the color buffer are needed while the triangles of the tile are rendered. When all the triangles in the tile have been rasterized, the tile is sent to the color buffer memory and the next tile is processed. In a tiled architecture, each pixel in the color buffer is therefore written to external memory only once, which means that a tiled architecture can often decrease the write bandwidth for the color buffer by a factor of 3 to 10. A limitation of the tiled architecture is that, although each pixel is written only once, this single write is still quite expensive. Furthermore, the display controller has to read each pixel to output it to the display. Each pixel must therefore be written to and read from the color buffer at least once in uncompressed form, which is expensive.
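The difference between triangle-by-triangle and tile-by-tile rendering can be sketched with a toy software rasterizer that counts writes to external color buffer memory. Everything here is a simplified assumption for illustration: flat-colored triangles, coverage tested at pixel centers with edge functions (pixels on a shared edge are covered by both triangles, with no tie-breaking fill rule), bounding-box binning, and a hypothetical 4x4 tile size. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical triangle representation: three vertices plus a flat color value.
Triangle = Tuple[Point, Point, Point, int]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area term telling which side of edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(tri: Triangle, x: int, y: int) -> bool:
    """True if the center of pixel (x, y) lies inside (or on) the triangle."""
    a, b, c, _ = tri
    if edge(a, b, c) == 0:  # skip degenerate (zero-area) triangles
        return False
    p = (x + 0.5, y + 0.5)
    w = (edge(b, c, p), edge(c, a, p), edge(a, b, p))
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def bbox(tri: Triangle, width: int, height: int) -> Tuple[int, int, int, int]:
    """Pixel bounding box (x0, y0, x1, y1), clamped to the screen, x1/y1 exclusive."""
    xs = [v[0] for v in tri[:3]]
    ys = [v[1] for v in tri[:3]]
    return (max(0, int(min(xs))), max(0, int(min(ys))),
            min(width, int(max(xs)) + 1), min(height, int(max(ys)) + 1))


def render_immediate(tris: List[Triangle], width: int, height: int):
    """Triangle by triangle: every covered pixel is written to external memory."""
    fb = [[0] * width for _ in range(height)]
    writes = 0
    for tri in tris:
        x0, y0, x1, y1 = bbox(tri, width, height)
        for y in range(y0, y1):
            for x in range(x0, x1):
                if covers(tri, x, y):
                    fb[y][x] = tri[3]
                    writes += 1
    return fb, writes


def render_tiled(tris: List[Triangle], width: int, height: int, tile: int = 4):
    """Tile by tile: each tile is built on chip and written out exactly once."""
    fb = [[0] * width for _ in range(height)]
    writes = 0
    # Binning: list, in submission order, the triangles whose bounding box overlaps each tile.
    bins: Dict[Tuple[int, int], List[Triangle]] = {}
    for tri in tris:
        x0, y0, x1, y1 = bbox(tri, width, height)
        for ty in range(y0 // tile, (y1 - 1) // tile + 1):
            for tx in range(x0 // tile, (x1 - 1) // tile + 1):
                bins.setdefault((tx, ty), []).append(tri)
    for ty in range((height + tile - 1) // tile):
        for tx in range((width + tile - 1) // tile):
            xs = range(tx * tile, min(width, (tx + 1) * tile))
            ys = range(ty * tile, min(height, (ty + 1) * tile))
            # On-chip tile buffer: overdraw here costs no external bandwidth.
            local = {(x, y): 0 for y in ys for x in xs}
            for tri in bins.get((tx, ty), []):
                for (x, y) in list(local):
                    if covers(tri, x, y):
                        local[(x, y)] = tri[3]
            # Tile done: each of its pixels is written to external memory once.
            for (x, y), color in local.items():
                fb[y][x] = color
                writes += 1
    return fb, writes
```

Both functions produce the same image, but `render_tiled` always performs exactly `width * height` external writes, whereas the count for `render_immediate` grows with overdraw. For example, two triangles that each cover half of a 16x16 screen plus a third triangle on top give identical framebuffers, 256 tiled writes and noticeably more immediate-mode writes.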
Another technique for lowering color buffer bandwidth is buffer compression, in which blocks of buffer data are stored in memory in compressed form. The scene is still processed triangle by triangle, but before a block of pixels is sent to the color buffer memory, the pixels are compressed using variable-length encoding. A few bits, called size bits, stored or cached in the GPU keep track of how well the data was compressed, for instance down to 25%, 50% or 75% of the original bit length, or not compressed at all. When a subsequent triangle writes to the same block of pixels, the size bits indicate how much data must be read. The block is decompressed, the new triangle overwrites some of the pixels in the block, and the block is compressed and stored again. Hasselgren and Akenine-Möller, 2006, Efficient Depth Buffer Compression, In Graphics Hardware, 103-110, and Rasmusson, Hasselgren and Akenine-Möller, 2007, Exact and Error-bounded Approximate Color Buffer Compression and Decompression, In Graphics Hardware, 41-48, give good overviews of depth buffer compression and color buffer compression, respectively.
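The read-modify-write cycle driven by size bits can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of the cited papers: a block is assumed to hold 16 single-channel 8-bit pixels (128 bits), and a simple base-plus-offset coder stands in for the variable-length encoder (an 8-bit minimum, a 4-bit offset width, then one fixed-width offset per pixel). The 2-bit size code selects 25%, 50%, 75% or 100% of the block; the value 3 also marks an uncompressed block. All names (`compress`, `decompress`, `CompressedColorBuffer`) are hypothetical.

```python
BLOCK = 16                 # pixels per block, e.g. 4x4 (assumption)
RAW_BITS = BLOCK * 8       # uncompressed block: 128 bits of 8-bit pixels
UNIT_BITS = RAW_BITS // 4  # allocation unit: 25% of the uncompressed size


def compress(block):
    """Return (size_bits, payload); size_bits 0..3 means 25%..100% of the block."""
    lo, hi = min(block), max(block)
    k = (hi - lo).bit_length()            # bits needed per offset from the minimum
    nbits = 8 + 4 + BLOCK * k             # base + offset width + offsets
    units = -(-nbits // UNIT_BITS)        # round up to whole 25% units
    if units >= 4:                        # no saving: store raw, size_bits = 3
        return 3, bytes(block)
    word = lo | (k << 8)
    for i, p in enumerate(block):
        word |= (p - lo) << (12 + i * k)
    return units - 1, word.to_bytes(units * UNIT_BITS // 8, "little")


def decompress(size_bits, payload):
    if size_bits == 3:                    # raw block
        return list(payload)
    word = int.from_bytes(payload, "little")
    lo, k = word & 0xFF, (word >> 8) & 0xF
    mask = (1 << k) - 1
    return [lo + ((word >> (12 + i * k)) & mask) for i in range(BLOCK)]


class CompressedColorBuffer:
    """External memory holds one full-size slot per block; size bits stay on chip."""

    def __init__(self, num_blocks, clear=0):
        sb, payload = compress([clear] * BLOCK)
        self.size_bits = [sb] * num_blocks  # 2 bits per block, kept in the GPU
        self.memory = [bytearray(RAW_BITS // 8) for _ in range(num_blocks)]
        for slot in self.memory:
            slot[:len(payload)] = payload
        self.bytes_read = 0
        self.bytes_written = 0

    def _fetch(self, block_id):
        # The size bits tell us how many bytes to read from external memory.
        sb = self.size_bits[block_id]
        nbytes = (sb + 1) * UNIT_BITS // 8
        self.bytes_read += nbytes
        return decompress(sb, bytes(self.memory[block_id][:nbytes]))

    def read_block(self, block_id):
        return self._fetch(block_id)

    def write_pixels(self, block_id, updates):
        """Read, decompress, overwrite pixels (index -> value), recompress, store."""
        block = self._fetch(block_id)
        for i, v in updates.items():
            block[i] = v
        sb, payload = compress(block)
        self.size_bits[block_id] = sb
        self.memory[block_id][:len(payload)] = payload
        self.bytes_written += len(payload)
```

A cleared block compresses to 25% (4 bytes), so the first update reads only 4 bytes. If a triangle then writes a strongly varying pattern into the block, it no longer compresses and is stored raw with size bits 3, so later accesses transfer all 16 bytes. Reusing the 100% size code as the "raw" flag keeps the on-chip state at two bits per block.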
In practical implementations, each pixel in the color buffer may be accessed several times per frame. Color buffer compression must then be carried out on millions of pixel blocks each second. Prior art color buffer compression schemes are sometimes too complex to achieve such high compression speeds. There is therefore a need for more efficient pixel block compression that can be applied to color buffers and other pixel value buffers.