One way to increase the performance of graphics processing units (GPUs) is to apply techniques that reduce memory bandwidth consumption. Bandwidth reduction is becoming increasingly important because processing power improves at a much higher rate than the bandwidth and latency of random access memory (RAM).
Texture compression is one popular way of reducing bandwidth requirements. By storing textures in compressed form in memory and transferring blocks of the compressed data over the bus, texture bandwidth is reduced substantially.
Today, the most widely used texture compression systems are DXTC [1] for Windows-based systems and the Xbox, and ETC [2] for mobile handsets. Both systems divide an image, denoted a texture, into pixel blocks of 4×4 pixels, and the red, green, blue (RGB) data of the pixels is then compressed from (8+8+8)×16 = 384 bits down to 64 bits, a 6:1 compression ratio. Thus, each pixel block is given the same number of bits. This is important since the rasterizer used during decompression may need to access any part of the texture and must be able to easily calculate the memory address of the desired pixel block. In other words, a fixed-rate codec, i.e., a codec where every pixel block occupies the same amount of storage, is highly desirable and is the norm among texture compression algorithms today.
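The ease of address calculation with a fixed-rate codec can be sketched as follows. This is an illustrative model only, assuming 4×4 blocks of 64 bits each stored in row-major order; the names and layout are not taken from any particular format specification.

```python
# Fixed-rate block addressing sketch: 4x4 pixel blocks, 64 bits (8 bytes)
# per compressed block, blocks stored in row-major order (assumption).
BLOCK_SIZE_BYTES = 64 // 8

def block_address(base, x, y, width):
    """Byte address of the compressed block covering texel (x, y)
    in a texture that is `width` texels wide."""
    blocks_per_row = width // 4
    block_index = (y // 4) * blocks_per_row + (x // 4)
    # One multiply-add suffices: every block has the same size.
    return base + block_index * BLOCK_SIZE_BYTES

# Texel (17, 5) in a 256-texel-wide texture lies in block column 4,
# block row 1, i.e. block index 1*64 + 4 = 68, at byte offset 68*8.
print(block_address(0, 17, 5, 256))  # 544
```

Because the offset is pure arithmetic on the texel coordinates, the rasterizer can fetch any block with a single memory request.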
However, the fixed compression rate comes at a cost. In most textures, some parts contain far more information than others. If the same number of bits must be spent on every pixel block, either too many bits are spent on the easy areas, which will then be of unnecessarily high quality, or too few bits are spent on the hard-to-code areas, giving unsatisfactory quality in those parts.
The solution to this problem of fixed-rate codecs is so-called variable length coding (VLC). In VLC, easy-to-code pixel blocks are given few bits and hard-to-code pixel blocks are given more bits. However, this improvement in compression efficiency comes at the cost of losing the simple calculation of the memory address of a particular pixel block.
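To see why random access becomes hard under VLC, consider this minimal sketch with hypothetical per-block sizes: the offset of block i now depends on the sizes of all preceding blocks, so naive lookup degenerates into a prefix sum instead of a single multiplication.

```python
# Illustrative VLC layout: each block takes a different number of bytes
# (sizes here are made up). Block i starts where block i-1 ends.
block_sizes = [4, 12, 4, 20, 8]

def block_offset(i):
    # O(i) scan over preceding block sizes, versus the O(1)
    # multiply-add that a fixed-rate codec allows.
    return sum(block_sizes[:i])

print(block_offset(3))  # 4 + 12 + 4 = 20
```

In hardware, scanning all preceding blocks per texel fetch is infeasible, which is what motivates storing the offsets in an index instead.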
Inada and McCool [3] have presented a solution that uses a look-up table (LUT) to indicate where each VLC-coded pixel block is located in memory. Using a LUT, however, introduces new problems related to caching. Before the decoding system can request the desired pixel block from memory, it must first consult the LUT. The LUT is rather large and cannot be kept on chip. Hence, the decoding system must first request the relevant part of the LUT, and only once it receives this information from memory does it know where to request the relevant bits of the pixel block from. This is called memory indirection and is a major complication: it adds latency, which is one of the main problems of modern GPUs.
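The indirection problem can be modeled in a few lines. This is a toy sketch, not the actual scheme of [3]: the LUT contents and block sizes are invented, and the point is only that the second memory access cannot begin until the first has completed.

```python
# Toy model of LUT-based indirection: fetching one block costs two
# dependent memory accesses (LUT entry, then the block itself).
lut = {0: 0, 1: 4, 2: 16}      # hypothetical block index -> byte offset
memory = bytes(range(32))      # stand-in for compressed texture memory

def fetch_block(i, size):
    offset = lut[i]                     # access 1: read the LUT entry
    return memory[offset:offset + size] # access 2: depends on access 1

print(list(fetch_block(1, 4)))  # [4, 5, 6, 7]
```

Because access 2 is data-dependent on access 1, the two round trips serialize, roughly doubling the fetch latency compared with the direct addressing of a fixed-rate codec.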