Graphics Processing Units (GPUs) are microprocessors associated with graphics rendering. The GPU processes signals from the computer and can write the results, e.g., to the display. In contrast to a Central Processing Unit (CPU), which performs general computations, the GPU typically performs only computations related to graphics rendering, although in recent years general-purpose computing has also started to be done on GPUs. FIG. 1 schematically illustrates a mobile device 100 with a GPU 130. Compressed images called textures are stored in an image memory 110, and bits of these compressed images 150 are transmitted from the texture memory 112 to the GPU 130 and then decoded by a decoder 140 of the GPU 130. The GPU then uses the bits of the decompressed textures to produce bits of a rendered image 160. These bits of the rendered image are then sent back to a frame buffer memory 114 and then typically displayed on the display 120. An example could be a rendering of a room with a number of paintings hanging on a wall. The textures would then be the pictures in the paintings, and the rendered image would be the final image depicting the room.
When attempting to increase the performance of Graphics Processing Units (GPUs), one important method is to apply various techniques that reduce memory bandwidth consumption, i.e. the bandwidth required between the memory and the GPU. Bandwidth reduction is also becoming increasingly important because processing power is growing much faster than the bandwidth and latency of Random Access Memories (RAMs) are improving.
Although it is sometimes possible to trade computations for memory accesses (for example by computing the value of functions rather than accessing pre-computed lookup-tables), it is likely that at some point the computation needs are satisfied, leaving the GPU idly waiting for memory access requests. Additionally, a brute force approach of simply duplicating memory banks and increasing the number of pins on memory chips may not be feasible in the long run. Finally, transferring data between the GPU and the RAM consumes large amounts of power, which is a problem, especially in mobile applications. Because of that, memory bandwidth reduction algorithms are an important area of future research.
One type of image used in graphics applications is referred to as a texture. A texture is just a regular image that is used to represent the surface of a graphics primitive such as a triangle or a quadrilateral (quad). In the example above, each painting would be drawn using one quad (or two triangles). Each quad would then be “painted” with a texture depicting the artwork (such as the Mona Lisa). In this case, the texture would be an image depicting the Mona Lisa. Since a texture is a type of image, it consists of pixels. However, since the final rendered image also consists of pixels, it is common to use the name “texel” for an image element in the texture, and to reserve “pixel” for an image element of the final rendered image. This terminology is assumed in the following. In order to draw a pixel in the rendered image, one must first work out which position in the texture it corresponds to. For instance, if the GPU is drawing the middle part of the Mona Lisa painting, the pixel should assume the color of the corresponding texel in the middle part of the Mona Lisa texture. Often, however, the pixel will not correspond exactly to a texel in the texture, but will fall somewhere in between four texels. Then, bilinear filtering is usually done between these four texels to produce the pixel. This is referred to in the literature as texture mapping using bilinear filtering. Another complication is that the resolution of the rendered painting may be different from the resolution of the texture. For instance, the painting in the rendered image may occupy 100×100 pixels whereas the texture may have a size of 512×512 texels. Rendering from such a big texture may produce aliasing artifacts. Therefore, a typical preprocessing step is to average and subsample the texture to a set of lower resolutions, for instance 256×256, 128×128, 64×64, 32×32, 16×16, 8×8, 4×4, 2×2 and 1×1 texels.
These levels are often called mipmap levels, with the 512×512 version being the highest resolution mipmap level and the 1×1 version being the lowest resolution mipmap level. The two mipmap levels whose resolutions are closest to that of the rendered object are then used. If it is again assumed that the painting in the rendered image occupies 100×100 pixels, the GPU will use the mipmap levels of 128×128 texels and 64×64 texels. In each of these, the GPU will find the nearest four texels, combine them bilinearly, and then combine the two resulting values linearly. This is referred to in the literature as trilinear mipmapping. As should be clear from the above, this means that up to eight texels may have to be processed in order to produce one rendered pixel.
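The filtering steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular GPU: the texture is a square grayscale image stored as a list of rows, texel centers sit at integer coordinates, and the helper names (bilinear, downsample, build_mipmaps, mip_levels_for, trilinear) and the footprint parameter (the number of screen pixels the texture covers along one axis) are hypothetical.

```python
import math

def bilinear(texture, u, v):
    """Sample a square grayscale texture at continuous texel coordinates (u, v)."""
    size = len(texture)
    # Clamp so that the 2x2 neighbourhood stays inside the texture.
    u = min(max(u, 0.0), size - 1.0)
    v = min(max(v, 0.0), size - 1.0)
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, size - 1), min(y0 + 1, size - 1)
    fx, fy = u - x0, v - y0
    top = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def downsample(texture):
    """Average each 2x2 group of texels to get the next lower mipmap level."""
    half = len(texture) // 2
    return [[(texture[2 * y][2 * x] + texture[2 * y][2 * x + 1] +
              texture[2 * y + 1][2 * x] + texture[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)] for y in range(half)]

def build_mipmaps(texture):
    """Return all levels from full resolution down to 1x1."""
    levels = [texture]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

def mip_levels_for(base_size, footprint, level_count):
    """Pick the two mipmap levels that bracket the on-screen size."""
    lod = math.log2(base_size / footprint)
    lod = min(max(lod, 0.0), level_count - 1.0)
    lo = int(lod)
    hi = min(lo + 1, level_count - 1)
    return lo, hi, lod - lo

def trilinear(levels, s, t, footprint):
    """s, t are normalized coordinates in [0, 1]."""
    lo, hi, frac = mip_levels_for(len(levels[0]), footprint, len(levels))
    samples = []
    for level in (lo, hi):
        size = len(levels[level])
        samples.append(bilinear(levels[level], s * (size - 1), t * (size - 1)))
    # Up to 4 + 4 = 8 texels contribute to this one value.
    return samples[0] * (1 - frac) + samples[1] * frac

# A 512x512 texture drawn at 100x100 pixels uses levels 2 (128x128) and 3 (64x64).
lo, hi, _ = mip_levels_for(512, 100, 10)
```

With a 512×512 texture and a footprint of 100 pixels, mip_levels_for selects levels 2 and 3, which correspond to the 128×128 and 64×64 levels in the example above.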
Texture compression is one popular way of reducing bandwidth requirements. By storing textures in compressed form in memory and transferring blocks of this compressed data over the bus, texture bandwidth is substantially reduced. A light map is a common type of texture. A lightmap is a texture applied to an object to simulate the distance-related attenuation of a local light source. E.g., computer games use lightmaps to simulate effects of local light sources, both stationary and moving.
Traditionally, light maps have been used to model slowly varying lighting behavior in an economical way. A typical example is a textured brick wall. If only one texture is used, it has to be of very high resolution in order to reproduce the details of individual bricks. To avoid big textures, an obvious trick is to repeat the brick pattern, which means that the brick texture can be small and still of high resolution with respect to the screen resolution. The drawback is that every repetition of the pattern then has to look exactly the same: a brick in the top left part of the wall must be represented by the same texels as a brick in the lower right corner. The lighting will thus be uniform across the entire wall, which often looks unrealistic.
Light maps were created to get around this problem. Two textures are used: one small, repeated texture of high resolution, and one small, non-repeated texture of lower resolution. The final texel with which to color the rendered pixel can then be calculated using a formula such as
final_color(x,y) = brick_texture(x+i*N, y+j*M) * lightmap_texture(x/S, y/T)
where brick_texture is the repeated texture and lightmap_texture is a low-resolution texture. Here i is chosen so that x+i*N is never bigger than N−1 and never smaller than 0 (sometimes by using negative values of i). Likewise, j is chosen so that y+j*M is between 0 and M−1. This way, both brick_texture and lightmap_texture can be small and thus require little bandwidth during rendering. This works because the lighting captured in light maps usually changes rather slowly, so lowering the resolution is acceptable.
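As a minimal sketch of the formula above, the wrap-around selected by i and j can be expressed with the modulo operator, and the division by S and T with integer division. The function below assumes scalar (grayscale) textures stored as lists of rows and a nearest-texel lookup in the light map; these choices, and the clamping of the light map index, are assumptions made for illustration only.

```python
def final_color(x, y, brick_texture, lightmap_texture, S, T):
    """Combine a repeated detail texture with a low-resolution light map."""
    M = len(brick_texture)       # tile height in texels
    N = len(brick_texture[0])    # tile width in texels
    # x % N equals x + i*N for the integer i that puts the result in [0, N-1];
    # in Python this also holds for negative x.
    bx, by = x % N, y % M
    # One light map texel covers S x T texels of the wall (assumed nearest lookup).
    lx = min(max(x // S, 0), len(lightmap_texture[0]) - 1)
    ly = min(max(y // T, 0), len(lightmap_texture) - 1)
    return brick_texture[by][bx] * lightmap_texture[ly][lx]

# Hypothetical data: a 2x2 repeating tile and a 1x2 light map, each
# light map texel covering a 4x4 area of the wall.
brick = [[1.0, 0.5],
         [0.5, 1.0]]
lightmap = [[1.0, 0.25]]
lit = final_color(0, 0, brick, lightmap, 4, 4)     # bright region
dim = final_color(5, 0, brick, lightmap, 4, 4)     # same tile, darker region
```

The same brick texel pattern repeats every N texels, while the light map modulates it differently across the wall, so the repetition no longer forces uniform lighting.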
Early lightmaps were scalar valued, i.e., they contained only an intensity value in each texel that decreased the intensity of the other texture. Soon came colored lightmaps, where each texel contained an RGB (Red-Green-Blue)-tuple, and so were able to simulate colored light.
Recent developments have increased the photo-realism of light maps by describing the incoming light in three different directions. Hence, instead of storing just a single RGB-tuple in the texel, which describes the average (colored) lighting that hits a particular point on the texture, three RGB-tuples are stored. Each RGB-tuple now describes the light that shines on a particular point from a particular direction. Together with a normal map, which describes the surface normal at that point, the fragment shader can then calculate which of these three light directions is most relevant and compute the fragment (pixel) color accordingly. With an additional trick, even texture self-shadowing is possible.
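The text does not specify how the three directional RGB-tuples are combined, so the following is only one plausible sketch. It assumes three fixed, mutually orthogonal unit directions (the BASIS values below are an assumption, as is the name shade) and weights each stored RGB-tuple by the clamped dot product between the surface normal and the corresponding direction, normalizing the weights so that they sum to one.

```python
import math

# Assumed orthonormal basis for the three stored light directions.
BASIS = (
    (-1 / math.sqrt(6), -1 / math.sqrt(2), 1 / math.sqrt(3)),
    (-1 / math.sqrt(6),  1 / math.sqrt(2), 1 / math.sqrt(3)),
    (math.sqrt(2 / 3),   0.0,              1 / math.sqrt(3)),
)

def shade(normal, directional_rgb):
    """Blend three RGB-tuples according to how well each direction matches the normal."""
    weights = [max(0.0, sum(n * b for n, b in zip(normal, basis)))
               for basis in BASIS]
    total = sum(weights)
    if total == 0.0:
        return (0.0, 0.0, 0.0)   # normal faces away from all three directions
    return tuple(
        sum(w * rgb[channel] for w, rgb in zip(weights, directional_rgb)) / total
        for channel in range(3)
    )

# Hypothetical texel: red, green and blue light arriving from the three directions.
texel = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
flat = shade((0.0, 0.0, 1.0), texel)   # unperturbed normal: equal mix
tilted = shade(BASIS[2], texel)        # normal along the third direction
```

A normal that points straight out of the surface receives an equal mix of the three tuples, while a normal tilted toward one direction is dominated by the light stored for that direction.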
Whereas these recent developments increase photorealism, they also demand three times the storage and bandwidth, since three RGB-tuples are stored instead of one. Moreover, many applications demand high dynamic range data (floats or halfs instead of integers), further increasing the burden on bandwidth and storage. Therefore, light maps are increasingly compressed using texture compression methods, such as DXT1. DXT1 is a texture compression method which converts a 4×4 block of pixels to 64 bits, resulting in a compression ratio of 6:1 with 24-bit RGB input data (16 pixels × 24 bits = 384 bits per uncompressed block). DXT1 is a lossy compression algorithm, resulting in image quality degradation, an effect which is mitigated by the ability to increase texture resolutions while maintaining the same memory requirements (or even lowering them).
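The 6:1 ratio and the four-color palette can be illustrated with a small decoder sketch. It assumes the commonly documented DXT1 block layout: two 16-bit RGB565 endpoint colors followed by sixteen 2-bit indices, all little-endian. The exact rounding of the interpolated colors varies between implementations, so the integer division used here is an assumption, and the function names are hypothetical.

```python
BLOCK_TEXELS = 4 * 4
compression_ratio = (BLOCK_TEXELS * 24) / 64   # 384 bits in, 64 bits out

def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 color to 8 bits per channel."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def dxt1_palette(c0, c1):
    """Build the (at most) four colors available to one 4x4 block."""
    a, b = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two endpoints plus two interpolated colors.
        return [a, b,
                tuple((2 * p + q) // 3 for p, q in zip(a, b)),
                tuple((p + 2 * q) // 3 for p, q in zip(a, b))]
    # Three-color mode: the fourth entry is reserved (shown here as black).
    return [a, b, tuple((p + q) // 2 for p, q in zip(a, b)), (0, 0, 0)]

def decode_dxt1_block(block):
    """Decode 8 bytes into a 4x4 grid of RGB tuples."""
    c0 = int.from_bytes(block[0:2], "little")
    c1 = int.from_bytes(block[2:4], "little")
    indices = int.from_bytes(block[4:8], "little")
    palette = dxt1_palette(c0, c1)
    return [[palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)]
            for y in range(4)]

# Hypothetical block: white and black endpoints; the first row uses all four indices.
block = bytes([0xFF, 0xFF, 0x00, 0x00, 0xE4, 0x00, 0x00, 0x00])
texels = decode_dxt1_block(block)
```

Whatever the indices are, every texel of the decoded block is drawn from the same four palette entries, which is the limitation discussed next.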
While DXT1-compression usually gives quite good image quality for regular textures, there are a few special cases where DXT1 does not do a very good job. The most important one is perhaps slow transitions between two colors. Since DXT1 cannot have more than four different colors per 4×4 block, it is impossible to create very smooth ramps between colors. The result will be “grainy” or “dirty”-looking transitions. This is exemplified in FIGS. 2a and 2b. 
FIG. 2a shows the original texture and FIG. 2b shows the texture compressed with DXT1. Notice how the limitation to only four colors per 4×4 block results in a grainy ramp between gray and white in FIG. 2b. 
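The effect of the four-color limit on a smooth ramp can be reproduced with a simplified sketch. This is not a DXT1 encoder: it works on a single grayscale channel, ignores the RGB565 quantization of the endpoints, and simply picks the block minimum and maximum as endpoints. The name four_level_quantize is hypothetical.

```python
def four_level_quantize(block):
    """Map every value in a 4x4 block to the nearest of four evenly spaced levels."""
    lo = min(min(row) for row in block)
    hi = max(max(row) for row in block)
    palette = [lo + (hi - lo) * k / 3 for k in range(4)]
    return [[min(palette, key=lambda p: abs(p - v)) for v in row] for row in block]

# A smooth diagonal ramp: seven distinct gray levels within one block.
ramp = [[100 + 10 * (x + y) for x in range(4)] for y in range(4)]
quantized = four_level_quantize(ramp)

levels_before = len({v for row in ramp for v in row})
levels_after = len({v for row in quantized for v in row})
worst_error = max(abs(a - b) for ra, rb in zip(ramp, quantized)
                  for a, b in zip(ra, rb))
```

The ramp contains seven distinct levels but only four survive, so neighboring texels that should differ slightly either collapse to the same value or jump by a full palette step. Repeated over many blocks, this produces the grainy appearance seen in FIG. 2b.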
Unfortunately, since light maps depict smoothly changing light across surfaces, they often contain exactly the type of ramps that DXT1 is bad at handling. At the same time, since the textures are often smooth, the information content in the textures is not that high. Hence it should be possible to do better than DXT1.
ETC2 contains a “planar” mode that approximates each color component in each 4×4 block with a plane equation. This means that smooth ramps are possible within a 4×4 block, if the color variations within the block are roughly planar. As can be seen in FIG. 3, the quality of the ramp is greatly improved compared to DXT1. However, it is far from perfect. For example, note the edge between the ramp and the uniform dark blue area, where a planar approximation is a rather bad approximation.
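Both observations can be checked numerically with a small sketch. It fits a plane to one color channel of a 4×4 block by least squares, which is a stand-in chosen for illustration and does not reproduce the actual ETC2 bit-level encoding or its encoder. The names fit_plane and max_error and the sample blocks are hypothetical.

```python
def fit_plane(block):
    """Least-squares plane value(x, y) = mean + sx*(x - 1.5) + sy*(y - 1.5)."""
    mean = sum(sum(row) for row in block) / 16.0
    # On a 4x4 grid the centered x and y coordinates are orthogonal, and the
    # sum of (x - 1.5)**2 over all 16 texels is 20, so each slope decouples.
    sx = sum((x - 1.5) * block[y][x] for y in range(4) for x in range(4)) / 20.0
    sy = sum((y - 1.5) * block[y][x] for y in range(4) for x in range(4)) / 20.0
    return [[mean + sx * (x - 1.5) + sy * (y - 1.5) for x in range(4)]
            for y in range(4)]

def max_error(block, approx):
    return max(abs(block[y][x] - approx[y][x])
               for y in range(4) for x in range(4))

# A smooth ramp is exactly planar, so the plane reproduces it.
ramp = [[100 + 10 * (x + y) for x in range(4)] for y in range(4)]
ramp_error = max_error(ramp, fit_plane(ramp))

# A block straddling an edge between a bright and a dark area is not planar.
edge = [[200] * 4, [200] * 4, [40] * 4, [40] * 4]
edge_error = max_error(edge, fit_plane(edge))
```

The ramp block is reproduced with no error, whereas the best-fitting plane for the edge block is off by 48 gray levels next to the edge, which matches the visible artifact at the border between the ramp and the uniform area.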
FIG. 4a shows an original image, and FIG. 4b shows the same image compressed using DXT1. Notice how the image compressed with DXT1 is much grainier than the original.