The real-time rendering of three-dimensional graphics has a number of appealing applications on mobile terminals, including games, man-machine interfaces, messaging and m-commerce. Since three-dimensional rendering is a computationally expensive task, dedicated hardware must often be built to reach sufficient performance. Innovative ways of lowering the complexity and bandwidth usage of this hardware architecture are thus of great importance.
The main bottleneck, especially for mobile phones, is memory bandwidth. A common technique for reducing memory bandwidth usage is texture compression. Texturing refers to the process of “gluing” images (here called textures) onto the rendered triangles. If the textures are stored compressed in memory and decompressed only when they are accessed, a significant amount of bandwidth usage can be avoided.
Most texture compression schemes concentrate on image-type data, such as photographs. However, with the advent of programmable shaders, textures have started to be used for many types of data other than traditional photographic images. Bump mapping, for example, has become a widespread technique that adds the illusion of detail to geometrical objects in an inexpensive way. More specifically, a texture, called a bump map or normal map, is used at each pixel to perturb the surface normal. A common approach to generating normal maps is to start with a high polygon-count model and create a low-complexity model using some geometrical simplification algorithm. The “difference” between these two models is then “baked” into a normal map. For real-time rendering, the normal map is applied to the low-complexity model, giving it a more detailed appearance. For instance, document [1] shows how it is possible to go from a very high triangle-count model (15 000 polygons) to a very low one (1 000 polygons) with preserved quality by using normal maps.
Being able to use lower polygon-count models is of course very attractive for mobile devices and terminals, since they have lower computational performance than PC systems.
However, one problem is that the available texture compression methods were created with photographic images in mind and do not work well when the data is something else, such as a normal map. For example, S3TC (same as DXTC) [2] has been employed for compressing normal maps, but with block artifacts as a result; see document [1].
In the majority of cases today, bump mapping is performed in the local tangent space (X, Y, Z) of each rendering primitive, e.g. a triangle. Since the length of the normal is not of interest, unit normals can be employed. Thus, the problem is to compress triplets (X, Y, Z), where $X^2 + Y^2 + Z^2 = 1$. The simplest scheme is just to treat X, Y, Z as RGB (Red, Green, Blue) and compress it with S3TC/DXT1, but that gives rather bad quality.
Another way is to compress only X and Y, and then compute Z using equation 1:

$Z = \sqrt{1 - X^2 - Y^2}$  (1)
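The reconstruction of Z from decoded X and Y values can be sketched as follows. This is a minimal illustration, not part of any cited format: the function name `reconstruct_z` is hypothetical, the inputs are assumed to be already mapped to the range [−1, 1], and the clamp to zero is an added safeguard for the case where quantization error pushes X² + Y² slightly above 1.

```python
import math


def reconstruct_z(x: float, y: float) -> float:
    """Recover Z of a unit normal from X and Y, following equation (1).

    Assumes x and y are already in [-1, 1]. The non-negative root is
    returned, which assumes the normal points out of the surface.
    """
    # Quantization error may make x*x + y*y exceed 1; clamp so that
    # the square root is never taken of a negative number.
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))
```

For example, `reconstruct_z(0.6, 0.0)` yields approximately 0.8, since 0.6² + 0.8² = 1.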
By concentrating on X and Y it is possible to get a lower distortion. In order to enhance quality further, DXT5 can be used. Normally DXT5 is a version of S3TC used for alpha textures, i.e., RGBA textures, where RGB are coded together and the alpha component (A) is coded independently. Thus one approach has been to use the alpha channel to code X and the G channel to code Y, leaving R and B unused in order to give maximum quality to G. However, that still does not give enough quality.
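The channel assignment described above can be sketched as a pre-processing step applied to each texel before it is handed to a DXT5 compressor. The helper names `to_byte` and `pack_normal_for_dxt5` are hypothetical, and the linear mapping from [−1, 1] to 0..255 is an assumed convention rather than something specified by the format; the DXT5 compression itself is not shown.

```python
def to_byte(c: float) -> int:
    """Map a component in [-1, 1] to an integer in 0..255 (assumed convention)."""
    c = max(-1.0, min(1.0, c))  # clamp out-of-range input
    return round((c * 0.5 + 0.5) * 255)


def pack_normal_for_dxt5(x: float, y: float) -> tuple:
    """Build an (R, G, B, A) texel with Y in G and X in A.

    R and B are left at zero so that the colour part of the block only
    has to represent the G channel. Z is not stored; it is recomputed
    from X and Y after decompression.
    """
    return (0, to_byte(y), 0, to_byte(x))
```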
In fact, even uncompressed RGB888/XYZ888 does not give enough quality for some objects. For smooth surfaces in particular, more than eight bits are needed. Therefore ATI Technologies developed 3Dc [1], which is a compression format that will often allow higher quality than XYZ888.
Just as in DXT5, only X and Y are compressed, and Z is calculated. X and Y are compressed separately. The X-values are grouped into blocks of 4×4 pixels. These values can range from −127.000 to +127.000 (or alternatively, from 0 to 255), but they are often clustered in an interval. 3Dc takes advantage of this and specifies the interval using 16 bits: eight bits for the start of the interval and eight bits for the end of the interval.
Inside this interval, each value is specified using 3 bits. This means that eight reconstruction levels within the interval are possible. The reconstruction levels are always equispaced (evenly spaced), reflecting an assumption that the distribution inside the interval is often close to uniform.
In total, 16 bits are used to specify the interval, and 3×16=48 bits are spent on specifying the reconstruction levels for the individual pixels. Thus, a 4×4 block of X-data is compressed to 64 bits. The same coding is used for Y, so 128 bits are used per block in total, or 8 bits per pixel.
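The interval coding described above can be illustrated with the following simplified sketch for one component of one 4×4 block. It follows only the description given here (an 8-bit start, an 8-bit end and one 3-bit index per pixel selecting among eight equispaced levels) and is not the exact bit layout of 3Dc. The function names, the choice of the block minimum and maximum as interval endpoints, and the inclusion of both endpoints among the eight levels are assumptions made for illustration.

```python
LEVELS = 8                                  # 3 bits per pixel
PIXELS = 16                                 # one 4x4 block
BITS_PER_BLOCK = 2 * 8 + PIXELS * 3         # interval + indices = 64


def encode_block(values):
    """Encode 16 integers in 0..255 as (start, end, indices).

    Assumption: the interval is simply [min, max] of the block, and the
    eight levels are equispaced with both endpoints included.
    """
    if len(values) != PIXELS:
        raise ValueError("expected 16 values for a 4x4 block")
    start, end = min(values), max(values)
    span = end - start
    indices = []
    for v in values:
        if span == 0:
            indices.append(0)               # flat block: any index decodes to start
        else:
            # Pick the nearest of the eight equispaced levels.
            indices.append(round((LEVELS - 1) * (v - start) / span))
    return start, end, indices


def decode_block(start, end, indices):
    """Reconstruct the 16 values from the interval and the 3-bit indices."""
    step = (end - start) / (LEVELS - 1)
    return [start + step * i for i in indices]
```

Under these assumptions the reconstruction error of each value is at most half the spacing between two levels, so a narrow interval gives a fine resolution while a wide interval gives a coarse one.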
While 3Dc produces much better quality than DXT1 or DXT5, it may still not be enough. The interval is specified with 8 bits, and the smallest possible interval is obtained when there is a difference of only one between the start and end of the interval, such as, for instance, the interval [77, 78] or [−3, −4]. In this case, there are eight reconstruction levels in between, corresponding to another three bits. Thus, the theoretical maximum resolution is 8+3=11 bits per component. This may not be enough for slowly varying surfaces.