Today, Ericsson Texture Compression (ETC1) [1] is available on many devices. For instance, Android supports ETC1 from version 2.2 (Froyo), meaning that millions of devices are running ETC1.
ETC1 was originally developed to be an asymmetric codec; decompression had to be fast, but compression was supposed to be done off-line and could take longer. However, recent developments have made it important to be able to compress an image to the ETC1 format very quickly.
Another texture compression format is DXT1. However, platforms built for OpenGL ES may support ETC1 but not DXT1. It is therefore desired to be able to transcode DXT1 textures to ETC1. This way, after transcoding, rendering can be done from ETC1 instead, for which there is hardware support. However, the transcoding has to be fast enough for the user not to notice. It has been estimated that transcoding 20 Megapixels in less than 30 seconds is the upper limit. Ideally, it should be faster than that, perhaps 5 seconds.
To that end, we have created a test system where we can transcode DXT1 textures to ETC1 textures quickly.
It should be noted that fast ETC1 encoding is beneficial under many circumstances, not only when transcoding from DXT1 data.
A problem is that current methods for transcoding DXT1 textures to ETC1 are not fast enough. Also, image quality has been sacrificed in order to obtain faster encoding.
As an example, the software package “etcpack” that Ericsson provides to Khronos users has three modes: “fast”, “medium” and “slow”. Even the “fast” mode takes around 640 seconds to encode 20 Megapixels of RGB8 data on a mobile device (exemplified by a Sony Ericsson Xperia X10 mini). This is more than 20 times the stipulated 30 seconds. A more stripped-down version of the same code, called “average”, takes about 264 seconds, but this is still more than eight times slower than necessary. We have recently hand-optimized the software so that it runs in 20 seconds on an X10 mini. However, further reducing the compression time is desirable.
One of the things that takes the most time in the encoder is selecting which modifying table of the ETC1 codec each halfblock should use. To understand how this works, we need to describe the ETC1 codec in a bit more detail:
ETC1 compresses 4×4 blocks by treating them as two halfblocks. Each halfblock gets a “base color”, and then the luminance (intensity) can be modified in the halfblock. This is illustrated in FIG. 1.
The halfblocks within a 4×4 block are either lying or standing. Only one base color per halfblock is used. Per-pixel luminance is added to the base color and the resulting image is shown in FIG. 1.
The luminance information is added in the following way: First one out of eight modifying tables is selected. Possible tables are:
Table 0: {−8, −2, 2, 8}
Table 1: {−17, −5, 5, 17}
Table 2: {−29, −9, 9, 29}
Table 3: {−42, −13, 13, 42}
Table 4: {−60, −18, 18, 60}
Table 5: {−80, −24, 24, 80}
Table 6: {−106, −33, 33, 106}
Table 7: {−183, −47, 47, 183}
The selected table number is stored in the block using a 3-bit index. Each pixel also has a two-bit ‘pixel index’ making it possible to select one of the four items in the table.
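As an illustration only, the table lookup described above can be sketched in Python as follows. The names ETC1_MODIFIER_TABLES and modifier are hypothetical and are not taken from etcpack. The sketch assumes that a pixel index i selects the i-th item of the table in the order listed above, which matches the simplified description given here; the bit-level ordering in an actual ETC1 bitstream may differ.

```python
# Hypothetical sketch: the eight ETC1 modifying tables, indexed by the
# 3-bit table number stored per halfblock.
ETC1_MODIFIER_TABLES = (
    (-8, -2, 2, 8),
    (-17, -5, 5, 17),
    (-29, -9, 9, 29),
    (-42, -13, 13, 42),
    (-60, -18, 18, 60),
    (-80, -24, 24, 80),
    (-106, -33, 33, 106),
    (-183, -47, 47, 183),
)


def modifier(table_number, pixel_index):
    """Return the table item selected by a 3-bit table number and a
    2-bit pixel index.

    Assumption: pixel_index i picks the i-th item as listed in the text.
    """
    if not 0 <= table_number < 8:
        raise ValueError("table number must fit in 3 bits")
    if not 0 <= pixel_index < 4:
        raise ValueError("pixel index must fit in 2 bits")
    return ETC1_MODIFIER_TABLES[table_number][pixel_index]


# Table 4 with pixel index 11 binary selects the last item of that table.
selected = modifier(4, 0b11)
```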
Assume for instance that the base color is (R, G, B) = (173, 200, 100) and that we have selected table 4. Assume a pixel has a pixel index of 11 binary, i.e., the last item (60) in the table should be selected. This value is then added to all the three channels (red, green and blue). By using the same value for all channels, the ETC1 format can avoid spending bits on three different values for the three color channels. The color of the pixel is thus calculated as (173, 200, 100) + (60, 60, 60) = (233, 260, 160), which is then clamped to the range [0, 255] to the color (233, 255, 160).
Note that this can be written as (173, 200, 100) + 60(1, 1, 1) = (233, 260, 160), which can be interpreted as follows: The end result (233, 260, 160) must lie on a line which goes through the base color (173, 200, 100) and which has the direction (1, 1, 1). Of course the base color can vary, and so can the distance 60 along the line, but the direction (1, 1, 1) never changes, and this is due to the fact that we always add the same number to all the three color components. In general, a certain base color b = (br, bg, bb) will mean that we can only reach colors on the line L, where L: (br, bg, bb) + t(1, 1, 1).
The final value is clamped so that all channels have values between 0 and 255. This means that the final color may not lie on the line L above. Indeed in the example above, (233, 260, 160) lies on the line, but after clamping to (233, 255, 160), it does not. However, the line often provides a good approximation of where the final color may end up.
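The worked example above can be reproduced with a short Python sketch. The function names clamp8, decode_pixel and on_line are hypothetical helpers introduced for illustration; on_line simply tests whether a color equals the base color plus t(1, 1, 1) for some t.

```python
def clamp8(value):
    """Clamp a channel value to the range [0, 255]."""
    return max(0, min(255, value))


def decode_pixel(base_color, modifier_value):
    """Add the same modifier to all three channels, then clamp each one."""
    return tuple(clamp8(c + modifier_value) for c in base_color)


def on_line(base_color, color):
    """True if color = base_color + t*(1, 1, 1) for some t."""
    diffs = [c - b for c, b in zip(color, base_color)]
    return diffs[0] == diffs[1] == diffs[2]


base = (173, 200, 100)
unclamped = tuple(c + 60 for c in base)   # (233, 260, 160), on the line L
decoded = decode_pixel(base, 60)          # (233, 255, 160), off the line L
```

Here the green channel is the only one that exceeds 255, so clamping moves the final color off the line L by five units in green.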
The hard part in the compression is to find out which table is best for this halfblock. The current way to do this is to try all eight tables, calculate the error for all the pixels for each table, and see which table generates the smallest error.
As can be seen from the pseudo-code below, this makes for quite a deep inner loop:
for all blocks
  for both half blocks
    for all tables
      for all pixels in half block
        for all modifier values in table
          find modifier value for pixel
This is what takes most of the time during compression.
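For illustration, the exhaustive search over the tables for a single halfblock can be sketched in Python as below. This is a minimal sketch and not the etcpack implementation: the function names are hypothetical, the base color is assumed to be already chosen, and the error measure is assumed to be the sum of squared channel differences, which the text above does not specify.

```python
TABLES = (
    (-8, -2, 2, 8), (-17, -5, 5, 17), (-29, -9, 9, 29), (-42, -13, 13, 42),
    (-60, -18, 18, 60), (-80, -24, 24, 80), (-106, -33, 33, 106),
    (-183, -47, 47, 183),
)


def decode(base, mod):
    """Add the modifier to every channel and clamp to [0, 255]."""
    return tuple(max(0, min(255, c + mod)) for c in base)


def best_index_for_pixel(base, pixel, table):
    """Try all four modifier values; return (pixel index, squared error)."""
    best_i, best_err = 0, None
    for i, mod in enumerate(table):
        err = sum((d - p) ** 2 for d, p in zip(decode(base, mod), pixel))
        if best_err is None or err < best_err:
            best_i, best_err = i, err
    return best_i, best_err


def select_table(base, pixels):
    """Try all eight tables; return (table number, total error, indices)."""
    best = None
    for t, table in enumerate(TABLES):
        total, indices = 0, []
        for pixel in pixels:
            i, err = best_index_for_pixel(base, pixel, table)
            total += err
            indices.append(i)
        if best is None or total < best[1]:
            best = (t, total, indices)
    return best


# Eight pixels of a halfblock that table 4 can reproduce exactly.
base = (173, 200, 100)
pixels = [decode(base, m) for m in (-60, -18, 18, 60, -60, -18, 18, 60)]
table_number, total_error, pixel_indices = select_table(base, pixels)
```

For each halfblock this evaluates 8 tables × 8 pixels × 4 modifier values, i.e., 256 candidate colors, which is the cost the nested loops above incur for every halfblock in the image.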