Applications where pictures or picture streams have to be encoded efficiently are numerous. For example, still image compression is normally done by digital photo cameras to increase the number of pictures that can be stored on a storage medium of a given size. When it comes to transmission of image sequences or complete movies over a transmission medium offering only limited bandwidth, the use of an efficient codec (coder-decoder) that allows for a high compression of the content of the pictures becomes even more urgent. This is on the one hand due to the desired transmission over transport channels offering low bandwidth, such as the streaming of video content to mobile phones. On the other hand, the transmission of high-resolution video content is becoming more and more popular since displays capable of displaying such high resolution pictures are spreading more and more amongst consumers. One major trend is the upcoming broadcast of high-definition television (HDTV).
In general, two different coding approaches may be distinguished, the first aiming for an encoding without any loss of information and the second accepting a (moderate) loss of information and quality to achieve a significant reduction in file size. Although lossless encoding techniques exist for both still images and movie content, these techniques, often based on entropy coding, cannot achieve a file-size reduction that is sufficient or acceptable for the desired application. Therefore, lossy compression, such as JPEG for still images and MPEG-2 for movies, is preferred in most cases.
Generally, lossy compression has the problem of a decreased quality of the compressed pictures compared to the underlying original picture. Naturally, the quality of the picture becomes worse when the compression rate is increased, i.e. when the file size of the compressed picture is decreased. Therefore, one has to find a compromise between the desired quality of a compressed image and the file size acceptable for transmission or storage. Mostly, the decrease in file size and also the loss in information is achieved by quantization of parameters describing the picture properties and hence, the coarser the quantization the worse the quality and the smaller the compressed picture. The quality of the compressed picture is commonly estimated by a comparison of the compressed picture with the underlying original picture. This allows estimating a signal-to-noise ratio, wherein the noise is understood to be the noise introduced during the compression.
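The relation between quantization coarseness and measured quality can be illustrated with a minimal sketch. It assumes an 8-bit single-component picture stored as a list of rows and uses the peak signal-to-noise ratio (PSNR) as the quality measure; the function names, the test picture and the step sizes 4 and 32 are illustrative assumptions and are not taken from the text above.

```python
import math

def mse(original, compressed):
    """Mean squared error between two equally sized pictures (lists of rows)."""
    total = 0.0
    count = 0
    for row_o, row_c in zip(original, compressed):
        for o, c in zip(row_o, row_c):
            total += (o - c) ** 2
            count += 1
    return total / count

def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical pictures."""
    err = mse(original, compressed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

def quantize(picture, step, peak=255):
    """Uniform quantization followed by reconstruction, clamped to the peak value."""
    return [[min(peak, round(v / step) * step) for v in row] for row in picture]

# Hypothetical 8x8 test picture with values in 0..255.
original = [[(17 * x + 31 * y) % 256 for x in range(8)] for y in range(8)]
fine = quantize(original, 4)     # fine quantization: small error
coarse = quantize(original, 32)  # coarse quantization: large error

print("PSNR, step 4: ", psnr(original, fine))
print("PSNR, step 32:", psnr(original, coarse))
```

In this sketch the coarser step yields the lower PSNR, which mirrors the trade-off between quality and size described above.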
In current compression algorithms, a block-wise processing of images is widely used. The underlying basic idea is that for normal image content, a change of content, e.g. of color and brightness, of neighboring pixels is normally relatively small. Therefore, by using areas of neighboring pixels that are processed and compressed together, one should achieve rather high compression rates without significantly reducing the perceptual quality of the picture. Such a picture block is from here on also referred to as macro-block. Thus, in other words, the macro-blocks serve as a kind of sub-picture unit in coding. The block-subdivision is illustrated in FIG. 7, where a picture 10 is subdivided into 12 equally sized picture blocks 12A to 12L. The subdivision into 12 different picture blocks is to be understood as an example only.
As an example, a single picture block 12I is magnified in FIG. 7, wherein the subdivision of the picture block 12I into an 8×8 matrix shows the individual pixel elements that make up the macro-block 12I. Here too, the formation of a picture block from 8×8 individual pixels is to be understood as an example only. To represent color within each individual pixel, each pixel is assigned three parameters holding different color information in a certain color space.
One simple approach of encoding a macro-block is to quantize the three parameters of each single pixel and to perform an entropy coding on the quantized parameters after the quantization. Since quantization significantly reduces the available parameter space for the entropy coding, quantization of the parameters can already reduce the amount of storage space or bits needed to describe one macro-block significantly.
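The effect of quantization on the parameter space seen by the entropy coder can be sketched as follows. The example assumes 64 hypothetical 8-bit parameter values and a quantization step of 32, and it uses the empirical Shannon entropy as a lower bound on the bits an ideal entropy coder would need; none of these choices are prescribed by the text.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def quantize_index(value, step):
    """Map a parameter value to the index of its quantization interval."""
    return value // step

# Hypothetical parameter values of one 8x8 macro-block (64 distinct values).
samples = [(37 * i + 11) % 256 for i in range(64)]
indices = [quantize_index(v, 32) for v in samples]

raw_bits = entropy_bits(samples) * len(samples)
quantized_bits = entropy_bits(indices) * len(indices)

print("distinct symbols before/after:", len(set(samples)), len(set(indices)))
print("entropy bound in bits before/after:", raw_bits, quantized_bits)
```

With a step of 32 at most eight distinct indices remain, so the entropy bound cannot exceed three bits per parameter in this sketch.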
However, in order to reduce the number of high-energy syntax elements describing the picture content, the picture information within one macro-block is often described by transformation coefficients, generated by transforming the picture content within the macro-block into another representation (spectral domain). One example is to perform a discrete cosine transformation, possibly on a sub-macro-block level, and to use the transformation coefficients as the image information, which may then be quantized and entropy coded after quantization.
The transformation may, for example, be applied to the complete pixel information, i.e. three parameter values per pixel of the picture block 12I. Advantageously, the transformation is performed separately for the three parameters/components.
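The transformation of one component of a picture block can be sketched as below. The sketch assumes an orthonormal two-dimensional DCT-II applied to a single 8×8 component, a smooth hypothetical test block, and a uniform quantization step of 10; the helper names are illustrative and do not reflect any particular codec.

```python
import math

N = 8  # block size, matching the 8x8 example block

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def dct2(block):
    """Forward 2D transform: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D transform: C^T * coeffs * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Hypothetical smooth component (e.g. luma) of one picture block.
block = [[100 + 2 * x + y for x in range(N)] for y in range(N)]
coeffs = dct2(block)
quantized = [[round(v / 10) for v in row] for row in coeffs]
nonzero = sum(1 for row in quantized for v in row if v != 0)
restored = idct2(coeffs)

print("non-zero quantized coefficients out of 64:", nonzero)
```

For such smooth content most of the energy ends up in a few low-frequency coefficients, so only a small number of quantized coefficients differ from zero.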
For further reduction of file sizes and higher compression, one may also make use of a property of the human eye, which seems to put more weight on brightness information than on color information when judging the perceptual quality of an encoded picture. Therefore, one possibility to enhance the coding performance (with respect to quality and bit rate) is to reduce the number of color parameters relative to the number of brightness parameters within a macro-block. That is, the information basis on which a representation by transformation coefficients is built contains more information on brightness within the picture block than on color. Since there are numerous ways to describe a color by one single brightness value and two color values, the brightness value shall be referred to as luma value and the color values shall be referred to as chroma values from here on.
One possible way of building such a picture block 12I, suited to be transformed, is indicated in FIG. 7. The magnified picture block 12I has 8×8 individual pixels, each pixel normally described by one luma and two chroma values. The picture block 12I exemplifies a way to reduce the amount of chroma information in that only the chroma information of specific pixels is used as the data set underlying the transformation. This is indicated by the letter C within each individual pixel that is part of the chroma data set. In contrast, the more important luma information of every individual pixel is used.
It is to be understood that the situation shown in the magnified macro-block 12I is an example only. It is also possible to further reduce the amount of chroma information. This could, for example, be achieved by omitting every second chroma value, so that for every eight luma values, one chroma value would be taken into account during the transformation. It would also be possible not simply to use the chroma values of the pixels marked in the picture block 12I but to calculate an average chroma value from four neighboring pixels. Such a chroma value would then be assigned to a position within the macro-block that lies in the center of the four underlying pixels, as indicated by chroma value 16 in FIG. 7.
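The averaging variant described above can be sketched as follows. The sketch assumes that the four neighboring pixels form a 2×2 group and that the block dimensions are even; the sample values and the function name are hypothetical.

```python
def subsample_chroma(chroma):
    """Average each 2x2 group of chroma values into one value.

    Assumes even block dimensions; each averaged value conceptually sits
    at the center of its four underlying pixels.
    """
    height, width = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1]
              + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4.0
             for x in range(0, width, 2)]
            for y in range(0, height, 2)]

# Hypothetical 8x8 picture block: one luma and two chroma planes.
luma = [[16 + 3 * x + 5 * y for x in range(8)] for y in range(8)]
cb = [[128 + (x // 2) * 4 + (y // 2) for x in range(8)] for y in range(8)]
cr = [[120 + (y // 2) * 2 for x in range(8)] for y in range(8)]

cb_sub = subsample_chroma(cb)
cr_sub = subsample_chroma(cr)

samples_full = 3 * 64  # one luma and two chroma values per pixel
samples_sub = 64 + len(cb_sub) * len(cb_sub[0]) + len(cr_sub) * len(cr_sub[0])
print("values per block before/after:", samples_full, samples_sub)
```

Every luma value is kept, while each chroma plane shrinks from 64 to 16 values, so the data set underlying the transformation is halved in this example.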
The encoding techniques described above can generally be used for both still images and moving pictures. For moving pictures, more sophisticated methods of encoding are used, involving motion estimation.
In the case of macro-block-wise motion estimation, two (or more) pictures of a picture stream that show the same picture content are located (the pictures do not necessarily have to follow each other directly). In the simplest case, the picture content within the macro-block of a current frame has not changed compared to the reference frame. However, the content of the macro-block may appear at a slightly different position in the reference frame. In this case it is sufficient to know the motion vector describing the movement of the picture content during the transition from the reference picture to the macro-block of the current picture in order to reconstruct or predict the picture information of the picture block in the current picture, once the reference picture is completely known at the decoder side. Of course, there are normally slight changes within the picture block during the transition from the reference picture to the current picture. For this reason, the prediction error is transmitted along with the motion vector, so that the change of picture content in the macro-block can also be reconstructed, allowing for a complete reconstruction of the macro-block in the current picture. Codecs which use motion prediction with subsequent residual coding, such as transformation and entropy coding, are called hybrid video codecs.
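Motion estimation for a single block can be sketched with an exhaustive search. The sketch assumes the sum of absolute differences (SAD) as matching criterion, a 4×4 block, a search range of ±2 pixels, and a motion vector that points from the block position in the current picture to the matching position in the reference picture; all of these are illustrative choices and not prescribed by the text.

```python
def motion_search(cur, ref, bx, by, size=4, search=2):
    """Full search for the displacement (dx, dy) that minimises the SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue  # candidate block would leave the reference picture
            cost = sum(abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
                       for y in range(size) for x in range(size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

def predict(ref, bx, by, mv, size=4):
    """Fetch the motion-compensated prediction from the reference picture."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]

# Hypothetical reference picture; the current picture shows the same content
# moved 2 pixels right and 1 pixel down, with one pixel slightly changed.
ref = [[16 * y + x for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]
cur[5][5] += 3

bx, by = 4, 4
mv = motion_search(cur, ref, bx, by)
prediction = predict(ref, bx, by, mv)
residual = [[cur[by + y][bx + x] - prediction[y][x] for x in range(4)]
            for y in range(4)]
# Decoder side: prediction plus transmitted residual restores the block.
reconstructed = [[prediction[y][x] + residual[y][x] for x in range(4)]
                 for y in range(4)]
print("motion vector:", mv)
```

Only the motion vector and the almost entirely zero residual would have to be transmitted for this block, which is the saving that hybrid codecs rely on.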
According to state-of-the-art techniques, predictive coding allows for an efficient representation of picture sequences. In predictive coding, a value for a quantity to be coded is first predicted, and then only the difference between the actually observed value and the predicted value is coded and transmitted. This also yields a gain in bit rate since, given a reliable prediction, the difference parameters will on average be smaller than the absolute parameters describing the picture within the macro-block. Hence, the symbol space on which a subsequent entropy coding (with or without preceding quantization) is based can be decreased, allowing for shorter code words and thus for a reduction in bit rate.
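The reduction of the symbol space by prediction can be sketched with a simple one-dimensional example. It assumes that each value is predicted by its predecessor (with the first value predicted as 0) and uses a hypothetical row of slowly varying values; this predictor is only one of many possible choices.

```python
def predict_encode(values):
    """Code each value as the difference to its predecessor (first predicted as 0)."""
    previous = 0
    differences = []
    for value in values:
        differences.append(value - previous)
        previous = value
    return differences

def predict_decode(differences):
    """Rebuild the values by accumulating the transmitted differences."""
    previous = 0
    values = []
    for difference in differences:
        previous += difference
        values.append(previous)
    return values

# Hypothetical row of slowly varying parameter values.
row = [100, 101, 103, 104, 104, 105, 107, 108,
       108, 109, 111, 112, 112, 113, 115, 116]
differences = predict_encode(row)

print("distinct symbols, absolute values:", len(set(row)))
print("distinct symbols, differences:   ", len(set(differences)))
```

Apart from the first entry, the differences are small and draw on far fewer distinct symbols than the absolute values, so an entropy coder can assign them shorter code words.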
Although considerable efforts have been made to decrease the file size of pictures or movies compressed using block-wise coding strategies without unacceptably decreasing the perceptual quality of the compressed content, the properties of the individual picture blocks are still not exploited optimally with respect to different parametric representations of picture blocks.