1. Field of the Invention
The present invention relates to techniques for compression of digital video signals.
These techniques, which aim at representing an image with the smallest possible number of bits while preserving the level of quality and intelligibility required for a given application, pursue a reduction of the channel bandwidth required by image-transmission systems (digital television, video conferencing, etc.) and, in a symmetrical and parallel way, a reduction of the storage space necessary for storing images and sequences on optical or magnetic media, such as CDs, DVDs, etc.
2. Description of the Related Art
In 1988 the Moving Picture Experts Group (MPEG) started its activity as a work group in the ISO/IEC framework with the aim of defining a standard for digital audio and video compression. In 1993 the MPEG2 standard was defined, which is able to handle interlaced images for the encoding of television signals, with the aim of supplying a wide range of bitrates (2-20 Mbps) and variable resolutions. In particular, the standard-definition television (SDTV) signal can be encoded with a bitrate of between 4 and 9 Mbps, whilst the high-definition television (HDTV) signal can be encoded with a bitrate of between 15 and 25 Mbps.
The biggest difficulty encountered in defining the MPEG2 standard derives from the need to obtain a very high level of compression, combined with the need to guarantee the possibility of random access to the compressed video signal. The first objective may be achieved only by using an “interframe” type of encoding, i.e., one that is based on the information contained in more than one frame at a time, whilst the second requires an “intraframe” type of encoding, i.e., one based only on the information contained in a given frame. The efficiency of the MPEG2 standard consequently depends upon the compromise that is reached between the two types of encoding. To obtain a high compression ratio it is of fundamental importance to reduce the redundancy in the signal and to discard the information that is not important for the user. The MPEG2 compression algorithm is therefore a lossy encoding algorithm, i.e., one with loss of information.
The aforesaid algorithm basically comprises three processing components:
motion compensation, for reducing the temporal redundancy of the signal, which is based upon recognition of the existence of image portions that remain the same in successive frames;
transform coding, which aims at reducing the spatial redundancy of the signal and is based upon the correlation existing between neighboring pixels within a frame; and
entropy encoding, for reducing statistical redundancy.
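The first of these components can be illustrated with a minimal sketch of block-matching motion estimation (in Python; the exhaustive search, the window size, and the function names are illustrative simplifications, not the method mandated by the standard):

```python
# Illustrative sketch: for one 8x8 block of the current frame, find the
# displacement into the previous (reference) frame that minimizes the
# sum of absolute differences (SAD). Names and parameters are assumed.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_motion_vector(cur, ref, bx, by, n=8, search=4):
    """Exhaustive search in a +/- `search` window around (bx, by)."""
    cur_block = [row[bx:bx + n] for row in cur[by:by + n]]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue  # candidate block falls outside the frame
            cand = [row[x:x + n] for row in ref[y:y + n]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost
```

When a portion of the image has simply shifted between two frames, the search finds that displacement with zero residual error, and only the motion vector (plus the prediction error, here zero) needs to be encoded.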
In order to manage the different levels of redundancy present in the video signal, three types of frames are defined: intra (I), prediction (P), and interpolation or bidirectional (B).
All three types of images (I, P, and B) still contain a high level of spatial redundancy. In order to reduce it, the macroblocks that define the images are further subdivided into blocks of 8×8 pixels each, which are converted into spatial-frequency coefficients by means of the discrete cosine transform (DCT). The DCT presents the advantage of de-correlating the coefficients of a block and of concentrating most of the information in the low-frequency region, but in itself it does not reduce the amount of data necessary for representing an image.
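The energy-concentration property of the DCT can be verified with a short sketch (in Python; a direct, unoptimized implementation of the orthonormal two-dimensional DCT-II, with illustrative names, not code taken from the standard):

```python
import math

def dct2(block):
    """2-D DCT-II of an n x n block (direct formula, orthonormal scaling)."""
    n = len(block)

    def c(k):
        # Normalization factor: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out
```

For a uniform 8×8 block, all the energy ends up in the single DC coefficient and every AC coefficient is zero, which illustrates how the transform concentrates, rather than reduces, the information.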
Most of the bitrate reduction is consequently obtained by means of quantization, which in the MPEG2 standard is a uniform scalar quantization: each coefficient belonging to a macroblock is divided by a quantization step determined by the rate-control algorithm. Given the characteristics of the DCT, in the regions corresponding to the highest spatial frequencies there is thus present a large number of zero coefficients, which are encoded efficiently using the run-length coding (RLC) technique. To reduce the residual statistical redundancy of the quantized signal, Huffman entropy coding is used together with the RLC technique; this assigns the shortest codewords to the most frequent combinations (variable-length coding, VLC). As noted, quantization yields a good degree of compression at the cost of a slight loss of quality; in fact, it is the only irreversible operation of the entire encoding algorithm. In the MPEG2 standard a scalar quantization is used that is based upon threshold encoding. This is an adaptive method, whereby in each block only those coefficients are preserved that exceed a given threshold. In type-I blocks, the alternating-current (AC) and direct-current (DC) components are quantized separately. As regards macroblocks belonging to type-P or type-B frames, instead, the quantizer is the same for both types of component.
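The chain of quantization followed by run-length coding can be sketched as follows (in Python; the single quantization step, the truncating division, and the pair format are illustrative simplifications of the actual MPEG2 bitstream syntax):

```python
# Illustrative sketch: uniform scalar quantization of an 8x8 coefficient
# block, zigzag scan, and run-length coding of the resulting levels.

def zigzag_order(n=8):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    # Diagonals of constant row+col, traversed in alternating directions.
    return sorted(((y, x) for y in range(n) for x in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def quantize_and_rlc(coeffs, step):
    """Quantize (truncation toward zero, a simplification) and run-length
    code: emit (run_of_zeros, nonzero_level) pairs along the zigzag scan."""
    levels = [int(coeffs[y][x] / step) for y, x in zigzag_order(len(coeffs))]
    pairs, run = [], 0
    for lv in levels:
        if lv == 0:
            run += 1
        else:
            pairs.append((run, lv))
            run = 0
    return pairs  # trailing zeros are implied by an end-of-block code
```

Because the zigzag scan visits the low frequencies first, the long runs of zeros produced by quantization at high frequencies collapse into very few (run, level) pairs.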
The quantization matrices are used to increase the quantization step of the coefficients that are least significant in the block. As has already been said, the DCT has the merit of concentrating most of the information in a small number of low-frequency coefficients. In type-I blocks, the information contained in the high-frequency coefficients is generally negligible and, above all, not very visible to the human eye. For this reason, the high-frequency coefficients are quantized in a coarser way, favoring the low-frequency ones. In type-P and type-B blocks, the coefficients represent the prediction error with respect to the reference blocks pointed to by the motion vectors. Their energy content is no longer strongly localized as in the case of type-I blocks, and the use of quantization matrices in this case is less important. For this reason, in the MPEG2 standard, the quantization matrix used by default for the blocks other than the intra blocks is constituted by coefficients that are all equal. Of course, the quantization matrices, for all types of frames, can be specified from outside and can be different for each image of the sequence. When this happens, the said matrices must be encoded and introduced into the MPEG bitstream.
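Matrix-based quantization can be sketched as follows (in Python; the matrix values, the rounding rule, and the names are illustrative assumptions, not the MPEG2 default intra matrix or its exact arithmetic):

```python
# Illustrative sketch: each coefficient is divided by the corresponding
# matrix entry scaled by the quantizer step, so that matrix entries that
# grow toward the high frequencies quantize them more coarsely.

def quantize_with_matrix(coeffs, matrix, scale):
    """Divide each coefficient by (matrix entry * quantizer scale)."""
    return [[int(round(c / (m * scale)))
             for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(coeffs, matrix)]

# For intra-like behavior the matrix grows with frequency (toward the
# bottom right); in the default non-intra case all entries are equal.
example_matrix = [[1, 2],
                  [2, 4]]
```

With `scale = 2`, a 2×2 block `[[80, 40], [20, 10]]` reduces to `[[40, 10], [5, 1]]`: the bottom-right, highest-frequency coefficient is quantized most coarsely.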