1. Field of the Invention
The invention is directed to the encoding and decoding of a video data stream that contains digitized images.
2. Description of the Prior Art
The encoding of video signals according to the image encoding standards MPEG (MPEG1, MPEG2) [1], JPEG [2], H.261 [3] and H.263 [4] is based on the principle of what is referred to as block-based image encoding.
The block-based image encoding methods employ principles of prediction encoding and of transformation encoding.
In the prediction, difference images are generated by subtraction of predicted image data from the original image data to be encoded.
What is referred to as motion-compensated prediction is employed for the prediction. The fundamentals of the motion estimation required for this purpose and their application to motion-compensated prediction are known to those skilled in the art, as disclosed, for example, in the article “Motion-Compensated Television Coding: Part I” by Netravali and Roberts. For an image block to be encoded, the motion estimation compares the luminance information (brightness information) allocated to each picture element of that image block with stored luminance information of an area having the same shape in a chronologically preceding image. The comparison is usually made by forming the absolute differences of the individual luminance values. The comparison for the image block to be encoded is applied to a plurality of regions of the preceding image, referred to below as preceding image blocks. The difference images contain the difference between the luminance values of the image block and the luminance values of the preceding image block that coincides “best” in the motion estimation.
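The block comparison described above can be sketched as a full search that minimizes the sum of absolute luminance differences. This is a minimal illustration only: the function names, the 8×8 block size, and the search range of ±4 picture elements are assumptions made for the example and are not taken from the cited article or standards.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized luminance blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def extract_block(image, top, left, size):
    """Cut a size x size block out of an image given as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def best_match(current, previous, top, left, size=8, search_range=4):
    """Compare the block at (top, left) of the current image with every
    preceding image block inside the search window.

    Returns (dy, dx, cost) for the displacement with the lowest cost.
    """
    height, width = len(previous), len(previous[0])
    target = extract_block(current, top, left, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that would leave the preceding image.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, extract_block(previous, y, x, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


def difference_block(current, previous, top, left, motion, size=8):
    """Difference between the current block and its best-matching predecessor."""
    dy, dx = motion
    cur = extract_block(current, top, left, size)
    ref = extract_block(previous, top + dy, left + dx, size)
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]
```

The values returned by `difference_block` correspond to the difference image data that is passed on to the transformation encoding.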
The spatial correlations between neighboring picture elements present in the difference images are exploited with the aid of a suitable transformation, for example a discrete cosine transformation (DCT). The transformation encoding that is employed supplies transformation encoding coefficients that are subjected to a quantization and to an entropy encoding.
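As a rough illustration of the transformation and quantization steps, the following sketch applies a separable, orthonormal two-dimensional DCT to a block and quantizes the coefficients uniformly. The uniform quantizer and its step size of 16 are assumptions chosen for the example; the standards cited above define their own quantization rules, and the entropy encoding is omitted here.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT (type II) of a list of numbers."""
    n = len(values)
    out = []
    for p in range(n):
        scale = math.sqrt(1.0 / n) if p == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(p * (k + 0.5) * math.pi / n)
                               for k, v in enumerate(values)))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols holds one transformed column per entry; transpose back to rows.
    return [list(row) for row in zip(*cols)]


def quantize(coeffs, step=16):
    """Uniform quantization with a hypothetical step size."""
    return [[int(round(c / step)) for c in row] for row in coeffs]
```

For a block whose picture elements are strongly correlated, most quantized coefficients become zero, which is what makes the subsequent entropy encoding effective.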
Subsequently, the transformation encoding coefficients are transmitted to a receiver, wherein the entire encoding method is implemented in an inverse manner. As a result, direct information about the picture elements is, in turn, available at the receiver after implementation of the decoding.
A distinction is made between two different image encoding modes in block-based image encoding methods.
In what is referred to as the intra-image encoding mode, the entire image or a suitable sub-portion of the image (for example, an image block) is respectively encoded with the entire encoding information allocated to the picture elements of the image and is transmitted. What are referred to as I-images or I-image blocks are encoded in this mode.
In what is referred to as the inter-image encoding mode, only the respective difference image information of two chronologically successive images is encoded and transmitted. In this mode, what are referred to as P-images or B-images or P-image blocks or B-image blocks are encoded.
What is to be understood by encoding information below is brightness information (luminance information) or color information (chrominance information) that is allocated to the picture elements of the image.
Methods for what is referred to as object-based image encoding are known from ISO/IEC JTC1/SC29/WG11, MPEG-4 Video Verification Model, published by the International Organization for Standardization (ISO). In object-based image encoding, an image is segmented according to the image objects occurring in the image. The image objects are encoded separately. Methods for motion estimation and transformation encoding are likewise used in this approach.
In object-based image encoding methods, each image object BO is first resolved into image blocks BB having a fixed size, for example 8×8 picture elements BP. After the resolution, some of the resulting image blocks are located completely within an image object BO, as shown in FIG. 4. The image B contains at least one image object BO that is bounded by an object edge OK of the image object BO.
Image blocks BB that contain at least a part of the object edge OK are referred to below as edge image blocks RBB.
Image blocks BB that are located completely within an image object BO after the resolution can, based on the aforementioned block-based image encoding methods, be transformation-encoded with a standard, block-based, discrete cosine transformation (DCT).
The edge image blocks RBB, however, must be encoded with a separate method.
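The distinction between image blocks located completely within an image object and edge image blocks can be sketched as follows. The sketch assumes that the image object is given as a binary mask (1 for picture elements belonging to the object, 0 otherwise) and treats every block containing both kinds of picture elements as an edge image block; the mask representation, the function name, and the label strings are assumptions made for the example.

```python
def classify_blocks(mask, size=8):
    """Label every size x size block of a binary object mask.

    Returns a dict mapping (top, left) to "interior", "edge" or "outside".
    """
    height, width = len(mask), len(mask[0])
    labels = {}
    for top in range(0, height, size):
        for left in range(0, width, size):
            pixels = [mask[y][x]
                      for y in range(top, min(top + size, height))
                      for x in range(left, min(left + size, width))]
            inside = sum(1 for p in pixels if p)
            if inside == 0:
                labels[(top, left)] = "outside"    # no object picture elements
            elif inside == len(pixels):
                labels[(top, left)] = "interior"   # completely within the object
            else:
                labels[(top, left)] = "edge"       # crossed by the object edge
    return labels
```

Blocks labeled "interior" can be passed to a standard block-based DCT, whereas blocks labeled "edge" need one of the separate methods discussed in the following.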
Previously, there have been two fundamental approaches for encoding the edge image blocks RBB.
The ISO publication discloses supplementing the encoding information of the picture elements of the image object BO within an edge image block RBB by extrapolating it, with a suitable extrapolation method, onto the area of the complete edge image block RBB. This procedure is referred to as padding. The supplemented area is subsequently encoded with a standard, two-dimensional, discrete cosine transformation.
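The padding idea can be illustrated with a deliberately simple extrapolation that fills all picture elements outside the image object with the mean value of the object's picture elements in that block. Mean filling is only a stand-in chosen for this sketch; the extrapolation method actually specified in the ISO publication is not reproduced here, and the function name and mask representation are assumptions.

```python
def pad_edge_block(block, mask):
    """Fill picture elements outside the object with the object mean.

    block: list of rows with encoding information (e.g. luminance values)
    mask:  list of rows with 1 for object picture elements, 0 otherwise
    """
    inside = [v
              for row, mask_row in zip(block, mask)
              for v, m in zip(row, mask_row) if m]
    if not inside:
        raise ValueError("block contains no object picture elements")
    mean = sum(inside) / len(inside)
    # Object picture elements are kept, all others are replaced by the mean.
    return [[v if m else mean for v, m in zip(row, mask_row)]
            for row, mask_row in zip(block, mask)]
```

The padded block has the full block size and can therefore be handed to the same two-dimensional DCT as an interior image block.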
Alternatively, it is known from the ISO publication and the article “Shape Adaptive DCT for Generic Coding of Video” by Sikora and Makai that the image object BO is transformed separately according to lines and columns. This technique is referred to as shape-adapted transformation encoding, or as shape-adapted DCT when a DCT is employed (Shape Adaptive DCT, SA-DCT). The DCT coefficients allocated to the image object BO are determined such that the picture elements BP of an edge image block RBB that do not belong to the image object BO are masked out. A one-dimensional DCT whose length corresponds to the number of remaining picture elements BP in the respective column is then initially applied column-by-column to the remaining picture elements BP. The resulting DCT coefficients are horizontally aligned and are subsequently subjected to a further one-dimensional DCT in the horizontal direction with a corresponding length.
The rule of SA-DCT known from the teachings of Sikora and Makai is based on a transformation matrix DCT-N having the following structure:
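$$\underline{DCT\text{-}N}(p,k) = \gamma \cdot \cos\left[ p \cdot \left( k + \frac{1}{2} \right) \cdot \frac{\pi}{N} \right], \qquad k, p = 0 \rightarrow N-1$$
The value $\gamma = \frac{1}{\sqrt{2}}$ applies to the case p=0 and γ=1 applies to all other cases.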
N refers to the size of the image vector to be transformed, in which the picture elements to be transformed are contained.
DCT-N refers to a transformation matrix having the size N×N.
Indices are referenced p, k, with p, k ∈ [0, N−1].
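The transformation matrix defined above can be built directly from its definition. The following sketch assumes the reading γ = 1/√2 for p = 0 and γ = 1 otherwise; the function name `dct_matrix` is chosen for the example.

```python
import math


def dct_matrix(n):
    """Return the n x n matrix with entries gamma * cos(p * (k + 1/2) * pi / n).

    p is the row index, k the column index, both running from 0 to n - 1.
    gamma is 1/sqrt(2) for p = 0 and 1 for all other rows.
    """
    matrix = []
    for p in range(n):
        gamma = 1.0 / math.sqrt(2.0) if p == 0 else 1.0
        matrix.append([gamma * math.cos(p * (k + 0.5) * math.pi / n)
                       for k in range(n)])
    return matrix
```

With this choice of γ the rows of the matrix are mutually orthogonal, and each row has the squared norm N/2, so the matrix alone is not orthonormal; the scaling is supplied by the factor in front of the matrix in the transformation rule.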
According to SA-DCT, each column of the image block to be transformed is vertically transformed according to the rule
$$\underline{c}_j = 2 \cdot \frac{2}{N} \cdot \underline{DCT\text{-}N} \cdot \underline{x}_j$$
and the same rule is subsequently applied to the resulting data in the horizontal direction.
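The two passes of the SA-DCT can be sketched as follows for an edge image block with a binary object mask. The sketch assumes that the scale factor in the rule above reads 2·(2/N), that the object picture elements of each column are shifted to the top before the vertical pass, and that the coefficients of each row are shifted to the left before the horizontal pass. The function names, the mask representation, and the ragged list used for the result are assumptions made for the example.

```python
import math


def dct_matrix(n):
    """Matrix gamma * cos(p * (k + 1/2) * pi / n), gamma = 1/sqrt(2) for p = 0."""
    return [[(1.0 / math.sqrt(2.0) if p == 0 else 1.0)
             * math.cos(p * (k + 0.5) * math.pi / n)
             for k in range(n)]
            for p in range(n)]


def transform_vector(x):
    """Apply c = 2 * (2 / N) * DCT-N * x to a vector of length N (assumed scaling)."""
    n = len(x)
    d = dct_matrix(n)
    scale = 2.0 * (2.0 / n)
    return [scale * sum(d[p][k] * x[k] for k in range(n)) for p in range(n)]


def sa_dct(block, mask):
    """Shape-adapted DCT of one block; returns a ragged list of coefficient rows."""
    n_rows, n_cols = len(block), len(block[0])
    # Vertical pass: keep only the object picture elements of each column,
    # which shifts them to the top, and transform with the matching length.
    columns = []
    for j in range(n_cols):
        x = [block[i][j] for i in range(n_rows) if mask[i][j]]
        columns.append(transform_vector(x) if x else [])
    # Horizontal pass: row i collects the i-th coefficient of every column
    # that is long enough, which aligns the coefficients to the left.
    rows = []
    for i in range(n_rows):
        row = [col[i] for col in columns if len(col) > i]
        rows.append(transform_vector(row) if row else [])
    return rows
```

The number of coefficients produced equals the number of picture elements belonging to the image object in the block, so no coefficients are spent on the masked-out picture elements.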
One disadvantage of SA-DCT is that none of the resulting transformation coefficients (spectral coefficients) represents the constant part of the encoding information of the picture elements BP of the image object BO. The constant component, which is also referred to as the DC coefficient, however, already contains the majority of the signal energy for ordinary image data and is therefore of particular significance for efficient image encoding.