The invention relates to video picture compression techniques, and, more particularly, to an algorithm for optimizing performance of a motion estimation system and for optimizing a discrete cosine transform (DCT) of the video data in compression and coding phases.
The set of pixels of a field of a picture may be placed in a position of a subsequent picture obtained by translating the preceding one. These transpositions of objects may expose to the video camera parts that were not visible before as well as changes of their contours, such as during zooming, for example. The family of algorithms suitable to identify and associate these portions of images is generally referred to as motion estimation. Such an association permits calculation of the portion of a difference image by removing the redundant temporal information. This makes the subsequent process of compression by discrete cosine transform, quantization and entropy coding more effective.
A typical example of such a method is found in the MPEG-2 standard. In consideration of the importance of such a widely adopted standard, and by way of example, reference will be made in the following description to the MPEG-2 standard to make the illustration easier to follow. The considerations that will be made remain valid for other standards and for other motion estimation systems.
A typical block diagram of a video MPEG-2 coder is depicted in FIG. 1. Such a system is made up of the following functional blocks:
Field ordinator. This block is composed of one or several field memories outputting the fields in the coding order required by the MPEG standard. For example, if the input sequence is I B B P B B P etc., the output order will be I P B B P B B . . .
I (Intra coded picture) is a field and/or a semifield containing temporal redundancy;
P (Predicted-picture) is a field and/or semifield from which the temporal redundancy with respect to the preceding I or P (previously co/decoded) has been removed;
B (Bidirectionally predicted-picture) is a field and/or a semifield whose temporal redundancy with respect to the preceding I and subsequent P (or preceding P and subsequent P) has been removed. In both cases the I and P pictures must be considered as already co/decoded.
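The reordering performed by the field ordinator can be sketched as follows. This is only an illustrative sketch: the function name `coding_order` and the representation of pictures as strings are assumptions, and trailing B pictures with no later reference are simply flushed at the end.

```python
def coding_order(display_order):
    """Reorder picture types from display order to coding order.

    Each B picture is held back until the next I or P picture
    (its future reference) has been emitted.
    """
    out = []
    pending_b = []
    for pic in display_order:
        if pic.startswith("B"):
            pending_b.append(pic)      # wait for the next reference picture
        else:
            out.append(pic)            # I or P goes out first
            out.extend(pending_b)      # then the B pictures that depend on it
            pending_b = []
    out.extend(pending_b)              # simplification: flush leftover B pictures
    return out


print(" ".join(coding_order("I B B P B B P".split())))  # prints: I P B B P B B
```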
Each frame buffer in the format 4:2:0 occupies the following memory space:
Motion Estimator. This block removes the temporal redundancy from the P and B pictures. This functional block operates only on the most energetic components of the sequence to be coded, i.e., the richest of picture information, such as the luminance component.
DCT. This block implements the discrete cosine transform (DCT) according to the MPEG-2 standard. The I picture and the error pictures P and B are divided in 8*8 blocks of pixels Y, U, V onto which the DCT is performed.
Quantizer Q. An 8*8 block resulting from the DCT is then divided by a quantizing matrix to reduce the magnitude of the DCT coefficients. In such a case, the information associated to the highest frequencies less visible to human sight tends to be removed. The result is reordered and sent to the successive block.
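The element-wise division described for the quantizer, and the corresponding multiplication performed by the inverse quantizer, may be sketched as below. The function names and the truncating rounding are assumptions made for illustration; the rounding rules and any additional scale factors of the standard are not modeled.

```python
def quantize(block, qmatrix):
    """Divide each DCT coefficient by the matching quantizing-matrix entry.

    int() truncates toward zero, so small coefficients collapse to 0.
    """
    return [[int(c / q) for c, q in zip(brow, qrow)]
            for brow, qrow in zip(block, qmatrix)]


def dequantize(levels, qmatrix):
    """Multiply each quantized level by the same quantizing-matrix entry."""
    return [[l * q for l, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]
```

For instance, a coefficient of 100 divided by an entry of 8 becomes 12 and is restored as 96, which shows that the quantization step is where information is lost.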
Variable Length Coding (VLC). The codification words output from the quantizer tend to contain a large number of null coefficients, followed by non-null values. The null values preceding the first non-null value are counted and the count figure constitutes the first portion of a codification word, the second portion of which represents the non-null coefficient.
These paired values tend to assume values more probable than others. The most probable ones are coded with relatively short words, e.g., composed of 2, 3 or 4 bits, while the least probable are coded with longer words. Statistically, the number of output bits is less than in the case such methods are not implemented.
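The pairing of a count of null coefficients with the following non-null coefficient can be sketched as below. The function name `run_level_pairs` is hypothetical, the input is assumed to be the already reordered list of quantized coefficients, and trailing null coefficients are simply dropped in this sketch; the assignment of short or long code words to the pairs is not shown.

```python
def run_level_pairs(coeffs):
    """Turn a coefficient sequence into (run, level) pairs.

    run   = number of null coefficients preceding a non-null one
    level = the non-null coefficient itself
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are not represented in this sketch


print(run_level_pairs([12, 0, 0, -3, 1, 0, 0, 0]))  # prints: [(0, 12), (2, -3), (0, 1)]
```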
Multiplexer and Buffer. Data generated by the variable length coder, the quantizing matrices, the motion vectors and other syntactic elements are assembled for constructing the final syntax contemplated by the MPEG-2 standard. The resulting bitstream is stored in a memory buffer. The size limit of the memory buffer is defined by the MPEG-2 standard, and cannot be overfilled. The quantizer block Q supports this limit by adjusting the division of the DCT 8*8 blocks. This division is based on the space available in the memory buffer, and on the energy of the 8*8 source block taken upstream of the motion estimation and the DCT process.
Inverse Variable Length Coding (I-VLC). The variable length coding functions specified above are executed in an inverse order.
Inverse Quantization (IQ). The words output by the I-VLC block are reordered in the 8*8 block structure, which is multiplied by the same quantizing matrix that was used for its preceding coding.
Inverse DCT (I-DCT). The DCT function is inverted and applied to the 8*8 block output by the inverse quantization process. This permits passing from the domain of spatial frequencies to the pixel domain.
Motion Compensation and Storage. Either of two kinds of picture may be present at the output of the I-DCT block. The first is a decoded I picture or semipicture, which must be stored in a respective memory buffer for removing the temporal redundancy from subsequent P and B pictures. The second is a decoded prediction error picture (semipicture) P or B, which must be added to the information removed previously during the motion estimation phase. In the case of a P picture, the resulting sum is stored in a dedicated memory buffer and is used during the motion estimation process for the successive P pictures and B pictures. These field memories are generally distinct from the field memories that are used for re-arranging the blocks.
Display Unit. This unit converts the pictures from the format 4:2:0 to the format 4:2:2 and generates the interlaced format for displaying the images. The most energetic and therefore richer of information component, such as the luminance, is represented as a matrix of N lines and M columns. Each field is divided in portions called macroblocks, with each portion having R lines and S columns. The results of the divisions N/R and M/S must be two integer numbers, but not necessarily equal to each other.
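The constraint on the macroblock partition can be checked with a few lines of code. The function name and the example dimensions (576 lines by 720 columns, with 16*16 macroblocks) are assumptions chosen only for illustration.

```python
def macroblock_grid(n_lines, m_columns, r=16, s=16):
    """Return (N/R, M/S), the number of macroblock rows and columns.

    Raises ValueError when either division does not give an integer.
    """
    if n_lines % r != 0 or m_columns % s != 0:
        raise ValueError("N/R and M/S must both be integers")
    return n_lines // r, m_columns // s


print(macroblock_grid(576, 720))  # prints: (36, 45)
```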
Referring to FIG. 2a, the estimator block shown in FIG. 1 is made up of the following two subsystems:
a) Motion estimator. This first subblock searches the predictors of each macroblock according to a certain estimation algorithm and decides, among the different predicting options described by the standard, the one that yields the best results. It supplies the successive block and the final buffer with the motion vectors and the type of prediction selected.
The prediction forms for the different types of fields are as follows. For the Intra fields, the prediction form is Intra. For the P fields, the prediction forms include Forward field, Frame field and Dual prime. For the B fields, the prediction forms include Field Forward, Field Backward, Field Interpolated, Frame Forward, Frame Backward and Frame Interpolated.
b) Decider. This sub-block is the part of the estimator that selects the coding mode of the single macroblocks, as well as the transform mode in the domain of the frequencies of the prediction errors relative to each one of them. The coding mode for the Intra fields is Intra. The different coding modes for the P fields are Intra, Forward Predictive and No motion compensated. The different coding modes for the B fields are Intra, Forward Predictive, Backward Predictive and Interpolated Predictive.
However, as far as the DCT block is concerned (FIG. 1), this decomposes the different macroblocks in smaller parts of 8*8 pixels. Nevertheless, the decomposition of the macroblocks in blocks may be carried out in two different modes: field or frame.
From FIG. 2b, and assuming macroblocks of 16*16 pixels wherein the rows belonging to two successive semifields compose the same picture, it is possible to divide the macroblock into 4 blocks. The discrete cosine transform may be applied onto the blocks in two different modes. One mode is the frame mode, wherein the four blocks include rows belonging to both semifields. A second mode is the field mode, wherein each of the four blocks includes rows belonging to the same semifield.
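The two ways of dividing a 16*16 macroblock into four 8*8 blocks can be sketched as follows. The function name `split_macroblock`, the list-of-rows representation and the ordering of the returned blocks are assumptions made for illustration; even rows are taken as one semifield and odd rows as the other.

```python
def split_macroblock(mb, mode):
    """Split a 16x16 macroblock (list of 16 rows) into four 8x8 blocks.

    "frame": upper and lower halves, rows of both semifields interleaved.
    "field": even rows (one semifield) and odd rows (the other semifield).
    """
    if mode == "frame":
        upper, lower = mb[:8], mb[8:]
    elif mode == "field":
        upper, lower = mb[0::2], mb[1::2]
    else:
        raise ValueError("mode must be 'frame' or 'field'")
    blocks = []
    for half in (upper, lower):
        blocks.append([row[:8] for row in half])   # left 8 columns
        blocks.append([row[8:] for row in half])   # right 8 columns
    return blocks


# Each row is filled with its own row index so the row selection is visible.
mb = [[r] * 16 for r in range(16)]
print([row[0] for row in split_macroblock(mb, "frame")[0]])  # rows 0..7
print([row[0] for row in split_macroblock(mb, "field")[0]])  # rows 0, 2, ..., 14
```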
However, according to the MPEG-2 video standard, there exist three possible coding modes for the fields present in each sequence: Intra (or I), P or B, as already mentioned above. In the case of Intra fields, each macroblock will be coded according to the Intra mode. For the case of non-Intra fields, it is possible to independently code each macroblock as described below.
If the current field is a P field, then the admitted coding modes are: Intra, No motion compensation and Motion compensation. In case of B pictures, the admitted coding modes are: Intra, Forward Predictive, Backward Predictive and Interpolated. Therefore, for Intra fields it is necessary to only decide how to apply the DCT to the different blocks in a field or frame mode. For non-Intra fields P and B, it is also necessary to establish how to predict the single macroblock being considered.
The decision algorithm (TM5). According to the known technique, as far as the decision on the type of DCT to be used is concerned, the procedure is the same for all types of fields and macroblocks. This procedure, as better illustrated in the ensuing description, is based on estimations of variance and correlation among samples of the prediction error. The decision on the type of prediction is likewise based on variance estimates.
Selection of the prediction method. First, the algorithm calculates for a chosen type of prediction the mean square error:

$$\mathrm{MSE} = \sum_{j=0}^{15} \sum_{i=0}^{15} \left(\epsilon_{i,j}\right)^{2}$$

For $\epsilon_{i,j} = p^{*}_{i,j} - p_{i,j}$, the variable $p^{*}$ indicates the pixels that compose the predictor, and $p$ indicates the pixels that compose the current macroblock at the position (i,j). The current macroblock being processed will be coded without motion estimation or Intra if:

$$\begin{cases} \mathrm{MSE} > \mathrm{VAR} \\ \mathrm{MSE} \ge 9 \cdot 256 \end{cases}$$

MSE is the mean square error, whereas VAR is the sampling variance of the reference macroblock, that is:

$$\mathrm{VAR} = \sum_{j=0}^{15} \sum_{i=0}^{15} p_{i,j}^{2} - \frac{\left(\sum_{j=0}^{15} \sum_{i=0}^{15} p_{i,j}\right)^{2}}{256}$$
The variable $p_{i,j}$ is the value of the pixel of the current macroblock at position (i,j). This means that, for the current macroblock to be coded as Intra, the prediction mean square error must be greater than the variance of the pixels that compose the macroblock and must also reach an empirically determined threshold. In such a case, it is no longer convenient to code the macroblock in the form of the prediction error, because the entropy associated with it would be greater than the one associated with the single pixels that compose it, thus degrading the compression efficiency. For the current macroblock to be coded as non-Intra, it is sufficient that at least one of the two above-mentioned conditions is not satisfied.
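The Intra decision described above can be sketched as follows for a 16*16 macroblock. The function names are hypothetical, and the quantities are computed exactly as written in the text, that is, as sums over the 256 pixels rather than as averages.

```python
def mse(pred, cur):
    """Sum over the macroblock of (p*[i][j] - p[i][j])**2, as defined in the text."""
    return sum((ps - p) ** 2
               for prow, crow in zip(pred, cur)
               for ps, p in zip(prow, crow))


def var(cur):
    """Sum of p**2 minus (sum of p)**2 / 256 for the current 16x16 macroblock."""
    pixels = [p for row in cur for p in row]
    return sum(p * p for p in pixels) - sum(pixels) ** 2 / 256


def code_as_intra(pred, cur, threshold=9 * 256):
    """Intra only if MSE > VAR and MSE >= threshold hold together."""
    m = mse(pred, cur)
    return m > var(cur) and m >= threshold
```

With a flat current macroblock and a predictor that differs from it everywhere, the variance is zero and the error is large, so the macroblock is coded as Intra; with a perfect predictor the error is zero and the macroblock is coded as non-Intra.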
To code the current macroblock as Intra, the coder will not consider the values of its single pixels directly as samples that will be thereafter transformed and quantized, but will perform these operations on samples obtained from the pixels. This is done by subtracting from them the value 128 that is at the middle of the variability field. In contrast, should a non-Intra coding be decided, it is possible to discriminate among two different cases depending on whether the current picture from which originates the reference macroblock is either a B or a P picture.
In the first case (a B picture), the system simply updates the structure predisposed to contain the motion vectors with the selected values. In the second case (a P picture), instead, the coder may decide to code the macroblock with or without motion compensation (MC/no-MC). To do this, it calculates the mean square error between the reference macroblock and the macroblock located at the same absolute coordinates in the field on which the prediction is based; this error is referred to as $V_0$. A comparison is made with the previously found mean square error (MSE). The coder will decide to use the motion estimation result if, and only if, the following two conditions are simultaneously verified:

$$\begin{cases} V_0 > \frac{5}{4} \cdot \mathrm{MSE} \\ V_0 \ge 9 \cdot 256 \end{cases}$$
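The MC/no-MC test for P pictures can be sketched as below. The function name is hypothetical; `v0` and `mse` are assumed to have been computed beforehand as described above. The comparison with 5/4 is written with integer factors to avoid fractional arithmetic.

```python
def use_motion_compensation(v0, mse, threshold=9 * 256):
    """True only if V0 > (5/4) * MSE and V0 >= threshold hold together."""
    return 4 * v0 > 5 * mse and v0 >= threshold


print(use_motion_compensation(5000, 1000))  # prints: True
print(use_motion_compensation(2000, 100))   # prints: False (V0 below 9*256 = 2304)
print(use_motion_compensation(3000, 2900))  # prints: False (V0 not above 5/4 * MSE)
```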
FIG. 2c illustrates the scheme of the decision algorithm according to the known technique.
DCT field or frame. First, the procedure contemplates the calculation of the product of the variance estimates of the prediction errors of the two fields separately:

$$d = \left[\sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Top}(i,j)^{2} - \frac{\left(\sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Top}(i,j)\right)^{2}}{128}\right] \cdot \left[\sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Bot}(i,j)^{2} - \frac{\left(\sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Bot}(i,j)\right)^{2}}{128}\right]$$
The variable $\epsilon_{Top}$ is the prediction error of the Top field composed of the even lines of the field: 0, 2, 4, . . . The variable $\epsilon_{Bot}$ is the prediction error of the Bottom field composed of the odd lines of the field: 1, 3, 5, . . .
In the case of Intra fields, the prediction error is calculated with respect to a hypothetical block whose pixels all have the same value, equal to 128, which is at the center of the variability range of the pixel values 0-255. Should d be null, the prediction error associated with at least one of the two fields is uniform, so it is convenient to compute the DCT in Field mode, since the transform of a uniform field reduces to a single (DC) coefficient. Because at least one of the two fields has a null variance, the two fields are uncorrelated and the DCT is computed in Field mode. In Field mode, the DCT is applied independently to the two semifields that make up the Y, U and V blocks. In contrast, if d is positive, the correlation index between the prediction errors of the two fields is calculated:

$$\rho = \frac{\sum_{j=0}^{7} \sum_{i=0}^{15} \left(\epsilon_{Top}(i,j) \cdot \epsilon_{Bot}(i,j)\right) - \frac{1}{128}\left(\sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Top}(i,j) \cdot \sum_{j=0}^{7} \sum_{i=0}^{15} \epsilon_{Bot}(i,j)\right)}{d}$$
When such an index is high, i.e., greater than a predefined threshold, the DCT is calculated in Frame mode. In contrast, if such an index is small, i.e., less than the predefined threshold, the prediction errors of the two fields are uncorrelated. Therefore, the DCT is calculated in Field mode.
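The field/frame decision described above can be sketched as follows. The function names are hypothetical, and the two arguments are assumed to be the 8-line by 16-column prediction errors of the Top and Bottom fields. The threshold is left as a required parameter because the text only calls it predefined. By default the index is divided by d, as the formula is written; the `use_sqrt` flag, which divides by the square root of d as in a conventional correlation coefficient, is an added assumption and not something the text states.

```python
import math


def field_statistics(err_top, err_bot):
    """Return (d, cross) for two 8x16 prediction-error fields."""
    top = [e for row in err_top for e in row]
    bot = [e for row in err_bot for e in row]
    n = len(top)                                   # 128 samples per field
    s_top, s_bot = sum(top), sum(bot)
    var_top = sum(e * e for e in top) - s_top ** 2 / n
    var_bot = sum(e * e for e in bot) - s_bot ** 2 / n
    cross = sum(a * b for a, b in zip(top, bot)) - s_top * s_bot / n
    return var_top * var_bot, cross


def choose_dct_mode(err_top, err_bot, threshold, use_sqrt=False):
    """Return "frame" or "field" following the decision rule in the text."""
    d, cross = field_statistics(err_top, err_bot)
    if d <= 0:                                     # at least one field is uniform
        return "field"
    rho = cross / (math.sqrt(d) if use_sqrt else d)
    return "frame" if rho > threshold else "field"
```

The cost of this procedure is visible in the sketch: every macroblock requires several hundred multiplications, plus divisions and possibly a square root.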
From the above and according to the known technique, the selection of the prediction method and the discrete cosine transformation mode (field or frame) are both carried out through an algorithm which requires a significantly complex processing. This processing requires the calculation of variance and correlation indexes through the execution of multiplication, square root, division operations etc. For these reasons, a hardware implementation of the algorithm is rather burdensome and costly.
An object of the present invention is to provide a method for coding video data using a simpler hardware implementation than the known method. In particular, the object is to provide an algorithm for selection of the prediction method and the DCT mode that is simpler than the algorithm of the prior art and less burdensome to be implemented in hardware.
This object is achieved with the method of the present invention, in which the only multiplications that need to be performed are multiplications by powers of 2. These can be implemented without any appreciable additional hardware cost. The method of the invention is not based on the calculation of variance and correlation indexes, but rather on the evaluation of differences between adjacent pixels for identifying a complexity index, and on the processing of spread indexes of mean absolute prediction errors.
The architecture of the present invention may be advantageously used in systems processing digitized pictures according to the MPEG-2 protocol, and is also applicable to protocols different from the MPEG-2 standard.