1. Field of the Invention
The present invention relates to an interframe predictive video coding apparatus and decoding apparatus, and also to an interframe predictive video coding method and decoding method.
2. Description of the Related Art
Interframe predictive coding is known as a video coding technique that enables highly efficient compression of moving image signals. This technique compresses video information by taking advantage of the high similarity between each frame and the next (i.e., temporal coherency). More specifically, when encoding a given frame, the coder generates a prediction frame by applying motion vectors to the previous frame, calculates the difference between the prediction frame and the present frame, encodes the difference and the motion vectors into compact variable-length codewords, and transmits them in the form of a coded bitstream.
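The coding steps described above can be sketched as follows. This is a minimal illustration only, assuming integer motion vectors and a single block; the function names, the block position, and the sample frames are hypothetical, and quantization, variable-length coding, and frame-boundary handling are omitted.

```python
def predict_block(ref, x0, y0, size, vx, vy):
    """Motion-compensated prediction f(x, y) = g(x + vx, y + vy) over one block."""
    return [[ref[y0 + j + vy][x0 + i + vx] for i in range(size)]
            for j in range(size)]


def block_residual(cur, pred, x0, y0):
    """Difference between the present block and its prediction (what the coder sends)."""
    return [[cur[y0 + j][x0 + i] - p for i, p in enumerate(row)]
            for j, row in enumerate(pred)]


def reconstruct_block(pred, res):
    """Decoder side: prediction plus the received difference."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(pred, res)]


# Hypothetical frames: the object moves one pel in +X and one pel value changes.
prev_frame = [[0, 0, 0, 0],
              [5, 7, 0, 0],
              [6, 8, 0, 0],
              [0, 0, 0, 0]]
cur_frame = [[0, 0, 0, 0],
             [0, 5, 7, 0],
             [0, 6, 9, 0],
             [0, 0, 0, 0]]

# The block at (1, 1) in the present frame is predicted from (0, 1) in the
# previous frame, so the motion vector is (-1, 0).
pred = predict_block(prev_frame, 1, 1, 2, -1, 0)
res = block_residual(cur_frame, pred, 1, 1)
recon = reconstruct_block(pred, res)
```

Only the motion vector and the mostly zero difference block need to be coded, which is where the compression comes from.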
There are several international standards in the technological field of motion video coding, e.g., ITU-T H.263, ISO/IEC MPEG-1 (Moving Picture Experts Group 1), and ISO/IEC MPEG-2. All those coding standards have adopted highly efficient algorithms that predict intermediate frames from two reference pictures. More specifically, such advanced algorithms include the following.
1) Bidirectional Frame Prediction
This algorithm, exploiting the time correlation between frames, creates a prediction picture from the previous and next frames, thus making a bidirectional prediction.
2) Half-pel Motion Estimation
This algorithm treats motion vectors with half pel (picture element) accuracy and calculates pel values at half-pel positions from the adjacent pels to produce a prediction picture.
Both algorithms obtain prediction pictures by using interpolation techniques on the basis of mean value calculations. More specifically, the bidirectional frame prediction computes a prediction picture f(x, y) by the following equation (1).
$$f(x, y) = \frac{g_{\mathrm{for}}(x + Vx_{\mathrm{for}},\ y + Vy_{\mathrm{for}}) + g_{\mathrm{back}}(x + Vx_{\mathrm{back}},\ y + Vy_{\mathrm{back}})}{2} \tag{1}$$
where g(x,y) represents a reference picture obtained by locally decoding video data after quantization, Vx and Vy are X-axis and Y-axis components of a motion vector, and the suffixes "for" and "back" denote forward prediction and backward prediction, respectively.
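As a rough illustration of equation (1), the following sketch computes one predicted pel from a forward and a backward reference picture. It assumes integer motion vectors and pictures stored as nested lists indexed as picture[y][x]; the function name and sample data are hypothetical, and the mean is left unrounded.

```python
def bidirectional_predict(g_for, g_back, x, y, v_for, v_back):
    """Equation (1): mean of the forward and backward motion-compensated pels."""
    vx_for, vy_for = v_for
    vx_back, vy_back = v_back
    a = g_for[y + vy_for][x + vx_for]      # forward reference pel
    b = g_back[y + vy_back][x + vx_back]   # backward reference pel
    return (a + b) / 2                     # unrounded mean value


# Hypothetical 2x2 reference pictures.
g_for = [[10, 20], [30, 40]]
g_back = [[11, 21], [31, 41]]
value = bidirectional_predict(g_for, g_back, 0, 0, (1, 0), (0, 1))
```

Here the forward pel is 20 and the backward pel is 31, so the mean carries a fraction of 0.5 that an integer pel representation cannot hold.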
In the case of the half-pel motion estimation, the prediction picture f(x, y) is obtained by one of the following four equations (2a) to (2d), depending on the presence of half-pel components of a motion vector as will be explained later.
$$f(x, y) = g(x + Vx',\ y + Vy') \tag{2a}$$
$$f(x, y) = \frac{g(x + Vx',\ y + Vy') + g(x + Vx' + 1,\ y + Vy')}{2} \tag{2b}$$
$$f(x, y) = \frac{g(x + Vx',\ y + Vy') + g(x + Vx',\ y + Vy' + 1)}{2} \tag{2c}$$
$$f(x, y) = \frac{g(x + Vx',\ y + Vy') + g(x + Vx' + 1,\ y + Vy') + g(x + Vx',\ y + Vy' + 1) + g(x + Vx' + 1,\ y + Vy' + 1)}{4} \tag{2d}$$
where Vx' and Vy' represent the integer parts of motion vector components Vx and Vy, respectively.
Equation (2a) is used to calculate a prediction picture f(x, y) when neither of the X-axis and Y-axis motion vector components, Vx and Vy, has a half-pel component. Equation (2b) gives a prediction picture f(x, y) when the motion vector component Vx has a half-pel component but the Y-axis component Vy does not. Equation (2c) gives a prediction picture f(x, y) when, in turn, the motion vector component Vx has no half-pel component but the Y-axis component Vy has one. Equation (2d) provides a prediction picture f(x, y) when both of the X-axis and Y-axis motion vector components Vx and Vy have a half-pel component.
In this way, the picture prediction is achieved by one of the four equations (2a) to (2d), depending on the presence of half-pel components in a motion vector. This is because the motion vector resolution is extended to half-pel accuracy in this motion estimation algorithm, and the interpolation of pel values must be conducted in different ways depending on the half-pel components.
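The selection among equations (2a) to (2d) can be sketched as below. This is an illustration under stated assumptions, not the method of any particular standard: each vector component is assumed to be passed in half-pel units (so 3 means 1.5 pels), and the integer part is assumed to be obtained by flooring, which may differ from the classification shown in FIG. 12. The function names are hypothetical, and the result is left unrounded.

```python
def split_half_pel(v2):
    """Split a component given in half-pel units into (integer part, half flag).

    Assumption: the integer part is the floor, so -1 (i.e. -0.5 pel) gives (-1, 1).
    """
    return v2 >> 1, v2 & 1


def half_pel_predict(g, x, y, vx2, vy2):
    """Return the unrounded prediction f(x, y) per equations (2a) to (2d)."""
    vxi, vx_half = split_half_pel(vx2)
    vyi, vy_half = split_half_pel(vy2)
    px, py = x + vxi, y + vyi
    if not vx_half and not vy_half:                     # (2a): integer vector
        return float(g[py][px])
    if vx_half and not vy_half:                         # (2b): half pel in X only
        return (g[py][px] + g[py][px + 1]) / 2
    if not vx_half and vy_half:                         # (2c): half pel in Y only
        return (g[py][px] + g[py + 1][px]) / 2
    return (g[py][px] + g[py][px + 1]                   # (2d): half pel in X and Y
            + g[py + 1][px] + g[py + 1][px + 1]) / 4


g = [[1, 2],
     [4, 9]]  # hypothetical reference picture, indexed as g[y][x]
```

With this picture, the four cases at (0, 0) give 1.0, 1.5, 2.5, and 4.0 for vectors (0, 0), (0.5, 0), (0, 0.5), and (0.5, 0.5) respectively.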
The values of the motion vector components, Vx and Vy, are classified as shown in FIG. 12 in terms of the polarity of their respective integer parts Vx' and Vy' and the presence of half-pel components. Here, the symbols "Vx-half" and "Vy-half" are binary values that express the presence ("1") or absence ("0") of a half-pel component in the motion vector component Vx or Vy.
Meanwhile, the aforementioned international standard coding systems treat the pel values as integer variables. It is therefore necessary, in the above-described interpolation operations, to round off the resultant pel values to integer values. The standard coding systems actually require each pel value to be rounded to the nearest integer, and in particular, if its decimal fraction is exactly 0.5, the pel value must be rounded away from zero. With this definition, fractional values 3/2, 5/4, and 7/4, for example, will be rounded off to 2, 1, and 2, respectively. Since the pel values are usually represented as positive integers in motion compensation, the above rounding method can be paraphrased as "rounding off toward positive infinity."
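The rounding rule can be checked with a short sketch. The helper names below are hypothetical, and the add-then-shift forms are a common integer idiom for this rule rather than something stated in the text; all of them assume non-negative pel values. Python's built-in round() is avoided because it rounds exact halves to the nearest even integer.

```python
def round_half_up(numerator, denominator):
    """Round numerator/denominator to the nearest integer, with .5 going upward.

    Assumes numerator >= 0 and denominator > 0.
    """
    return (2 * numerator + denominator) // (2 * denominator)


def mean2_rounded(a, b):
    """Rounded mean of two pels, as needed for equations of the (a + b) / 2 form."""
    return (a + b + 1) >> 1


def mean4_rounded(a, b, c, d):
    """Rounded mean of four pels, as needed for the four-pel average."""
    return (a + b + c + d + 2) >> 2


examples = [round_half_up(3, 2), round_half_up(5, 4), round_half_up(7, 4)]
```

The list examples reproduces the values 2, 1, and 2 given in the text for 3/2, 5/4, and 7/4.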
As described above, the interpolated pel values are rounded off to integers; however, this rounding operation may introduce errors into the predicted pictures. Take a pel value of 1.5, for instance. There is an error of 0.5 between the original value 1.5 and the rounded value 2. Since the interframe predictive coding system uses a locally decoded picture of the immediately preceding frame as a reference for the next frame, errors in the prediction picture accumulate as the frame sequence proceeds, unless the difference between the predicted pels and the entered pels is transmitted to the decoding end.
FIG. 13 shows how the rounding errors accumulate in the case that the interpolated pel values are rounded upward, or toward positive infinity. In contrast to this, FIG. 14 shows how the rounding errors accumulate in the case that the interpolated pel values are rounded downward, or toward negative infinity.
In both FIGS. 13 and 14, the square cells represent individual pels, and the numbers within the cells show their respective pel values. The symbol "T" indicates the frame counter value, which increases as the frame sequence advances. The hatched cells represent predicted images of an object that is moving in the +X direction at the rate of 0.5 pel per frame. FIGS. 13 and 14 illustrate how this moving object image varies with time, where the pel values of the background image are all set to zero in the initial frame (T=0) for illustrative purposes. It should be noted that, in the frame of T=1, both ends of the object image are actually located at half-pel grids.
As FIGS. 13 and 14 show, the interpolation introduces some softness in the distribution curve of pel values because of the effect of averaging between pels. It has to be noted here that the summations of all pel values including background exhibit an increase of 3 at every frame transition, as indicated by the "Total=" notes on the right hand side of FIG. 13. On the other hand, the same measurement in FIG. 14 shows a decrease of 3 at every frame transition.
For comparison, FIG. 15 presents a result of interpolation performed in the same situation except that the rounding operations are disabled. In this case, the pel values, being handled as real number variables, do not show any variations in their totals.
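The three behaviors can be reproduced qualitatively with a small one-dimensional simulation. This is only a sketch under stated assumptions: the starting row is hypothetical and is not the data of FIGS. 13 to 15, each frame is predicted solely from the previous prediction (no prediction error is transmitted), and the object moves 0.5 pel per frame in the +X direction, so that each pel is the mean of its left neighbor and itself.

```python
from fractions import Fraction


def step(row, average):
    """Predict the next frame for Vx = -0.5: f(x) = average of g(x - 1) and g(x)."""
    out = []
    for x in range(len(row)):
        left = row[x - 1] if x > 0 else 0  # pels outside the row count as background
        out.append(average(left + row[x]))
    return out


def round_up(s):
    """Halve a two-pel sum, rounding halves toward positive infinity."""
    return (s + 1) // 2


def round_down(s):
    """Halve a two-pel sum, rounding halves toward negative infinity."""
    return s // 2


def exact(s):
    """Halve a two-pel sum without rounding."""
    return Fraction(s, 2)


def totals(row, average, frames):
    """Sum of all pel values at T = 0, 1, ..., frames."""
    result = [sum(row)]
    for _ in range(frames):
        row = step(row, average)
        result.append(sum(row))
    return result


start = [0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0]  # hypothetical object on a zero background
up_totals = totals(start, round_up, 4)
down_totals = totals(start, round_down, 4)
exact_totals = totals(start, exact, 4)
```

For this toy row the total rises by one per frame with upward rounding and falls by one per frame with downward rounding, while the unrounded total stays at 20. The amount of drift per frame depends on the pel values involved, so it does not match the change of 3 per frame quoted for FIGS. 13 and 14.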
The above-mentioned coding standards MPEG-1 and MPEG-2 require intra-coded pictures to be inserted forcibly into the output picture sequence at predetermined frame intervals. They also assume application environments in which coders and decoders can communicate at relatively high bitrates. Those conditions make it possible to deliver the prediction errors, or the differences between predicted pel values and source pel values, to the decoders without reducing the information. Therefore, the problem of rounding error accumulation discussed above will never occur in MPEG-1 or MPEG-2 video coding.
In the H.263 recommendation, however, the prediction errors do not always reach the receiving end, because H.263 assumes use in low bitrate communication environments such as existing analog telephone lines. In extreme cases, most of the transmission bandwidth may be spent on motion vector data alone, leaving no room for sending prediction error information. Moreover, H.263 encoders are unable to insert intra-coded pictures frequently, because intraframe coding produces a large amount of coded data. These restrictions expose H.263 to the error accumulation problem, which would typically appear as picture quality degradation, such as a change in color tone from white to red, observed in the reconstructed pictures.