The following explains the general outlines of a moving picture encoding/decoding method for performing encoding and decoding on a block basis.
As shown in FIG. 3, one frame of a moving picture consists of one luminance signal (Y signal 61) and two color difference signals (Cr signal 62 and Cb signal 63), and each color difference signal has an image size the length and width of which are one-half of those of the luminance signal, respectively. In the common video standards, each frame of a moving picture is divided into small blocks as shown in FIG. 3, and reproduction is made in units of blocks called macroblocks. FIG. 5 shows the structure of a macroblock. The macroblock consists of a Y signal block 30 of 16×16 pixels, and a Cr signal block 31 and a Cb signal block 31, both made of 8×8 pixels spatially matching each other.
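The size relationships described above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function name `frame_layout` and the requirement that the frame dimensions be multiples of 16 are assumptions made for illustration and are not part of the described method.

```python
def frame_layout(width, height, mb_size=16):
    """Return the color difference plane size and the macroblock grid of a frame.

    Assumes (for illustration) that the luminance width and height are
    multiples of the macroblock size.
    """
    if width % mb_size or height % mb_size:
        raise ValueError("frame dimensions must be multiples of the macroblock size")
    # Each color difference signal is half the luminance size in both directions.
    chroma_size = (width // 2, height // 2)
    # Number of macroblocks horizontally and vertically.
    mb_grid = (width // mb_size, height // mb_size)
    return chroma_size, mb_grid


chroma, grid = frame_layout(176, 144)
print(chroma, grid)
```

Each 16×16 luminance block then corresponds to an 8×8 region in each color difference plane, which matches the macroblock structure described with reference to FIG. 5.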
Video coding is performed in units of the macroblocks described above. The coding methods are roughly divided into two types, called intra coding (intra mode) and predictive coding (inter mode). Intra coding is a spatial data compression method that performs DCT on either an input macroblock image to be encoded or an error macroblock image, the latter being the difference between the input macroblock image and a predicted macroblock image created by spatial prediction of the input macroblock image, and then quantizes and encodes each transform coefficient. Intra coding is applied to macroblocks (including those of the first coded frame) that bear no resemblance to their previous frames, and to portions in which accumulated arithmetic operation errors resulting from DCT should be resolved.
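The transform and quantization steps of intra coding may be illustrated as follows. This is a sketch under stated assumptions, not the transform of any particular standard: it assumes a square block, an orthonormal floating-point DCT-II, and a single uniform quantization step; the names `dct_2d` and `quantize` are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factor of the orthonormal DCT-II.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization of every transform coefficient (assumed scheme)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat 8x8 block: all of its energy ends up in the DC coefficient.
flat = [[10] * 8 for _ in range(8)]
levels = quantize(dct_2d(flat), step=16)
print(levels[0][0])
```

For a flat block, only the first (DC) coefficient is non-zero after quantization, which is why smooth image regions compress well with this kind of transform coding.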
The predictive coding algorithm is called MC-DCT (Motion Compensation-Discrete Cosine Transform). Motion compensation is a compression technique for searching a reference frame for a portion similar to the contents of a target macroblock, and encoding the amount of motion (motion vector) along the time axis. Typically, the macroblock is further divided into smaller blocks so that a motion vector will be calculated for each smaller block. For example, MPEG-4 Part 10 (Advanced Video Coding) assumes macroblock partition types (luminance component) for motion compensation as shown in FIG. 7. The basics are four types 51 to 54. The type 54 is divided into four 8×8 blocks 54-0 to 54-3, and formulated to further select one partition type from five types, 54a, 54b, 54c, 54d, and intra coding, for each of the blocks 54-0 to 54-3. A motion vector in each smaller block is detected by selecting a portion in which the sum of absolute values of prediction error signals or the sum of squared errors is small in the block. The sum-of-absolute values scheme is used when the computation speed is critical, while the sum-of-squared errors scheme is used in pursuit of coding efficiency. Further, in pursuit of coding efficiency, another method may be applied, in which the amount of coding is converted to an evaluation value for the sum-of-squared errors to calculate the optimum coding mode and the amount of motion using both the prediction error and the amount of coding. FIG. 4 shows the structure of motion compensation processing for one block. FIG. 4 illustrates a predicted block 75 and a motion vector 76 on a previous frame 73 (reference frame) with respect to a luminance signal block 72 surrounded by a bold border on a current frame 71. 
The motion vector 76 represents the movement from a block 74 (dashed box), located spatially in the same position as the bold-bordered block on the current frame, to the predicted block region 75 on the previous frame (where the length of the motion vector for each color difference signal is one-half of that for the luminance signal, and is not encoded). After this motion compensation, DCT is performed on an error macroblock image that takes a difference between an input macroblock image and a predicted macroblock image consisting of multiple predicted blocks, and quantization and encoding are performed on each transform coefficient. The motion vector in the detected macroblock is also encoded. Since motion vectors of adjacent blocks have values close to each other, a difference value between the motion vectors of the adjacent blocks is typically encoded.
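The motion vector detection described above can be sketched as an exhaustive block-matching search. The sketch below is illustrative only: frames are assumed to be lists of rows of integer luminance samples, the search is restricted to integer-pixel displacements inside the reference frame, and the names `block_cost` and `full_search` are hypothetical. The `squared` flag switches between the sum of absolute values and the sum of squared errors.

```python
def block_cost(cur, ref, bx, by, dx, dy, size, squared=False):
    """Sum of absolute (or squared) prediction errors for one candidate vector."""
    total = 0
    for y in range(size):
        for x in range(size):
            diff = cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
            total += diff * diff if squared else abs(diff)
    return total


def full_search(cur, ref, bx, by, size, search_range, squared=False):
    """Return (cost, dx, dy) of the best candidate within +/- search_range.

    (dx, dy) points from the block position on the current frame to the
    predicted block on the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = block_cost(cur, ref, bx, by, dx, dy, size, squared)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic example: the current frame is the reference shifted by (2, 1).
ref = [[(7 * x + 13 * y + x * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, size=4, search_range=2))
```

An encoder that also weighs the amount of coding, as mentioned above, would add a term for the bits needed to code each candidate to the cost before comparing candidates; that extension is omitted here.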
Motion compensation methods for predictive coding include forward predictive coding, which performs MC using a past frame as a reference frame, and bi-directionally predictive coding, which performs MC using past and future frames as reference frames. The motion compensation for forward predictive coding involves forward prediction only. On the other hand, the motion compensation for bi-directional coding includes backward prediction, bi-directional prediction, and direct prediction, as well as forward prediction. The bi-directional prediction performs interpolation on each pixel in the forward-predicted and backward-predicted blocks to create interpolated predicted blocks. The direct prediction is bi-directional prediction using a motion vector from a future frame to a past frame along the time axis. In the forward or backward prediction mode, the forward or backward motion vector is encoded, and in the bi-directional prediction mode, both the forward and backward motion vectors are encoded. On the other hand, it is unnecessary to encode any motion vector in the direct mode. FIG. 9 shows the concept of prediction in the direct mode. As shown, a forward motion vector 132 from a block (collocated block 131) on a backward reference frame 130, the block 131 spatially corresponding to a block 121 to be predicted on a current frame 120, is reduced or divided into a forward motion vector 122 and a backward motion vector 123 at a ratio corresponding to the ratio of inter-frame distances along the time axis. Using these divided motion vectors, interpolation is performed in the same manner as in the bi-directional prediction mode.
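The division of the collocated motion vector in the direct mode can be sketched as follows. This is only an illustration under assumptions: frame positions are taken to be integer display positions, the scaled components are rounded to integers, and the interpolation is assumed to be a rounded average of the two predictions. The function names are hypothetical.

```python
def scale_direct_vectors(mv_col, pos_forward_ref, pos_current, pos_backward_ref):
    """Split the collocated forward vector into forward and backward vectors.

    mv_col is the forward motion vector of the collocated block, pointing
    from the backward reference frame to the forward reference frame.
    """
    td = pos_backward_ref - pos_forward_ref  # distance spanned by mv_col
    tb = pos_current - pos_forward_ref       # distance current -> forward ref
    if td <= 0 or not 0 < tb < td:
        raise ValueError("current frame must lie between the two reference frames")
    mv_fwd = tuple(round(c * tb / td) for c in mv_col)
    # The backward vector covers the remaining part, in the opposite direction.
    mv_bwd = tuple(f - c for f, c in zip(mv_fwd, mv_col))
    return mv_fwd, mv_bwd


def interpolate(fwd_block, bwd_block):
    """Per-pixel rounded average of forward and backward predictions (assumed)."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(fwd_block, bwd_block)]


# Forward reference at position 0, current frame at 1, backward reference at 3.
print(scale_direct_vectors((6, -3), 0, 1, 3))
```

Because both vectors are derived from data that the decoder already holds, no motion vector needs to be transmitted for a block coded in the direct mode.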
A frame in which intra coding is applied to all the macroblocks is called an I-picture. A frame coded using forward predictive coding or intra coding is called a P-picture. A frame coded using bi-directional coding or intra coding is called a B-picture.
Although the above describes commonly used encoding and decoding methods, recent encoding and decoding methods tend to incorporate functions that increase the freedom of choice. The following describes some of these new functions. The use of these functions is also contemplated in MPEG-4 Part 10 (Advanced Video Coding).
1. Multiple Reference Frames
The above describes that one reference frame is used for motion compensation for a P-picture, and two reference frames, that is, a past frame (forward reference frame) and a future frame (backward reference frame) are used for motion compensation for a B-picture. There is also such a method to prepare multiple past frames and multiple future frames as reference frames so that a different reference frame can be selected on a macroblock basis or for each of smaller blocks into which each macroblock is divided. Further, the conventional methods use an I-picture or P-picture as a reference frame, whereas the new functions allow the selection of a B-picture as a reference frame.
2. Bi-directional Reference Frame Prediction
When this method uses multiple reference frames, past frames can be included as possible backward reference pictures. This method also allows the backward reference pictures to be all past frames. Therefore, the term bi-predictive is used as a generic name in place of bi-directional. When both of two reference frames 140 and 150 are past frames or both are future frames, the way of coding a motion vector 127 to the reference frame 150, which is farther from a current frame, is changed. As shown in FIG. 10, a motion vector 125 is first calculated from a motion vector 124 to the reference frame 140, which is closer to the current frame 121, at a ratio corresponding to the ratio of inter-frame distances along the time axis. The horizontal and vertical components of a difference vector 126 between the motion vector 127 and the motion vector 125 are then coded respectively.
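The difference-vector coding described with reference to FIG. 10 can be illustrated with a short sketch. The assumptions here are that inter-frame distances are positive integers, that the scaled vector is rounded to integer components, and that the decoder repeats the same scaling; the function names are hypothetical.

```python
def predict_far_vector(mv_near, dist_near, dist_far):
    """Scale the vector to the nearer reference frame by the distance ratio."""
    return tuple(round(c * dist_far / dist_near) for c in mv_near)


def encode_far_vector(mv_far, mv_near, dist_near, dist_far):
    """Return the difference vector that is actually coded."""
    predicted = predict_far_vector(mv_near, dist_near, dist_far)
    return tuple(a - p for a, p in zip(mv_far, predicted))


def decode_far_vector(diff, mv_near, dist_near, dist_far):
    """Rebuild the vector to the farther frame from the coded difference."""
    predicted = predict_far_vector(mv_near, dist_near, dist_far)
    return tuple(d + p for d, p in zip(diff, predicted))


# Nearer frame one frame away, farther frame two frames away.
diff = encode_far_vector((7, -4), (3, -2), dist_near=1, dist_far=2)
print(diff, decode_far_vector(diff, (3, -2), 1, 2))
```

When the motion is close to uniform, the scaled vector is a good predictor and the coded difference stays small.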
3. Change of Encoding/Decoding Order
The order of frame processing has conventionally complied with a format such as that shown in FIG. 11, in which an I-picture and P-pictures are processed in display order, and two consecutive B-pictures arranged between two I/P-pictures are processed immediately after the backward I/P-picture on the time axis. On the other hand, the new functions are not limited to this processing order as long as the processing is done within the range of allowable display delays. When the bi-predictive concept is used, B-pictures can occur even if there is no reference frame for backward prediction. The display order is either coded in the data header of the video data or managed by a layer above the video data, such as sync processing between video data and audio/voice data, a communication layer that controls the division and distribution of data, or a file format. A change in encoding/decoding order therefore causes no display misalignment.
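The conventional processing order can be illustrated by reordering a sequence of picture types given in display order. This is a sketch only: picture types are assumed to be the single letters "I", "P", and "B", any trailing B-pictures with no following I/P-picture are simply emitted at the end, and the function name `coding_order` is hypothetical.

```python
def coding_order(display_types):
    """Return display indices in conventional encoding/decoding order.

    B-pictures are held back until the next I/P-picture (their backward
    reference) has been processed.
    """
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)
        else:
            order.append(index)       # the I/P-picture goes first
            order.extend(pending_b)   # then the B-pictures that precede it
            pending_b = []
    order.extend(pending_b)           # assumption: leftovers go last
    return order


print(coding_order(list("IBBPBBP")))
```

For the display sequence I B B P B B P, this yields the processing order I P B B P B B, expressed as indices into the display sequence.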
4. Frame Identification
In the conventional methods, information indicating the display position of each frame is coded. The display position information, however, may not match time information included in a communication packet or file format actually used for display. To avoid this problem, a method of managing each frame of video data using only processing numbers has been contemplated. However, in a moving picture encoding/decoding system into which the new functions are introduced, there may be no backward reference frame used in the direct mode, or a backward reference frame set by default from multiple backward reference frames may not be a future frame. Such a frame cannot adapt to the direct mode. Further, if each frame is managed by numbers in decoding order, it cannot be determined whether a backward reference frame can be utilized. In addition, when a B-picture is selected as a backward reference frame used in the direct mode, a collocated block may have no forward motion vector. Such a block cannot adapt to the direct mode.
In view of the above problems, it is an object of the present invention to provide an encoding/decoding method to which the direct mode can be applied efficiently.