In recent years, with the development of multimedia applications, it has become common to handle all sorts of media, such as picture, audio and text, in an integrated way. However, since a digitized picture contains an enormous amount of data, an information compression technique is indispensable for its storage and transmission. At the same time, standardization of compression techniques is also important so that compressed picture data can be interoperated. The standards of picture compression techniques include H.261 and H.263 established by the ITU (International Telecommunication Union) and MPEG (Moving Picture Experts Group)-1, MPEG-2 and MPEG-4 established by the ISO (International Organization for Standardization).
Inter-picture prediction accompanied by motion compensation can be cited as a technique shared among these moving picture coding methods. In the motion compensation of these moving picture coding methods, a picture of an input image is divided into blocks of a predetermined size, and a predictive image is generated for each block using a motion vector indicating the motion between pictures. The following predictions are employed for the inter-picture prediction according to the MPEG: a forward prediction, which uses a single picture whose display time is earlier than that of a current picture to be coded; a backward prediction, which uses a single picture whose display time is later than that of the current picture; and a bi-directional prediction, which uses two pictures, that is, one picture whose display time is earlier than that of the current picture and another picture whose display time is later than that of the current picture (see, for example, ISO/IEC 14496-2:1999(E) Information technology - coding of audio-visual objects Part 2: Visual (1999-12-01) pp 150 7.6.7 Temporal prediction structure).
In the MPEG, the reference picture to be used is determined uniquely by the type of inter-picture prediction, and an arbitrary reference picture cannot be selected. Meanwhile, an expanded bi-directional prediction, in which two arbitrary reference pictures can be selected out of a plurality of coded pictures stored in a picture memory regardless of the display time of the current picture, is under consideration in the H.264, which is presently in the process of standardization by the ITU.
FIG. 1 is a block diagram showing a structure of a moving picture coding apparatus according to the H.264. The conventional moving picture coding apparatus shown in FIG. 1 is an apparatus for executing a moving picture coding method which allows two arbitrary reference pictures to be selected from plural coded pictures when the inter-picture prediction is performed.
This moving picture coding apparatus includes, as shown in FIG. 1, a motion estimation unit 301, a pixel interpolation unit 102, a subtractor 103, a picture coding unit 104, a picture decoding unit 105, an adder 106, a variable length coding unit 302, a multi-picture buffer 108 and a switch 109.
The moving picture coding apparatus divides an inputted image data Img into blocks and performs processing for each of the blocks. The subtractor 103 subtracts a predictive image data Pred from the image data Img inputted to the moving picture coding apparatus and outputs it as residual data Res. The picture coding unit 104 performs picture coding processing such as orthogonal transformation and quantization on the inputted residual data Res and outputs it as coded residual data ERes including quantized orthogonal transformed coefficients. The picture decoding unit 105 performs picture decoding processing such as inverse quantization and inverse orthogonal transformation on the inputted coded residual data ERes and outputs it as decoded residual data DRes. The adder 106 adds the decoded residual data DRes to the predictive image data Pred and outputs it as reconstructed image data Recon. Out of the reconstructed image data Recon, the data having the possibility to be used for reference in the subsequent inter-picture prediction is stored in the multi-picture buffer 108.
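The data flow through the subtractor 103, the picture coding unit 104, the picture decoding unit 105 and the adder 106 can be illustrated with a minimal sketch. It rests on loud assumptions: the orthogonal transformation is omitted, a uniform scalar quantizer with a hypothetical step `QSTEP` stands in for the picture coding processing, and all function names are illustrative rather than taken from any standard.

```python
# Sketch of the local decoding loop: Img -> Res -> ERes -> DRes -> Recon.
# Assumption: no orthogonal transform; a uniform quantizer replaces unit 104.

QSTEP = 4  # hypothetical quantization step


def subtract(img, pred):
    """Subtractor 103: residual data Res = Img - Pred."""
    return [[i - p for i, p in zip(ri, rp)] for ri, rp in zip(img, pred)]


def quantize(value, qstep):
    """Round-to-nearest quantization level, symmetric around zero."""
    level = (abs(value) + qstep // 2) // qstep
    return level if value >= 0 else -level


def code_residual(res, qstep=QSTEP):
    """Stand-in for picture coding unit 104: Res -> ERes."""
    return [[quantize(r, qstep) for r in row] for row in res]


def decode_residual(eres, qstep=QSTEP):
    """Stand-in for picture decoding unit 105: ERes -> DRes."""
    return [[level * qstep for level in row] for row in eres]


def add(dres, pred):
    """Adder 106: Recon = DRes + Pred."""
    return [[d + p for d, p in zip(rd, rp)] for rd, rp in zip(dres, pred)]


img = [[100, 105], [98, 91]]   # current block
pred = [[98, 100], [99, 94]]   # predictive image data Pred

res = subtract(img, pred)
eres = code_residual(res)
dres = decode_residual(eres)
recon = add(dres, pred)        # what would be stored in the multi-picture buffer
```

In this sketch Recon differs from Img by the quantization error. Storing Recon rather than Img in the buffer keeps the reference data on the coding side identical to what a decoder can reproduce from ERes.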
Here, an interpolation prediction using two reference pictures performed by the conventional moving picture coding apparatus is described with reference to FIG. 2. FIG. 2 is a conceptual diagram of the interpolation prediction using plural reference pictures. Here, a picture Pic is a current picture to be coded. Pictures FwRef1 to FwRef3 represent coded pictures each having a display time earlier than that of the current picture, whereas pictures BwRef1 to BwRef3 represent coded pictures each having a display time later than that of the current picture. A block Blk1 is predicted using pixel values in a reference block RefBlk11 included in the picture FwRef3 whose display time is earlier than that of the current picture Pic and pixel values in a reference block RefBlk12 included in the picture BwRef1 whose display time is later than that of the current picture Pic. A block Blk2 is predicted using pixel values in reference blocks RefBlk21 and RefBlk22 included in two pictures FwRef1 and FwRef2 each having a display time earlier than that of the current picture. A block Blk3 is predicted using pixel values in reference blocks RefBlk31 and RefBlk32 included in two pictures BwRef1 and BwRef2 each having a display time later than that of the current picture. Namely, the result of interpolating the pixels in the areas corresponding to the two reference blocks using a predetermined method, such as averaging, is used as the predictive image. A characteristic of the conventional moving picture coding apparatus is that it performs prediction on a block-by-block basis using two arbitrary reference pictures as shown in FIG. 2. A method for predicting with the use of two arbitrary reference pictures as described above is called “plural reference picture interpolation prediction” hereinafter.
Besides the method of generating a predictive image using the pixel interpolation as described above, the available prediction methods include a method in which a block included in a single arbitrary picture is used directly as a predictive image, and the intra-picture prediction. It is possible to switch the prediction method on a block-by-block basis.
The motion estimation unit 301 determines a prediction type for the block, reference pictures and motion vectors to be used for the inter-picture prediction performed on the inputted current block to be coded, and outputs a prediction type PredType, reference picture numbers RefNo1, RefNo2, and motion vectors MV1, MV2. The motion estimation unit 301 outputs two reference picture numbers and two motion vectors since two reference pictures are selected when plural reference picture interpolation prediction is performed. Here, the multi-picture buffer 108 outputs a reference block RefBlk1 corresponding to the reference picture number RefNo1 and the motion vector MV1 and a reference block RefBlk2 corresponding to the reference picture number RefNo2 and the motion vector MV2. The pixel interpolation unit 102 interpolates the pixels of the two reference blocks RefBlk1 and RefBlk2 using their average value and outputs the result as an interpolated block RefPol. On the other hand, in the case of using an inter-picture prediction other than a plural reference picture interpolation prediction, the motion estimation unit 301 selects a single reference picture, and therefore outputs a single reference picture number RefNo1 and a single motion vector MV1. In this case, the multi-picture buffer 108 outputs a reference block RefBlk corresponding to the reference picture number RefNo1 and the motion vector MV1.
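The way a predictive block is formed from the multi-picture buffer can be sketched as follows. This is an illustration under stated assumptions, not the apparatus itself: motion vectors are integer-pel `(dx, dy)` offsets, the buffer is a plain dictionary keyed by reference picture number, and the rounding rule `(a + b + 1) >> 1` is an assumption, since the text only says that an average value is used. The names `fetch_block`, `interpolate_average` and `predict` are hypothetical.

```python
def fetch_block(picture, x, y, mv, size):
    """Return the size-by-size block at (x + mv[0], y + mv[1]) in a picture."""
    bx, by = x + mv[0], y + mv[1]
    return [row[bx:bx + size] for row in picture[by:by + size]]


def interpolate_average(blk1, blk2):
    """Pixel-wise average of two reference blocks (rounding is an assumption)."""
    return [[(a + b + 1) >> 1 for a, b in zip(r1, r2)]
            for r1, r2 in zip(blk1, blk2)]


def predict(buffer, x, y, size, ref_no1, mv1, ref_no2=None, mv2=None):
    """Form a predictive block from one or two reference pictures."""
    ref_blk1 = fetch_block(buffer[ref_no1], x, y, mv1, size)
    if ref_no2 is None:
        # Single-reference inter-picture prediction: RefBlk is used directly.
        return ref_blk1
    # Plural reference picture interpolation prediction: RefPol.
    ref_blk2 = fetch_block(buffer[ref_no2], x, y, mv2, size)
    return interpolate_average(ref_blk1, ref_blk2)


# Two tiny 4x4 "coded pictures" held in a hypothetical multi-picture buffer.
buffer = {
    0: [[r * 4 + c for c in range(4)] for r in range(4)],  # pixel values 0..15
    1: [[100] * 4 for _ in range(4)],                      # flat picture
}

ref_pol = predict(buffer, 1, 1, 2, ref_no1=0, mv1=(0, 0), ref_no2=1, mv2=(1, -1))
ref_blk = predict(buffer, 1, 1, 2, ref_no1=0, mv1=(-1, 0))
```

Both reference picture numbers index the same buffer, so nothing in the sketch requires one picture to precede and the other to follow the current picture in display order.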
When the prediction type PredType determined by the motion estimation unit 301 indicates a plural reference picture interpolation prediction, the switch 109 is switched to a “1” side and the interpolated block RefPol is used as predictive image data Pred. When the prediction type PredType indicates an inter-picture prediction other than a plural reference picture interpolation prediction, the switch 109 is switched to a “0” side and the reference block RefBlk is used as predictive image data Pred. The variable length coding unit 302 performs variable length coding on the coded residual data ERes, the prediction type PredType, the reference picture numbers RefNo1, RefNo2 and the motion vectors MV1, MV2 and then outputs them as coded moving picture data Str0.
FIG. 3 is a conceptual diagram of a data format of the coded moving picture used by the conventional moving picture coding apparatus. Coded data equivalent to a single picture, Picture, is composed of coded data equivalent to a single block, Block, for each of the blocks composing the picture, and the like. Here, the coded data equivalent to a single block, Block, represents coded data of a block on which a plural reference picture interpolation prediction is performed, and includes the reference picture numbers RefNo1, RefNo2 and the motion vectors MV1, MV2 with respect to the two reference pictures, the prediction type PredType, and the like.
FIG. 4 is a block diagram showing a structure of the conventional moving picture decoding apparatus. The moving picture decoding apparatus includes, as shown in FIG. 4, a variable length decoding unit 601, a motion compensation unit 602, a picture decoding unit 404, an adder 405, a pixel interpolation unit 406, a multi-picture buffer 407 and a switch 408.
The variable length decoding unit 601 performs variable length decoding on the inputted coded moving picture data Str0 and outputs the coded residual data ERes, the motion vectors MV1, MV2, the reference picture numbers RefNo1, RefNo2 and the prediction type PredType. The picture decoding unit 404 performs picture decoding processing such as inverse quantization and inverse orthogonal transformation on the inputted coded residual data ERes and outputs decoded residual data DRes. The adder 405 adds the decoded residual data DRes to the predictive image data Pred and outputs the result as decoded image data DImg outside the moving picture decoding apparatus. The multi-picture buffer 407 stores the decoded image data DImg for inter-picture prediction.
The motion compensation unit 602 outputs reference picture numbers NRefNo1, NRefNo2 of the reference blocks necessary for inter-picture prediction according to the prediction type PredType as well as the motion vectors MV1, MV2 and instructs the multi-picture buffer 407 to output the reference blocks. When the prediction type PredType indicates a plural reference picture interpolation prediction, the multi-picture buffer 407 outputs the reference block RefBlk1 corresponding to the reference picture number NRefNo1 and the motion vector NMV1 as well as the reference block RefBlk2 corresponding to the reference picture number NRefNo2 and the motion vector NMV2. The pixel interpolation unit 406 interpolates the pixels in the two reference blocks RefBlk1 and RefBlk2 using the average value. On the other hand, when the prediction type PredType indicates an inter-picture prediction method other than a plural reference picture interpolation prediction, the multi-picture buffer 407 outputs the reference block RefBlk corresponding to the reference picture number NRefNo1 and the motion vector NMV1.
When the prediction type PredType indicates a plural reference picture interpolation prediction, the switch 408 is switched to a “0” side and an interpolated block RefPol is used as predictive image data Pred. Thus, the moving picture decoding apparatus decodes the coded moving picture data Str0 through the processing described above and outputs it as decoded image data DImg.
Meanwhile, under the moving picture coding method based on the MPEG-4, a plural reference picture interpolation prediction method called “direct mode” is defined for a picture type called “bi-directional predictive picture”, which employs a plural reference picture interpolation prediction. In the direct mode, the motion vectors with respect to the two reference pictures used for generating the predictive image are calculated by interpolation from already coded motion vectors, so that the motion vectors and the reference picture numbers can be omitted from the coded data of the block.
FIG. 5 is an illustration for a case of using the direct mode defined in the MPEG-4. Here, a picture Pic represents a current picture to be coded, a picture Ref1 represents a reference picture whose display time is earlier than that of the current picture Pic, and a picture Ref2 represents a reference picture whose display time is later than that of the current picture Pic, whereas a block Blk represents a current block to be coded and a block Blk0 represents a block whose position in the reference picture Ref2 is the same as that of the current block Blk. A motion vector MV01 represents a forward reference motion vector using the picture Ref1 as a reference picture for coding the block Blk0, a motion vector MV1 represents a motion vector of the current block with respect to the reference picture Ref1, a motion vector MV2 represents a motion vector of the current block with respect to the reference picture Ref2, a block RefBlk1 represents a reference block to be referred to by the motion vector MV1 and a block RefBlk2 represents a reference block to be referred to by the motion vector MV2.
As for the two pictures to be used for reference by the current block Blk, the picture Ref2 whose display time is later than and is closest to the current picture is used as a backward reference picture whereas the picture Ref1, which has been used for reference by the block Blk0 at the time of coding, is used as a forward reference picture.
For the calculation of the motion vectors, it is assumed that either the motion is constant or no motions are found in comparing the pictures. Here, with an assumption that a differential value between the display time of the current picture Pic and that of the reference picture Ref1 is TRD1, a differential value between the display time of the reference picture Ref1 and that of the reference picture Ref2 is TRD2, and a differential value between the display time of the current picture Pic and that of the reference picture Ref2 is TRD3, the motion vectors MV1 and MV2 to be used for coding the current block can be calculated respectively using the following equations:
MV1=MV01×(TRD1/TRD2)   (Equation A)
MV2=−MV01×(TRD3/TRD2)   (Equation B)
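A small worked example of Equations A and B may help. The sketch below assumes motion vectors are `(x, y)` pairs and keeps the results as exact fractions. How the scaled vectors are rounded to the motion vector precision is not stated in the text, so no rounding is applied. The function name `direct_mode_vectors` is hypothetical.

```python
from fractions import Fraction


def direct_mode_vectors(mv01, trd1, trd2, trd3):
    """Scale the co-located block's vector MV01 per Equations A and B.

    trd1: display-time difference between Pic and Ref1
    trd2: display-time difference between Ref1 and Ref2
    trd3: display-time difference between Pic and Ref2
    """
    mv1 = tuple(Fraction(c * trd1, trd2) for c in mv01)    # Equation A
    mv2 = tuple(-Fraction(c * trd3, trd2) for c in mv01)   # Equation B
    return mv1, mv2


# Example: Ref1 displayed at t=0, Pic at t=1, Ref2 at t=3.
mv1, mv2 = direct_mode_vectors(mv01=(9, -6), trd1=1, trd2=3, trd3=2)
# mv1 == (3, -2): one third of MV01, pointing toward Ref1
# mv2 == (-6, 4): two thirds of MV01 with the sign reversed, pointing toward Ref2
```

With the current picture lying between the two reference pictures, TRD2 equals TRD1 plus TRD3, so MV1 minus MV2 reproduces MV01. This is the constant-motion assumption stated above.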
Using the above method, the reference pictures and the motion vectors in the case of using a direct mode can be determined. In the moving picture coding apparatus, the direct mode processing described above is executed by the motion estimation unit 301 shown in the block diagram of the conventional moving picture coding apparatus in FIG. 1. In the moving picture decoding apparatus, the direct mode processing described above is executed by the motion compensation unit 602 shown in the block diagram of the conventional moving picture decoding apparatus in FIG. 4.
When a moving picture in which the motion between the pictures is small is inter-picture coded, the predictive error between the pictures becomes very small, and most of the coded residual data ERes becomes “0” after picture coding processing such as quantization. Consider coding in which the motion vectors and the reference pictures are determined using a predetermined method without being coded, as in the case of using a direct mode as described above. The case in which the entire coded residual data ERes resulting from the inter-picture prediction using the reference pictures and the motion vectors of the current block is “0” is defined as one of the prediction types PredType, called “skip mode”. In using a skip mode, only the prediction type PredType indicating the skip mode is transmitted, so that coding of a block requires a very small code amount. The efficiency of coding can be further improved by assigning to the skip mode a variable length code that is shorter than those of other prediction types, or by run-length coding the number of consecutive blocks coded using the skip mode.
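The skip decision and the run-length idea can be sketched as follows. The representation is purely illustrative: prediction types are plain strings, a run of skipped blocks is written as a `("skip_run", n)` tuple, and the names `is_skippable` and `run_length_skips` are hypothetical. The actual code tables of any standard are not reproduced here.

```python
def is_skippable(eres, motion_is_derived):
    """A block can use skip mode when its motion is derived by a predetermined
    method (not coded) and every coded residual level is zero."""
    return motion_is_derived and all(level == 0 for row in eres for level in row)


def run_length_skips(pred_types):
    """Collapse consecutive "skip" entries into ("skip_run", count) items."""
    out = []
    run = 0
    for pred_type in pred_types:
        if pred_type == "skip":
            run += 1
            continue
        if run:
            out.append(("skip_run", run))
            run = 0
        out.append(pred_type)
    if run:
        out.append(("skip_run", run))
    return out


blocks = ["skip", "skip", "skip", "direct", "skip"]
coded = run_length_skips(blocks)
```

Here three consecutive skipped blocks are represented by a single item, which is where the saving in code amount comes from when large areas of the picture are static.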
In the H.264 described above, “skip mode” is defined as a case in which the entire coded residual data equivalent to a single block obtained by the inter-picture prediction using a direct mode is assumed to be “0”. The following processing is performed when a block is coded using a skip mode by the moving picture coding apparatus shown in FIG. 1. The motion estimation unit 301 outputs the reference picture numbers RefNo1, RefNo2, the motion vectors MV1, MV2 as well as the prediction type PredType indicating a skip mode. When the prediction type PredType indicates a skip mode, the variable length coding unit 302 performs variable length coding only on the prediction type PredType and outputs it as coded moving picture data Str0 through the processing explained above. The following processing is performed when the coded data of the block coded using a skip mode is inputted to the moving picture decoding apparatus shown in FIG. 4. The variable length decoding unit 601 performs variable length decoding on the prediction type PredType. When the prediction type PredType indicates a skip mode, the motion compensation unit 602 outputs, through the processing performed in the case of direct mode explained above, the reference picture numbers NRefNo1, NRefNo2, the motion vectors NMV1, NMV2 as well as the prediction type PredType indicating a skip mode.
In the H.264 as described above, arbitrary reference pictures can be selected out of a plurality of coded pictures regardless of the display time of the current picture. In this case, however, the arbitrary reference pictures are selected by performing motion estimation on the plurality of coded pictures, so the processing burden caused by the motion estimation becomes very large. The plural reference picture interpolation prediction also has the problem of degrading the coding efficiency, since it requires coding of a reference picture number and a motion vector for each of the two reference pictures.
Furthermore, when inter-picture prediction is performed for a picture using a picture whose display time is later than that of the current picture as a reference picture, as in the case of bi-directional prediction described in the conventional technique, the picture has to be coded in an order different from the display order, which causes a delay. In a case of real time communication such as a videophone, bi-directional predictive pictures cannot be used because of the delay. In the H.264, however, two arbitrary reference pictures can be selected regardless of display order information. The delay caused by coding can therefore be eliminated by performing a plural reference picture interpolation prediction with a selection of two pictures each having a display time earlier than that of the current picture. In that case, however, no picture whose display time is later than that of the current picture is stored in the multi-picture buffer, so the direct mode conventionally used, which determines the motion vectors using the picture whose display time is later than that of the current picture as described above, cannot be employed.