An image coding apparatus typically compresses information by utilizing spatial and temporal redundancies of an image (including a still image and a moving image). Frequency domain transform is used as a method that utilizes the spatial redundancy. Inter prediction is used as a method that utilizes the temporal redundancy. Inter prediction is also referred to as inter-picture prediction.
When coding a current picture, an image coding apparatus that employs inter prediction uses a coded picture preceding or following the current picture in display order (display time order), as a reference picture. The image coding apparatus applies motion estimation to the current picture with respect to the reference picture, to derive a motion vector.
The image coding apparatus then performs motion compensation based on the motion vector, to obtain prediction image data. The image coding apparatus calculates a difference between the prediction image data and the image data of the current picture, and codes the calculated difference. The image coding apparatus thus removes the temporal redundancy.
In the motion estimation, the image coding apparatus calculates a difference between a current block to be coded in the current picture and each block in the reference picture, and determines, as a reference block, the block in the reference picture having the smallest difference. The image coding apparatus estimates the motion vector using the current block and the reference block.
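The block matching described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the sum of absolute differences (SAD) is assumed as the difference measure, the search is an exhaustive integer-pel search, and the names `sad`, `estimate_motion`, and `search_range` are hypothetical and do not appear in the scheme itself.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference-picture block at (rx, ry). SAD is an assumed metric."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(cur, ref, bx, by, size, search_range):
    """Exhaustive search: return (mv_x, mv_y, cost) of the best-matching block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Hypothetical 6x6 pictures: the same 2x2 pattern sits at (x=3, y=2) in the
# reference picture and at (x=1, y=1) in the current picture.
ref = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
pattern = [[10, 20], [30, 40]]
for j in range(2):
    for i in range(2):
        ref[2 + j][3 + i] = pattern[j][i]
        cur[1 + j][1 + i] = pattern[j][i]

print(estimate_motion(cur, ref, bx=1, by=1, size=2, search_range=2))  # (2, 1, 0)
```

The returned displacement (2, 1) is the motion vector from the current block to the reference block, and the cost of 0 indicates an exact match in this toy example.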
An image coding apparatus according to the standardized image coding scheme called H.264 (see Non Patent Literature (NPL) 1) uses three picture types, namely, I picture, P picture, and B picture, for compressing information. The image coding apparatus does not perform inter prediction on the I picture, but performs intra prediction on the I picture. Intra prediction is also referred to as intra-picture prediction.
The image coding apparatus performs inter prediction on the P picture by referencing one coded picture preceding or following the current picture in display order. The image coding apparatus performs inter prediction on the B picture by referencing two coded pictures preceding or following the current picture in display order.
In inter prediction, the image coding apparatus generates a reference list (also referred to as a reference picture list) for specifying a reference picture. A coded reference picture that is referenced in inter prediction is assigned a reference picture index (also referred to as a reference index) in the reference list. For example, the image coding apparatus holds two reference lists (L0, L1) in order to reference two pictures for the B picture.
FIG. 34 shows an example of such reference lists. Reference picture list L0 in FIG. 34 is an example of a reference picture list corresponding to a first prediction direction in bidirectional prediction. In reference picture list L0 in FIG. 34, reference picture r1 whose display order number is 2 is assigned a reference picture index of 0. Reference picture r2 whose display order number is 1 is assigned a reference picture index of 1. Reference picture r3 whose display order number is 0 is assigned a reference picture index of 2.
That is, in reference picture list L0 in FIG. 34, a reference picture closer to the current picture in display order is assigned a smaller reference picture index.
Reference picture list L1 in FIG. 34 is an example of a reference picture list corresponding to a second prediction direction in bidirectional prediction. In reference picture list L1 in FIG. 34, reference picture r2 whose display order number is 1 is assigned a reference picture index of 0. Reference picture r1 whose display order number is 2 is assigned a reference picture index of 1. Reference picture r3 whose display order number is 0 is assigned a reference picture index of 2.
Thus, two different reference picture indexes may be assigned to a specific reference picture included in two reference picture lists (e.g. reference pictures r1 and r2 in FIG. 34), and the same reference picture index may be assigned to a specific reference picture included in two reference picture lists (e.g. reference picture r3 in FIG. 34).
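The index assignment of FIG. 34 can be written out as a small data structure. The sketch below only restates the values given above; the representation (a list whose position is the reference picture index) and the names `ref_list_l0`, `ref_list_l1`, and `ref_idx` are assumptions made for illustration.

```python
# Display order numbers of the reference pictures in FIG. 34.
display_order = {"r1": 2, "r2": 1, "r3": 0}

# The position of a picture in each list is its reference picture index.
ref_list_l0 = ["r1", "r2", "r3"]  # first prediction direction
ref_list_l1 = ["r2", "r1", "r3"]  # second prediction direction


def ref_idx(ref_list, picture):
    """Return the reference picture index of `picture` in `ref_list`."""
    return ref_list.index(picture)


for pic in ("r1", "r2", "r3"):
    print(pic, display_order[pic], ref_idx(ref_list_l0, pic), ref_idx(ref_list_l1, pic))
# r1 2 0 1
# r2 1 1 0
# r3 0 2 2
```

Reference pictures r1 and r2 receive different indexes in the two lists, while r3 receives index 2 in both.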
Prediction using only reference picture list L0 is called L0 prediction. Prediction using only reference picture list L1 is called L1 prediction. Prediction using both reference picture lists L0 and L1 is called bidirectional prediction or bi-prediction.
In L0 prediction, the preceding direction is often used as the prediction direction. In L1 prediction, the following direction is often used as the prediction direction. Reference picture list L0 is set to correspond to the first prediction direction, whereas reference picture list L1 is set to correspond to the second prediction direction.
Based on these relations, the prediction direction is classified as one of the first prediction direction, the second prediction direction, or both directions. Prediction that uses both directions is the bidirectional prediction (bi-prediction) mentioned above.
In the image coding scheme called H.264, a motion estimation mode is available as a coding mode (also referred to as an inter prediction mode or a prediction mode) for the current block in the B picture.
In the motion estimation mode, the image coding apparatus estimates the motion vector of the current block. The image coding apparatus generates the prediction image data using the reference picture and the motion vector. The image coding apparatus then codes both the motion vector used for the generation of the prediction image data and the difference between the prediction image data and the image data of the current block.
As mentioned above, the motion estimation mode includes bidirectional prediction for generating the prediction image by referencing two coded pictures preceding or following the current picture. The motion estimation mode also includes unidirectional prediction for generating the prediction image by referencing one coded picture preceding or following the current picture. Bidirectional prediction or unidirectional prediction is selected for the current block.
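The relation between the prediction image data, the motion vector, and the coded difference can be illustrated with a minimal sketch. It assumes integer-pel motion vectors and omits the transform, quantization, and entropy coding that an actual coder applies to the difference; the function names are hypothetical.

```python
def motion_compensate(ref, bx, by, mv, size):
    """Prediction block: the reference block displaced by mv from (bx, by)."""
    mv_x, mv_y = mv
    return [
        [ref[by + mv_y + j][bx + mv_x + i] for i in range(size)]
        for j in range(size)
    ]


def residual(cur, pred, bx, by):
    """Difference between the current block at (bx, by) and its prediction."""
    size = len(pred)
    return [
        [cur[by + j][bx + i] - pred[j][i] for i in range(size)]
        for j in range(size)
    ]


def reconstruct(pred, res):
    """Decoder side: prediction plus difference restores the block."""
    return [[p + r for p, r in zip(p_row, r_row)] for p_row, r_row in zip(pred, res)]


# Hypothetical 4x4 pictures and motion vector, for illustration only.
ref = [[y * 4 + x for x in range(4)] for y in range(4)]
cur = [[0, 0, 0, 0], [0, 7, 7, 0], [0, 10, 12, 0], [0, 0, 0, 0]]

pred = motion_compensate(ref, bx=1, by=1, mv=(1, 0), size=2)
res = residual(cur, pred, bx=1, by=1)
print(pred)  # [[6, 7], [10, 11]]
print(res)   # [[1, 0], [0, 1]]
```

Only the motion vector and the (typically small) difference need to be conveyed; a decoder holding the same reference picture can rebuild the block with `reconstruct(pred, res)`.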
The image coding apparatus according to the image coding scheme called H.264 is also capable of selecting a coding mode referred to as a temporal direct mode, when deriving the motion vector in the coding of the B picture. The method of inter prediction in the temporal direct mode is described below, with reference to FIG. 35.
FIG. 35 is a diagram showing the motion vector in the temporal direct mode. FIG. 35 shows an example where the image coding apparatus codes block a in picture B2 in the temporal direct mode. In this case, the image coding apparatus uses motion vector vb that has been used when coding block b, which is at the same position as block a, in picture P3 which is a reference picture following picture B2. Motion vector vb references picture P1.
Upon coding block a, the image coding apparatus obtains a reference block from each of picture P1 which is a preceding (forward) reference picture and picture P3 which is a following (backward) reference picture, using a motion vector parallel to motion vector vb. The image coding apparatus then performs bidirectional prediction to code block a. That is, the image coding apparatus codes block a by using motion vector va1 to picture P1 and motion vector va2 to picture P3.
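One common way to obtain two motion vectors parallel to motion vector vb is to scale vb by the ratio of display-order distances. The sketch below assumes this linear scaling, uses floating-point arithmetic rather than the fixed-point arithmetic of an actual codec, and assumes display order numbers 1, 2, and 3 for pictures P1, B2, and P3; these numbers, the value of vb, and the function name `temporal_direct` are hypothetical.

```python
def temporal_direct(vb, t_cur, t_fwd, t_bwd):
    """Split the co-located motion vector vb (from the backward reference
    picture to the forward reference picture) into two parallel vectors."""
    td = t_bwd - t_fwd  # display-order distance spanned by vb
    tb = t_cur - t_fwd  # distance from the forward reference to the current picture
    mv_to_fwd = (vb[0] * tb / td, vb[1] * tb / td)
    # The two vectors differ exactly by vb, so all three are parallel.
    mv_to_bwd = (mv_to_fwd[0] - vb[0], mv_to_fwd[1] - vb[1])
    return mv_to_fwd, mv_to_bwd


# Hypothetical values: P1, B2, P3 at display order 1, 2, 3 and vb = (8, -4).
mv_to_p1, mv_to_p3 = temporal_direct(vb=(8, -4), t_cur=2, t_fwd=1, t_bwd=3)
print(mv_to_p1, mv_to_p3)  # (4.0, -2.0) (-4.0, 2.0)
```

Because both vectors are derived from vb, which the decoder already holds, no motion vector needs to be transmitted for block a under this assumption.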
Moreover, a merge mode is available as a coding mode for the current block in the B picture and the P picture. In the merge mode, the image coding apparatus copies a motion vector and a reference picture index from an adjacent block of the current block, to code the current block. The image coding apparatus also adds, for example, an index of the adjacent block whose motion vector and reference picture index have been copied, to a bitstream. This enables the decoder to select the same motion vector and reference picture index as those used in the coder.
A specific example is described below, with reference to FIG. 36A. In FIG. 36A, adjacent block A is a coded block that is left adjacent to the current block. Adjacent block B is a coded block that is upper adjacent to the current block. Adjacent block C is a coded block that is upper right adjacent to the current block.
In FIG. 36A, adjacent block A is a block coded by bidirectional prediction, and has motion vector MvL0_A of the first prediction direction and motion vector MvL1_A of the second prediction direction. Adjacent block B is a block coded by unidirectional prediction, and has motion vector MvL0_B of the first prediction direction. Adjacent block C is a block coded by unidirectional prediction, and has motion vector MvL0_C of the first prediction direction.
In the example in FIG. 36A, motion vectors MvL0_A, MvL0_B, and MvL0_C reference the same reference picture RefIdxL0, while motion vector MvL1_A references reference picture RefIdxL1.
In this example, the image coding apparatus selects, from adjacent blocks A, B, and C, an adjacent block whose motion vector and reference picture index are to be copied to the current block. Here, the image coding apparatus selects the adjacent block that maximizes the coding efficiency. The image coding apparatus then adds a merge block index indicating the selected adjacent block, to the bitstream.
For instance, in the case of selecting adjacent block A, the image coding apparatus codes the current block using motion vectors MvL0_A and MvL1_A and the reference pictures referenced by motion vectors MvL0_A and MvL1_A. The image coding apparatus then adds only a merge block index indicating the use of adjacent block A, to the bitstream.
FIG. 36B shows an example of the merge block index. The image coding apparatus adds only such a merge block index to the bitstream, thus reducing the amount of information for motion vectors and reference picture indexes.
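The merge mode with adjacent blocks A, B, and C can be sketched as follows. The assignment of merge block indexes 0, 1, and 2 to blocks A, B, and C, the motion vector values, the reference picture index values, and the cost figures are all assumptions made for illustration; they are not taken from FIG. 36A or FIG. 36B.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MotionVector = Tuple[int, int]


@dataclass(frozen=True)
class MotionInfo:
    """Motion vector and reference picture index per prediction direction."""
    mv_l0: Optional[MotionVector] = None
    ref_idx_l0: Optional[int] = None
    mv_l1: Optional[MotionVector] = None
    ref_idx_l1: Optional[int] = None


# Assumed index assignment: merge block index 0, 1, 2 -> blocks A, B, C.
MERGE_CANDIDATES = ["A", "B", "C"]


def choose_merge_index(costs):
    """Coder side: pick the candidate with the lowest (hypothetical) cost."""
    return min(range(len(MERGE_CANDIDATES)), key=lambda i: costs[MERGE_CANDIDATES[i]])


def merge_motion(neighbors, merge_idx):
    """Coder and decoder side: copy the motion info of the indexed block."""
    return neighbors[MERGE_CANDIDATES[merge_idx]]


# Hypothetical motion data: A is bidirectional, B and C are unidirectional.
neighbors = {
    "A": MotionInfo(mv_l0=(3, -1), ref_idx_l0=0, mv_l1=(-2, 1), ref_idx_l1=0),
    "B": MotionInfo(mv_l0=(4, 0), ref_idx_l0=0),
    "C": MotionInfo(mv_l0=(2, -1), ref_idx_l0=0),
}
costs = {"A": 120, "B": 200, "C": 180}  # hypothetical coding costs

merge_idx = choose_merge_index(costs)   # the only motion data written to the bitstream
copied = merge_motion(neighbors, merge_idx)
print(merge_idx, copied.mv_l0, copied.mv_l1)  # 0 (3, -1) (-2, 1)
```

Since the decoder builds the same candidate list from already decoded adjacent blocks, calling `merge_motion` with the received index yields the same motion vectors and reference picture indexes as in the coder.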