In motion compensated prediction, which is a form of inter-frame prediction, an image of a block to be encoded (current prediction unit: Cur_PU) is predicted using an image of a previously encoded reference picture (Ref Pic). The reference picture used for the motion compensated prediction of the block to be encoded is identified by a reference picture index (RefPicIdx). The position of the image of the reference picture used for the motion compensated prediction of the block to be encoded is identified by a motion vector (MV). Typically, the MV is predicted based on a predictive motion vector (PMV), and a differential motion vector (DMV=MV−PMV) is transmitted.
In motion vector search, the combination of the reference picture index and the motion vector that minimizes the coding cost is determined so as to achieve favorable coding efficiency of the block to be encoded. The coding cost J of a motion vector mv is defined, as in the following Expression (1), by the sum of absolute values of transform coefficients of motion compensated prediction errors (sum of absolute transformed differences: SATD) and the code amount (R_Motion) of the reference picture index and the differential motion vector, the latter weighted by λ_Motion.
[Math. 1]
$$J(mv) = \mathrm{SATD}(mv) + \lambda_{Motion} \cdot R_{Motion}(mv) \qquad (1)$$
Here, λ_Motion is a function dependent on a quantization parameter QP. The value of λ_Motion is smaller when the quantization parameter is smaller (when the quantization step size is smaller), and larger when the quantization parameter is larger (when the quantization step size is larger). Specifically, λ_Motion is defined as in the following Expression (2).
[Math. 2]
$$\lambda_{Motion}(QP) = 2^{\frac{QP - 12}{6}} \qquad (2)$$
In the computation of the coding cost J, the sum of absolute values of motion compensated prediction errors (sum of absolute differences: SAD) may be used instead of SATD.
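As an illustration only, Expressions (1) and (2) can be sketched in Python as follows. The function names (lambda_motion, sad, coding_cost) are hypothetical and do not appear in the cited literature; SAD is used as the distortion term, as permitted above, and the code amount is assumed to be supplied by the caller as a number of bits.

```python
def lambda_motion(qp: int) -> float:
    """Expression (2): lambda_Motion(QP) = 2 ** ((QP - 12) / 6)."""
    return 2.0 ** ((qp - 12) / 6.0)


def sad(cur_block, pred_block) -> int:
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(c - p)
        for cur_row, pred_row in zip(cur_block, pred_block)
        for c, p in zip(cur_row, pred_row)
    )


def coding_cost(distortion: float, rate_bits: float, qp: int) -> float:
    """Expression (1): J = distortion + lambda_Motion(QP) * R_Motion."""
    return distortion + lambda_motion(qp) * rate_bits
```

Under this sketch, raising QP by 6 doubles λ_Motion, so the same code amount is penalized twice as heavily relative to the distortion term.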
FIG. 9 is a conceptual diagram depicting a state where a combination of a reference picture index and a motion vector (MV) used for motion compensated prediction of a block to be encoded is determined from images of two previously encoded reference pictures (RefIdx=0, 1). As depicted in FIG. 9, in motion vector search, the coding cost J of each candidate motion vector mv included in a motion search area of each of the two reference pictures is computed, and a combination of a reference picture index (RefIdx=1 in the example depicted in FIG. 9) and a motion vector (MV=BMV) that minimizes the coding cost J is obtained.
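The search depicted in FIG. 9 can be sketched as an exhaustive loop over reference pictures and candidate motion vectors. This is a minimal sketch under stated assumptions, not the method of any particular encoder: pictures are plain lists of rows, motion vectors are integer-pel offsets, SAD is used in place of SATD, and rate_bits is a hypothetical caller-supplied function returning the code amount R_Motion for a given reference picture index and motion vector.

```python
def lambda_motion(qp):
    """Expression (2): lambda_Motion(QP) = 2 ** ((QP - 12) / 6)."""
    return 2.0 ** ((qp - 12) / 6.0)


def block_at(pic, x, y, w, h):
    """Return the w x h block with top-left corner (x, y), or None if it leaves the picture."""
    if x < 0 or y < 0 or y + h > len(pic) or x + w > len(pic[0]):
        return None
    return [row[x:x + w] for row in pic[y:y + h]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def motion_search(cur_block, cur_pos, ref_pics, search_range, qp, rate_bits):
    """Return (cost, ref_idx, mv) minimizing J over all reference pictures.

    rate_bits(ref_idx, mv) is an assumed callback giving R_Motion in bits.
    """
    h, w = len(cur_block), len(cur_block[0])
    x0, y0 = cur_pos
    lam = lambda_motion(qp)
    best = None
    for ref_idx, pic in enumerate(ref_pics):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                cand = block_at(pic, x0 + dx, y0 + dy, w, h)
                if cand is None:
                    continue  # candidate falls outside the reference picture
                cost = sad(cur_block, cand) + lam * rate_bits(ref_idx, (dx, dy))
                if best is None or cost < best[0]:
                    best = (cost, ref_idx, (dx, dy))
    return best
```

A small usage example, with a toy rate model chosen purely for illustration:

```python
cur = [[10, 20], [30, 40]]
ref0 = [[0] * 4 for _ in range(4)]
ref1 = [[0, 0, 0, 0], [0, 0, 10, 20], [0, 0, 30, 40], [0, 0, 0, 0]]
rate = lambda ref_idx, mv: abs(mv[0]) + abs(mv[1]) + ref_idx
result = motion_search(cur, (1, 1), [ref0, ref1], 1, 12, rate)
```

Here the block matches exactly in the second reference picture at an offset of one sample to the right, so the search selects RefIdx=1 even though that choice costs more bits than RefIdx=0 with a zero motion vector.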
High Efficiency Video Coding (HEVC) has been studied as a successor standard to H.264/AVC. HEVC defines a new concept, that is, a motion compensated prediction mode called merge mode.
FIG. 10 is an explanatory diagram of the merge mode. The left part of FIG. 10 depicts a prediction block to be encoded (current prediction unit: Cur_PU). Blocks A0, A1, B0, B1, and B2 are encoded blocks adjacent to Cur_PU in a reference picture. The center part of FIG. 10 depicts an encoded block (referred to as collocated PU: Col_PU) of a picture temporally adjacent to Cur_PU. Block C is a block at the center position of Col_PU. Block H is a block positioned on the lower right side of Col_PU.
The right part of FIG. 10 depicts the relationships between the picture to be encoded (Cur_Pic) including Cur_PU and the picture (Col_Pic) including Col_PU and their reference pictures (Cur_Ref and Col_Ref).
A video encoding device that uses the merge mode generates a predictive motion vector using motion vectors of blocks A0, A1, B0, B1, B2, C, and H. For a block to which the merge mode is applied (a block of MergeFlag=1), the video encoding device generates a merge motion information candidate list including up to five candidates. The list is based on motion information of encoded blocks at four spatially adjacent positions (blocks A1, B1, B0, and A0, with block B2 used in the case where any of blocks A1, B1, B0, and A0 cannot be used) and of one encoded block of a picture adjacent on the time axis (block H, or block C in the case where block H cannot be used). The arrows in the left part of FIG. 10 indicate the order in which the blocks are selected.
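The selection order described above can be sketched as follows. This is a simplified illustration and not the normative derivation in NPL 2: it models only the block order and the fallback rules stated in this paragraph and omits any further steps of the standard's derivation process. The function name build_merge_candidates and the dictionary-based representation of motion information are assumptions made for the example.

```python
def build_merge_candidates(motion, max_candidates=5):
    """Build a simplified merge motion information candidate list.

    motion maps a block name ("A1", "B1", "B0", "A0", "B2", "H", "C") to its
    motion information, or to None (or is missing the key) when the block
    cannot be used. Motion information is treated as an opaque value here.
    """
    spatial_order = ["A1", "B1", "B0", "A0"]
    candidates = [motion[n] for n in spatial_order if motion.get(n) is not None]

    # B2 is considered only when at least one of A1, B1, B0, A0 is unusable.
    if len(candidates) < len(spatial_order) and motion.get("B2") is not None:
        candidates.append(motion["B2"])

    # Temporal candidate: block H, or block C when H cannot be used.
    temporal = motion.get("H")
    if temporal is None:
        temporal = motion.get("C")
    if temporal is not None:
        candidates.append(temporal)

    return candidates[:max_candidates]
```

For example, when B0 and H cannot be used, this sketch yields the motion information of A1, B1, A0, B2, and C, in that order.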
The video encoding device sets merge motion information identified by an index (merge index) in the merge motion information candidate list, as the motion information of the block of MergeFlag=1. In other words, for the block of MergeFlag=1, the video encoding device transmits only a merge index, and does not transmit a reference picture index and a differential motion vector. Merge indices are expressed by truncated unary codes. Accordingly, the codes of MergeIdx=0, MergeIdx=1, MergeIdx=2, MergeIdx=3, and MergeIdx=4 are respectively 1 bit (0), 2 bits (10), 3 bits (110), 4 bits (1110), and 4 bits (1111).
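The truncated unary codes listed above can be reproduced with the short sketch below. The function name truncated_unary is hypothetical; the largest index is assumed to be 4, matching the five-candidate list described earlier.

```python
def truncated_unary(index: int, max_index: int = 4) -> str:
    """Return the truncated unary code of index as a bit string.

    An index below max_index is coded as `index` ones followed by a
    terminating zero; the largest index omits the terminating zero.
    """
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bits = "1" * index
    if index < max_index:
        bits += "0"
    return bits
```

Because smaller merge indices receive shorter codes, a candidate placed earlier in the list costs fewer bits to signal.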
The process of deriving the merge motion information candidate list is described in detail in 8.5.2.1 Derivation process for motion vector components and reference indices in Non Patent Literature (NPL) 2. The process of deriving the merge motion information candidate list is also briefly described in 3.4.1.1 to 3.4.1.4 in NPL 1.