In recent video coding techniques, a picture is divided into blocks, pixels in the blocks are predicted, and prediction differences are encoded to achieve a high compression ratio. A prediction mode where pixels are predicted from spatially neighboring pixels in a picture to be encoded is called an intra prediction mode. Meanwhile, a prediction mode where pixels are predicted from a previously-encoded reference picture using a motion compensation technique is called an inter prediction mode.
In the inter prediction mode of a video coding apparatus, a reference region used to predict pixels is represented by two-dimensional coordinate data called a motion vector that includes a horizontal component and a vertical component, and motion vector data and difference pixel data between original pixels and predicted pixels are encoded. To reduce the amount of code, a vector predictor is generated based on a motion vector of a block that is adjacent to a target block to be encoded (may be referred to as an encoding target block), and a difference vector between a motion vector of the target block and the vector predictor is encoded. By assigning a smaller amount of code to a smaller difference vector, it is possible to reduce the amount of code for the motion vector and to improve the coding efficiency.
Meanwhile, in a video decoding apparatus, a vector predictor that is the same as the vector predictor generated in the video coding apparatus is determined for each block, and the motion vector is restored by adding the encoded difference vector and the vector predictor. For this reason, the video coding apparatus and the video decoding apparatus include vector prediction units having substantially the same configuration.
In the video decoding apparatus, blocks are generally decoded from the upper left to the lower right in the order of the raster scan technique or the z scan technique. Therefore, only a motion vector of a block that is to the left of or above a target block to be decoded at the video decoding apparatus, i.e., a motion vector that is decoded before the target block, can be used for prediction by the motion vector prediction units of the video coding apparatus and the video decoding apparatus.
Meanwhile, in MPEG (Moving Picture Experts Group)-4 AVC/H.264 (hereafter may be simply referred to as H.264), a vector predictor may be determined using a motion vector of a previously encoded/decoded reference picture instead of a motion vector of a target picture to be processed (see, for example, ISO/IEC 14496-10 (MPEG-4 Part 10)/ITU-T Rec. H.264).
Also, a method of determining a vector predictor is disclosed in “WD3: Working Draft 3 of High-Efficiency Video Coding” JCTVC-E603, JCT-VC 5th Meeting, March 2011. High-Efficiency Video Coding (HEVC) is a video coding technology the standardization of which is being jointly discussed by ISO/IEC and ITU-T. HEVC Test Model (HM) software (version 3.0) has been proposed as reference software.
The outline of HEVC is described below. In HEVC, reference picture lists L0 and L1 listing reference pictures are provided. For each block, regions of up to two reference pictures, i.e., motion vectors corresponding to the reference picture lists L0 and L1, can be used for inter prediction.
The reference picture lists L0 and L1 correspond, generally, to directions of display time. The reference picture list L0 lists previous pictures with respect to a target picture to be processed, and the reference picture list L1 lists future pictures. Each entry of the reference picture lists L0 and L1 includes a storage location of pixel data and a picture order count (POC) of the corresponding picture.
POCs are represented by integers, and indicate the order in which pictures are displayed and relative display time of the pictures. Assuming that a picture with a POC “0” is displayed at display time “0”, the display time of a given picture can be obtained by multiplying the POC of the picture by a constant. For example, when “fr” indicates the display cycle (Hz) of frames and “p” indicates the POC of a picture, the display time of the picture may be represented by formula (1) below.
Display time = p × (fr/2)   formula (1)
Accordingly, it can be said that the POC indicates display time of a picture in units of a constant.
When a reference picture list includes two or more entries, reference pictures that motion vectors refer to are specified by index numbers (reference indexes) in the reference picture list. When a reference picture list includes only one entry (or one picture), the reference index of a motion vector corresponding to the reference picture list is automatically set at “0”. In this case, there is no need to explicitly specify the reference index.
A motion vector of a block includes an L0/L1 list identifier, a reference index, and vector data (Vx, Vy). A reference picture is identified by the L0/L1 list identifier and the reference index, and a region in the reference picture is identified by the vector data (Vx, Vy). Vx and Vy in the vector data indicate, respectively, differences between the coordinates of a reference region in the horizontal and vertical axes and the coordinates of a target block (or current block) to be processed. For example, Vx and Vy may be represented in units of quarter pixels. The L0/L1 list identifier and the reference index may be collectively called a reference picture identifier, and (0, 0) may be called a 0 vector.
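As a non-limiting illustration, the motion vector data described above may be modeled as a small record. The sketch below is written in Python; the class name MotionVector, its field names, and the constant QUARTER_PEL are hypothetical names introduced only for this example, and quarter-pixel units are assumed as in the example given above.

```python
from dataclasses import dataclass

QUARTER_PEL = 4  # assumption: vector data are stored in quarter-pixel units


@dataclass(frozen=True)
class MotionVector:
    list_id: int  # L0/L1 list identifier: 0 for L0, 1 for L1
    ref_idx: int  # reference index into the selected reference picture list
    vx: int       # horizontal displacement, in quarter pixels
    vy: int       # vertical displacement, in quarter pixels

    def ref_picture_id(self):
        # The list identifier and the reference index together form
        # the reference picture identifier.
        return (self.list_id, self.ref_idx)

    def is_zero_vector(self):
        # (0, 0) is the 0 vector.
        return self.vx == 0 and self.vy == 0

    def displacement_in_pixels(self):
        # Convert quarter-pixel units to whole-pixel units.
        return (self.vx / QUARTER_PEL, self.vy / QUARTER_PEL)


mv = MotionVector(list_id=0, ref_idx=1, vx=-6, vy=10)
```

In this sketch, mv refers to the second entry of the reference picture list L0 and points 1.5 pixels to the left of and 2.5 pixels below the position of the target block.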
A method of determining a vector predictor in HEVC is described below. A vector predictor is determined for each reference picture identified by the L0/L1 list identifier and the reference index. In determining vector data mvp of a vector predictor for a motion vector referring to a reference picture identified by a list identifier LX and a reference index refidx, up to three sets of vector data are calculated as vector predictor candidates.
Blocks that are spatially and temporally adjacent to a target block are categorized into three groups: blocks to the left of the target block (left group), blocks above the target block (upper group), and blocks temporally adjacent to the target block (temporally-adjacent group). From each of the three groups, up to one vector predictor candidate is selected.
Selected vector predictor candidates are listed in the order of priority of the groups: the temporally-adjacent group, the left group, and the upper group. This list is placed in an array mvp_cand. If no vector predictor candidate is present in any of the groups, a 0 vector is added to the array mvp_cand.
A predictor candidate index mvp_idx is used to identify one of the vector predictor candidates in the list which is to be used as the vector predictor. That is, the vector data of a vector predictor candidate located at the “mvp_idx”-th position in the array mvp_cand are used as the vector data mvp of the vector predictor.
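The construction of the array mvp_cand and the selection by mvp_idx may be sketched as follows. This is a minimal Python illustration, not the HM implementation; the function names build_mvp_cand and pick_predictor are hypothetical, each group is assumed to have already produced either one candidate (an (x, y) tuple) or None, and mvp_idx is assumed to be counted from 0.

```python
ZERO_VECTOR = (0, 0)


def build_mvp_cand(temporal, left, upper):
    """Collect up to one candidate per group, in the priority order
    given in the text: temporally-adjacent, left, upper.
    Each argument is an (x, y) tuple, or None if the group has no candidate."""
    mvp_cand = [c for c in (temporal, left, upper) if c is not None]
    if not mvp_cand:
        # No group produced a candidate: fall back to the 0 vector.
        mvp_cand.append(ZERO_VECTOR)
    return mvp_cand


def pick_predictor(mvp_cand, mvp_idx):
    """Return the vector data mvp of the candidate identified by mvp_idx
    (assumed to be 0-based in this sketch)."""
    return mvp_cand[mvp_idx]
```

For example, when only the left group and the upper group yield candidates, the list has two entries and mvp_idx 0 identifies the candidate from the left group.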
When mv indicates a motion vector of an encoding target block which refers to a reference picture identified by the list identifier LX and the reference index refidx, the video coding apparatus searches the array mvp_cand to find a vector predictor candidate closest to the motion vector mv, and sets the index of the found vector predictor candidate as the predictor candidate index mvp_idx. Also, the video coding apparatus calculates a difference vector mvd using formula (2) below and encodes refidx, mvd, and mvp_idx as motion vector information for the list LX.
mvd = mv − mvp   formula (2)
The video decoding apparatus decodes refidx, mvd, and mvp_idx, determines mvp_cand based on refidx, and uses the vector predictor candidate located at the “mvp_idx”-th position in mvp_cand as the vector predictor mvp. The video decoding apparatus restores the motion vector mv of the target block based on formula (3) below.
mv = mvd + mvp   formula (3)
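The round trip between the difference-vector calculation of formula (2) and the restoration of formula (3) may be illustrated by the following Python sketch. The function names encode_mv and decode_mv are hypothetical. The text does not define how the “closest” candidate is measured; the sum of absolute component differences used here is an assumption made only for this example.

```python
def encode_mv(mv, mvp_cand):
    """Coding side: choose the candidate closest to mv and form mvd.
    mv and the candidates are (x, y) tuples."""
    def distance(cand):
        # Assumed closeness measure: sum of absolute component differences.
        return abs(mv[0] - cand[0]) + abs(mv[1] - cand[1])

    mvp_idx = min(range(len(mvp_cand)), key=lambda i: distance(mvp_cand[i]))
    mvp = mvp_cand[mvp_idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])  # formula (2)
    return mvp_idx, mvd


def decode_mv(mvp_idx, mvd, mvp_cand):
    """Decoding side: rebuild mv from the same candidate list."""
    mvp = mvp_cand[mvp_idx]
    return (mvd[0] + mvp[0], mvd[1] + mvp[1])  # formula (3)


cands = [(8, 8), (3, -1)]
idx, mvd = encode_mv((4, -2), cands)
restored = decode_mv(idx, mvd, cands)
```

The restoration succeeds only because both sides derive the same candidate list, which is why the two apparatuses include vector prediction units having substantially the same configuration.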
Next, blocks spatially adjacent to a target block are described. FIG. 1 is a drawing illustrating blocks spatially adjacent to a target block. With reference to FIG. 1, exemplary processes of selecting vector predictor candidates from blocks to the left of the target block and blocks above the target block are described.
First, an exemplary process of selecting a vector predictor candidate from the blocks to the left of the target block is described. Blocks I and H to the left of the target block are searched in this order until a motion vector 1 with the list identifier LX and the reference index refidx is found. If the motion vector 1 with the list identifier LX and the reference index refidx is found, the motion vector 1 is selected.
If the motion vector 1 is not found, a motion vector 2, which refers to a reference picture that is in a reference picture list LY and is the same as the reference picture indicated by the reference index refidx of the reference picture list LX, is searched for. If the motion vector 2 is found, the motion vector 2 is selected.
If the motion vector 2 is not found, a motion vector 3 for inter prediction is searched for. If the motion vector 3 is found, the motion vector 3 is selected. If the motion vector selected in this process does not refer to a reference picture that is the same as the reference picture indicated by the reference index refidx of the reference picture list LX, a scaling process described later is performed.
Next, an exemplary process of selecting a vector predictor candidate from the blocks above the target block is described. Blocks E, D, and A above the target block are searched in this order until a motion vector 1 with the list identifier LX and the reference index refidx is found. If the motion vector 1 with the list identifier LX and the reference index refidx is found, the motion vector 1 is selected.
If the motion vector 1 is not found, a motion vector 2, which refers to a reference picture that is in a reference picture list LY and is the same as the reference picture indicated by the reference index refidx of the reference picture list LX, is searched for. If the motion vector 2 is found, the motion vector 2 is selected.
If the motion vector 2 is not found, a motion vector 3 for inter prediction is searched for. If the motion vector 3 is found, the motion vector 3 is selected. If the motion vector selected in this process does not refer to a reference picture that is the same as the reference picture indicated by the reference index refidx of the reference picture list LX, a scaling process described later is performed.
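The three-stage search described above, which is the same for the left group and the upper group, may be sketched in Python as follows. This is an illustration under stated assumptions rather than the HM implementation: the function name select_spatial_candidate and the data layout are hypothetical, a block is modeled as a list of (list_id, ref_idx, (vx, vy)) tuples that is empty for a block without inter prediction, and two reference pictures are treated as the same picture when their POCs are equal.

```python
def select_spatial_candidate(blocks, ref_lists, lx, refidx):
    """blocks: neighboring blocks in search order (e.g. I then H, or E, D, A).
    ref_lists: {list_id: [POC of each entry]} for L0 (0) and L1 (1).
    Returns ((vx, vy), needs_scaling), or None if no motion vector exists."""
    target_poc = ref_lists[lx][refidx]
    ly = 1 - lx  # the other reference picture list

    def search(predicate):
        for block in blocks:
            for mv in block:
                if predicate(mv):
                    return mv
        return None

    # Motion vector 1: same list identifier LX and same reference index.
    mv = search(lambda m: m[0] == lx and m[1] == refidx)
    if mv is None:
        # Motion vector 2: in list LY, but referring to the same picture.
        mv = search(lambda m: m[0] == ly and ref_lists[ly][m[1]] == target_poc)
    if mv is None:
        # Motion vector 3: any motion vector used for inter prediction.
        mv = search(lambda m: True)
    if mv is None:
        return None

    # Scaling is needed when the selected vector refers to another picture.
    needs_scaling = ref_lists[mv[0]][mv[1]] != target_poc
    return mv[2], needs_scaling
```

For instance, with ref_lists = {0: [8, 4], 1: [16, 8]}, a neighboring vector (1, 1, (5, -3)) is accepted in the second stage for LX = L0 and refidx = 0, because both refer to the picture with POC 8, and no scaling is required.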
Next, blocks temporally adjacent to a target block are described. FIG. 2 is a drawing used to describe a process of selecting a vector predictor candidate from blocks temporally adjacent to a target block.
First, a temporally-adjacent reference picture 20, which includes a temporally-adjacent block and is called a collocated picture (ColPic), is selected. The ColPic 20 is a reference picture with reference index “0” in the reference picture list L0 or L1. Normally, a ColPic is a reference picture with reference index “0” in the reference picture list L1.
An mvCol 22, which is a motion vector of a block (Col block) 21 located in the ColPic 20 at the same position as a target block 11, is scaled by a scaling method described below to generate a vector predictor candidate.
An exemplary method of scaling a motion vector is described below. Here, it is assumed that an input motion vector is represented by mvc=(mvcx, mvcy), an output vector (vector predictor candidate) is represented by mvc′=(mvcx′, mvcy′), and mvc is mvCol.
Also, ColRefPic 23 indicates a picture that mvc refers to, ColPicPoc indicates the POC of the ColPic 20 including mvc, ColRefPoc indicates the POC of the ColRefPic 23, CurrPoc indicates the POC of a current target picture 10, and CurrRefPoc indicates the POC of a picture 25 identified by RefPicList_LX and RefIdx.
When the motion vector to be scaled is a motion vector of a spatially-adjacent block, ColPicPoc equals CurrPoc. When the motion vector to be scaled is a motion vector of a temporally-adjacent block, ColPicPoc equals the POC of ColPic.
As indicated by formulas (4) and (5) below, mvc is scaled based on the ratio between time intervals of pictures.
mvcx′ = mvcx × (CurrPoc − CurrRefPoc) / (ColPicPoc − ColRefPoc)   formula (4)
mvcy′ = mvcy × (CurrPoc − CurrRefPoc) / (ColPicPoc − ColRefPoc)   formula (5)
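A worked instance of formulas (4) and (5) may make the ratio concrete. The Python sketch below evaluates the formulas exactly with rational arithmetic; the function name scale_exact is hypothetical, and ColPicPoc is assumed to differ from ColRefPoc so that the divisor is not zero.

```python
from fractions import Fraction


def scale_exact(mvc, curr_poc, curr_ref_poc, col_pic_poc, col_ref_poc):
    """Scale mvc = (mvcx, mvcy) by the ratio of POC distances,
    formulas (4) and (5), without any rounding."""
    ratio = Fraction(curr_poc - curr_ref_poc, col_pic_poc - col_ref_poc)
    return (mvc[0] * ratio, mvc[1] * ratio)


# The Col block's vector spans 8 POC units; the target vector spans 2.
scaled = scale_exact((8, -4), curr_poc=4, curr_ref_poc=2,
                     col_pic_poc=8, col_ref_poc=0)
```

Here the ratio is 2/8, so the vector (8, −4) is scaled to (2, −1).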
However, since division requires a large amount of calculation, mvc′ may be approximated, for example, by multiplication and shift using the formulas below.
DiffPocD = ColPicPoc − ColRefPoc   formula (6)
DiffPocB = CurrPoc − CurrRefPoc   formula (7)
TDB = Clip3(−128, 127, DiffPocB)   formula (8)
TDD = Clip3(−128, 127, DiffPocD)   formula (9)
iX = (0x4000 + abs(TDD/2)) / TDD   formula (10)
Scale = Clip3(−1024, 1023, (TDB × iX + 32) >> 6)   formula (11)
abs ( ): a function that returns an absolute value
Clip3(x, y, z): a function that returns a median of x, y, and z, i.e., z limited to the range from x to y when x is less than or equal to y
>>: right arithmetic shift
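Formulas (6) through (11) may be sketched in Python as follows. The function names clip3, div_trunc, and compute_scale are hypothetical. Two assumptions are made that the text does not state: the divisions in formula (10) are taken to be integer divisions that truncate toward zero, and TDD is taken to be nonzero. Python's >> operator on integers behaves as an arithmetic right shift.

```python
def clip3(lo, hi, v):
    """Limit v to the range [lo, hi] (the median of lo, hi, v when lo <= hi)."""
    return max(lo, min(hi, v))


def div_trunc(a, b):
    """Integer division truncating toward zero (assumed semantics of '/')."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def compute_scale(curr_poc, curr_ref_poc, col_pic_poc, col_ref_poc):
    diff_poc_d = col_pic_poc - col_ref_poc                    # formula (6)
    diff_poc_b = curr_poc - curr_ref_poc                      # formula (7)
    tdb = clip3(-128, 127, diff_poc_b)                        # formula (8)
    tdd = clip3(-128, 127, diff_poc_d)                        # formula (9)
    # Assumes tdd != 0; iX approximates 0x4000 / TDD with rounding.
    ix = div_trunc(0x4000 + abs(div_trunc(tdd, 2)), tdd)      # formula (10)
    return clip3(-1024, 1023, (tdb * ix + 32) >> 6)           # formula (11)
```

With CurrPoc = 4, CurrRefPoc = 2, ColPicPoc = 8, and ColRefPoc = 0, this gives TDB = 2, TDD = 8, iX = 2048, and Scale = 64. When the two POC distances are equal, for example both 2, Scale is 256.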
“Scale” obtained by formula (11) is used as a scaling factor. In this example, Scale=256 indicates a coefficient of “1”, i.e., the motion vector is not scaled. The scaling factor has an 8-bit precision after the decimal point. Accordingly, when multiplied by the scaling factor, the precision after the decimal point of a motion vector is increased by 8 bits.
Based on the scaling factor Scale, a scaling operation is performed using the formulas below.
mvcx′ = (Scale × mvcx + 128) >> 8   formula (12)
mvcy′ = (Scale × mvcy + 128) >> 8   formula (13)
In formulas (12) and (13), N bits after the decimal point are rounded off to the nearest integer by adding 2^(N−1) to a value multiplied by the scaling factor and shifting the result of addition to the right by N bits. A similar scaling process is disclosed in ISO/IEC 14496-10 (MPEG-4 Part 10)/ITU-T Rec. H.264. The obtained vector mvc′ is used as a vector predictor candidate.