A recent video coding technique achieves a high compression rate by dividing an image into plural blocks, predicting the pixels included in each block, and encoding the prediction differentials. A prediction mode in which prediction pixels are generated from pixels in the picture to be encoded is referred to as intra prediction, and a prediction mode in which prediction pixels are generated by motion compensation from a reference image which was encoded in the past is referred to as inter prediction.
In the video coding apparatus, to perform the inter prediction, a region which is referred to as prediction pixels is expressed by a motion vector, which is two-dimensional data with a horizontal component and a vertical component, and the motion vector and the prediction differential data of the pixels are coded. In order to reduce the code amount of the motion vector, a prediction vector is generated based on the motion vector of a block which is adjacent to the block to be coded, and a differential vector between the motion vector and the prediction vector is coded. Because codes are allocated such that a smaller differential vector receives a smaller code amount, it is possible to reduce the code amount of the motion vector and improve coding efficiency.
Further, in general, there are many cases where a motion of a certain block is completely the same as those of its adjacent blocks. For this reason, it is also possible to reduce the code amount of the motion vector by using the motion vector of the adjacent block as the prediction vector, regarding the differential vector as 0 to inherit it as it is, and encoding index information which indicates the adjacent block whose motion vector is inherited.
The video decoding apparatus determines the same prediction vector as the video coding apparatus for the respective blocks, and adds the coded differential vector to the prediction vector to decode the motion vector. For this reason, the video coding apparatus and the video decoding apparatus have the same motion vector predicting part.
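The relationship between the motion vector, the prediction vector and the differential vector can be illustrated with a minimal sketch. The function names `encode_mvd` and `decode_mv` are hypothetical and chosen only for illustration; the sketch assumes that both sides have already derived the same prediction vector.

```python
from typing import Tuple

Vector = Tuple[int, int]  # (horizontal component, vertical component)


def encode_mvd(mv: Vector, pred: Vector) -> Vector:
    """Coder side: differential vector = motion vector - prediction vector."""
    return (mv[0] - pred[0], mv[1] - pred[1])


def decode_mv(mvd: Vector, pred: Vector) -> Vector:
    """Decoder side: motion vector = prediction vector + differential vector."""
    return (pred[0] + mvd[0], pred[1] + mvd[1])


# Illustrative values only: the adjacent block moves almost like the current block.
mv = (13, -4)
pred = (12, -4)
mvd = encode_mvd(mv, pred)        # small differential, hence a short code
restored = decode_mv(mvd, pred)   # equals mv when both sides use the same pred
```

The round trip only works when the coder and the decoder derive an identical prediction vector, which is why both apparatuses contain the same motion vector predicting part.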
In the video decoding apparatus, in general, the respective blocks are decoded in a raster scan or a z-scan from the upper left to the lower right of the image. For this reason, the motion vectors which may be utilized for prediction by the motion vector predicting parts of the video coding apparatus and the video decoding apparatus are motion vectors of the blocks which are adjacent to a process block at left and upper sides of the process block and have already been decoded when the video decoding apparatus decodes the process block.
Further, according to MPEG (Moving Picture Experts Group)-4 AVC/H.264 (also referred to as H.264, hereinafter), there is a case where the prediction vector is determined using the motion vector of a reference picture which was subjected to the coding process and the decoding process in the past, instead of the picture to be processed.
Known techniques for determining the prediction vector include HEVC (High Efficiency Video Coding), which ISO/IEC and ITU-T are jointly working to standardize. Further, HM Software (version 4.0) is known as its reference software.
In the following, an overview of HEVC is described. According to HEVC, there are two lists of referable pictures, L0 and L1 (also referred to as reference picture lists). For the respective blocks, with the motion vectors corresponding to L0 and L1, a maximum of two reference picture regions can be used for the inter prediction.
In general, L0 and L1 correspond to directions of display time. L0 is a reference list of past pictures with respect to the picture to be processed. L1 is a reference list of future pictures. Each entry of a reference picture list has information including the stored location of the corresponding image data and display time information, that is, the POC (Picture Order Count) value of the corresponding picture.
POC is represented by an integer value which indicates the display order of the respective pictures and their relative display times. Assuming that the display time of the picture whose POC value is 0 is 0, the display time of a certain picture can be expressed by multiplying the POC value of the picture by a constant.
For example, when a display frequency of a frame is fr (Hz), the display time of the picture whose POC value is p is given by an equation (1). Thus, POC can be regarded as a display time using a certain constant as a unit.
Display time = p × (fr/2)  equation (1)
If there are two or more entries in a reference picture list, each motion vector specifies which reference picture is to be referred to by an index number (also referred to as a reference index) in the reference picture list. In particular, if a reference picture list has only one entry, the reference index of the motion vector corresponding to the list is automatically 0, and thus it is not necessary to explicitly specify the reference index.
In other words, the motion vector of the block includes an L0/L1 list identifier, the reference index, and vector data (Vx, Vy). The reference picture is specified by the L0/L1 list identifier and the reference index. The region in the reference picture is specified by (Vx, Vy). Vx and Vy are the differences between the coordinates of the reference region and the coordinates of the process block (also referred to as the current block) in the horizontal and vertical directions, respectively. For example, Vx and Vy are expressed in units of one-fourth pixels. The L0/L1 list identifier and the reference index are referred to as a reference picture identifier, (Vx, Vy) is referred to as vector data, and (0, 0) is referred to as a zero vector.
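The items that make up a motion vector can be pictured as a small record. The following is a minimal sketch; the class name `MotionVector`, its field names and the helper `reference_position` are hypothetical, and the quarter-pixel unit follows the example given above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MotionVector:
    list_id: int   # L0/L1 list identifier: 0 for L0, 1 for L1
    ref_idx: int   # reference index within the selected reference picture list
    vx: int        # horizontal vector data, in quarter-pixel units
    vy: int        # vertical vector data, in quarter-pixel units


def reference_position(block_x: int, block_y: int, mv: MotionVector) -> Tuple[float, float]:
    """Top-left position of the reference region in pixels (may be fractional)."""
    return (block_x + mv.vx / 4, block_y + mv.vy / 4)


ZERO = MotionVector(list_id=0, ref_idx=0, vx=0, vy=0)   # carries the zero vector (0, 0)
mv = MotionVector(list_id=0, ref_idx=0, vx=6, vy=-8)
pos = reference_position(64, 32, mv)                    # (65.5, 30.0)
```

Here (list_id, ref_idx) plays the role of the reference picture identifier, and (vx, vy) is the vector data.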
A merge mode according to HEVC is described. According to HEVC, there are two modes for determining the prediction vector, which are referred to as a merge mode and an MVP mode, respectively. In the following, the merge mode in particular is described.
According to the merge mode, the set of prediction information held by a block which is adjacent to the process block in a spatial direction or a time direction is used as it is. The set of prediction information includes prediction flags which indicate whether L0 is valid and whether L1 is valid, the respective reference indexes of L0 and L1, and the motion vectors of L0 and L1.
If the prediction flag of L0 is valid and the prediction flag of L1 is invalid, it means uni-prediction of L0. If the prediction flag of L0 is invalid and the prediction flag of L1 is valid, it means uni-prediction of L1. If the prediction flag of L0 and the prediction flag of L1 are valid, it means bi-prediction.
Further, if the prediction flag of L0 and the prediction flag of L1 are invalid, it means the block of the intra prediction. Alternatively, instead of using the prediction flag, if the reference picture identifier is the reference index out of the range of the reference picture list, it may express an invalid status, and if the reference picture identifier is the reference index within the range of the reference picture list, it may express a valid status.
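The meaning of the two prediction flags, and the alternative of expressing validity through the range of the reference index, can be summarized in a short sketch. The function names `prediction_type` and `is_valid_ref_idx` are hypothetical.

```python
def prediction_type(pred_flag_l0: bool, pred_flag_l1: bool) -> str:
    """Map the L0/L1 prediction flags to the kind of prediction they express."""
    if pred_flag_l0 and pred_flag_l1:
        return "bi-prediction"
    if pred_flag_l0:
        return "uni-prediction of L0"
    if pred_flag_l1:
        return "uni-prediction of L1"
    return "intra prediction"


def is_valid_ref_idx(ref_idx: int, list_size: int) -> bool:
    """Alternative without flags: an out-of-range reference index means 'invalid'."""
    return 0 <= ref_idx < list_size
```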
A candidate list of the prediction information (also referred to as a prediction information candidate list, hereinafter) is generated. The index in the candidate list designates which prediction information is to be used. Thus, the motion compensation of the process block can be performed using the same prediction information as that of an adjacent block. Therefore, without coding the motion vector, etc., the prediction information used for the process block can be sent to the decoding apparatus only by coding the index in the list, thereby reducing the code amount. A flag called a merge flag indicates whether the merge mode is valid, and index information called a merge index indicates the index in the prediction information candidate list.
FIG. 1 is a diagram for illustrating an example of a positional relationship between the process block and adjacent blocks. The adjacent blocks are adjacent to the process block in a spatial direction or a time direction. In the example illustrated in FIG. 1, the blocks A0, A1, B0, B1 and B2 are included in the same picture as the process block and are adjacent to the process block in a spatial direction. The block Col is included in a picture which was previously processed and is adjacent to the process block in a time direction.
According to HM4, up to five candidates among these adjacent blocks are listed in the prediction information candidate list. If there is an intra prediction block among the adjacent blocks, the intra prediction block is not included in the prediction information candidate list. Further, if there are plural prediction information items whose reference picture identifiers, vector information, etc., are all the same, the duplicated prediction information items are deleted because they are redundant.
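The listing rules above (skip intra prediction blocks, delete duplicates, list at most five candidates) can be sketched as follows. This is a simplified illustration and not the HM4 implementation: the function name `build_candidate_list` is hypothetical, an intra prediction block is represented by `None`, and prediction information is represented by any hashable value that compares equal when all of its items are the same.

```python
from typing import Hashable, List, Optional, Sequence


def build_candidate_list(
    neighbors: Sequence[Optional[Hashable]],
    max_candidates: int = 5,
) -> List[Hashable]:
    """Collect prediction information from adjacent blocks in the given order."""
    candidates: List[Hashable] = []
    for info in neighbors:
        if info is None:            # intra prediction block: not listed
            continue
        if info in candidates:      # identical prediction information: redundant
            continue
        candidates.append(info)
        if len(candidates) == max_candidates:
            break
    return candidates


# Illustrative prediction information: (list identifier, reference index, (Vx, Vy)).
a = (0, 0, (4, 0))
b = (0, 1, (-2, 8))
listed = build_candidate_list([a, None, a, b])   # [a, b]
```

In this example four adjacent blocks yield only two candidates, which shows how the number of candidates depends on the decoded vector values of the adjacent blocks.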
The deletion of the redundant candidates causes the change of the number of the candidates, leading to the change in a way of allocating the codes. FIG. 2 is a diagram for illustrating an example of allocating the codes according to the number of the candidates. As illustrated in FIG. 2, if the number of the candidates is reduced from “5” to “3”, the allocated codes and bit numbers are changed.
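The dependence of the allocated codes on the number of the candidates can be illustrated with a truncated unary code. The codes of FIG. 2 are not reproduced here, so the use of a truncated unary code is an assumption of this sketch, and the function name `truncated_unary` is hypothetical.

```python
def truncated_unary(index: int, num_candidates: int) -> str:
    """Code word for `index` when `num_candidates` candidates are listed.

    Assumption: a truncated unary code, in which the last index omits the
    terminating '0' because no larger index can follow it.
    """
    if not 0 <= index < num_candidates:
        raise ValueError("index out of range")
    if index < num_candidates - 1:
        return "1" * index + "0"
    return "1" * index


codes_for_5 = [truncated_unary(i, 5) for i in range(5)]
# ['0', '10', '110', '1110', '1111']
codes_for_3 = [truncated_unary(i, 3) for i in range(3)]
# ['0', '10', '11']
```

Under this assumption, index 2 is coded as "110" with five candidates but as "11" with three candidates, so a decoder that counts a different number of candidates than the coder parses the bitstream differently.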
Here, it is assumed that there is an error in a predetermined picture because of a data break, etc., and that its vector values are not correctly decoded. In this case, a picture which refers to a block of the predetermined picture as the Col block adjacent in a time direction cannot have the adjacent vector values correctly decoded either.
Thus, when the redundant candidates are deleted, a mismatch occurs because the number of the candidates differs between the coder and the decoder. For this reason, a picture which uses the picture in which the error occurred as the Col picture also has a mismatch in the number of the candidates, and the data cannot be correctly decoded from the block for which the index is not correctly decoded onward. In this way, the error is propagated. Thus, it is desirable to fix the number of the candidates so that the way of allocating the codes does not change, or to derive the number of the candidates only from the coding information of the picture to be processed.
However, if the number of the candidates is fixed and the valid number of the candidates of the prediction information candidate list is less than a predetermined number of the candidates, and the codes are allocated using the predetermined number as a maximum number, the code is also allocated to an invalid useless index, which causes the redundancy and reduction of the coding efficiency.
According to HEVC, in order to reduce the redundancy even if the number of the candidates is fixed, if the number of the candidates in the prediction information candidate list is less than the predetermined number of the candidates, new candidates are generated from the previously listed prediction information by the following three processes and are added to the prediction information candidate list.
<Combined Bi-Predictive Merge>
(1) The prediction information candidate for bi-prediction is generated from two prediction information candidates previously listed. FIG. 3 is a diagram for explaining the Combined Bi-predictive Merge.
In the example illustrated in FIG. 3, at first, two prediction information candidates, which are to be a source of generation, are selected from the prediction information candidate list previously listed. Two prediction information candidates are A and B, respectively. If the L0 prediction flag of A is valid and the L1 prediction flag of B is valid, the prediction information candidate is generated as follows.
The reference index of L0 of the generated prediction information candidate is the reference index of L0 of A.
The motion vector mvL0Cand of L0 of the generated prediction information candidate is the motion vector mvL0 [A] of L0 of A.
The reference index of L1 of the generated prediction information candidate is the reference index of L1 of B.
The motion vector mvL1Cand of L1 of the generated prediction information candidate is the motion vector mvL1 [B] of L1 of B.
If the generated prediction information candidate is not included in the prediction information candidate list, it is added to the candidate list.
If the predetermined number of the candidates is not reached by the generation of the prediction information candidate, a new prediction information candidate is generated in the same manner from L1 of A and L0 of B by exchanging the reference lists. If the predetermined number of the candidates is still not reached, prediction information candidates are generated until the predetermined number of the candidates is reached by selecting different pairs of the prediction information candidates from the prediction information candidate list and repeating the same process.
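The Combined Bi-predictive Merge described above can be sketched as follows. This is an illustration under stated assumptions and not the HM4 implementation: the names `PredInfo`, `combine` and `add_combined_bi_predictive` are hypothetical, a reference index of `None` stands for an invalid prediction flag, and the order in which pairs are visited is a simplification.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple

Vec = Tuple[int, int]


@dataclass(frozen=True)
class PredInfo:
    ref_idx_l0: Optional[int] = None   # None: L0 prediction flag is invalid
    mv_l0: Optional[Vec] = None
    ref_idx_l1: Optional[int] = None   # None: L1 prediction flag is invalid
    mv_l1: Optional[Vec] = None


def combine(a: PredInfo, b: PredInfo) -> Optional[PredInfo]:
    """Take L0 from `a` and L1 from `b`, if both flags are valid."""
    if a.ref_idx_l0 is None or b.ref_idx_l1 is None:
        return None
    return PredInfo(a.ref_idx_l0, a.mv_l0, b.ref_idx_l1, b.mv_l1)


def add_combined_bi_predictive(cands: List[PredInfo], target: int) -> List[PredInfo]:
    """Fill the list with bi-predictive candidates built from listed pairs."""
    out = list(cands)
    for a, b in combinations(cands, 2):
        # First (L0 of A, L1 of B), then the exchanged pair (L0 of B, L1 of A).
        for first, second in ((a, b), (b, a)):
            if len(out) >= target:
                return out
            new = combine(first, second)
            if new is not None and new not in out:
                out.append(new)
    return out


A = PredInfo(ref_idx_l0=0, mv_l0=(4, 0))
B = PredInfo(ref_idx_l1=1, mv_l1=(-2, 3))
filled = add_combined_bi_predictive([A, B], target=5)
# One candidate is added: L0 taken from A, L1 taken from B.
```

If every listed candidate has only a valid L0, `combine` never succeeds and the list is returned unchanged.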
<Non Scaled Predictive Merge>
(2) The prediction information candidate for bi-prediction is generated from a prediction information candidate previously listed. FIG. 4 is a diagram for explaining the Non Scaled Predictive Merge.
In the example illustrated in FIG. 4, at first, a prediction information candidate, which is to be a source of generation, is selected from the prediction information candidate list previously listed. The selected one is A. If the prediction flag of LX of A with respect to the list X (X=0, 1) is valid, the following process is performed.
It is assumed that LY is the reference list other than LX (i.e., Y=1−X). If a difference T between the process picture including the process block and the reference picture of LX indicated by the reference index of LX of A is equal to a difference T′ between the process picture and the reference picture of LY indicated by the reference index of LX of A, the prediction information candidate is generated as follows.
The reference index of LX of the generated prediction information candidate is the reference index of LX of A.
The motion vector mvLXCand of LX of the generated prediction information candidate is mvLX [A] of LX of A.
The reference index of LY of the generated prediction information candidate is the reference index of LX of A.
The motion vector mvLYCand of LY of the generated prediction information candidate is the negated motion vector (−mvLX [A]) of LX of A.
If the generated prediction information candidate is not included in the prediction information candidate list, it is added to the candidate list.
If the predetermined number of the candidates is not reached even by a generation of the prediction information candidate, prediction information candidates are generated until the predetermined number of the candidates is reached by changing the prediction information candidate as a source of generation and repeating the same process.
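The Non Scaled Predictive Merge described above can be sketched as follows. The names `PredInfo` and `non_scaled_candidate` are hypothetical. The sketch also makes two assumptions that the description does not state: the differences T and T′ are compared as absolute POC differences, and no candidate is generated when the reference index of LX of A is outside the range of the list LY.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec = Tuple[int, int]


@dataclass(frozen=True)
class PredInfo:
    ref_idx: Tuple[Optional[int], Optional[int]]   # (L0, L1); None means invalid
    mv: Tuple[Optional[Vec], Optional[Vec]]        # (L0, L1)


def non_scaled_candidate(
    a: PredInfo,
    x: int,
    cur_poc: int,
    poc_lists: Tuple[Sequence[int], Sequence[int]],
) -> Optional[PredInfo]:
    """Mirror LX of `a` onto LY (Y = 1 - X) when T equals T'.

    poc_lists[i][k] is the POC of entry k of reference picture list Li.
    """
    y = 1 - x
    ref = a.ref_idx[x]
    if ref is None or ref >= len(poc_lists[y]):
        return None
    t = abs(cur_poc - poc_lists[x][ref])         # T  (assumed: absolute POC difference)
    t_prime = abs(cur_poc - poc_lists[y][ref])   # T' (assumed: absolute POC difference)
    if t != t_prime:
        return None
    vx, vy = a.mv[x]
    ref_idx = [None, None]
    mv = [None, None]
    ref_idx[x], mv[x] = ref, (vx, vy)        # LX: copied from A
    ref_idx[y], mv[y] = ref, (-vx, -vy)      # LY: same index, negated vector
    return PredInfo((ref_idx[0], ref_idx[1]), (mv[0], mv[1]))


a = PredInfo(ref_idx=(0, None), mv=((8, -4), None))
cand = non_scaled_candidate(a, x=0, cur_poc=4, poc_lists=([2], [6]))
# L0: reference index 0, vector (8, -4); L1: reference index 0, vector (-8, 4)
```

As with the combined process, a generated candidate would be added to the list only if it is not already included.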
<Zero Vector Addition Process>
(3) If the predetermined number of the candidates is not reached even by the two processes described above, zero vectors are added while the reference indexes of L0 and L1 are incremented.
According to the known HEVC, if the number of the merge candidates does not reach the predetermined number of the candidates, new prediction information candidates are generated in order to fill the candidate list. However, among these processes, the two processes other than (3) the zero vector addition process generate only bi-predictive vectors. For this reason, in the case of a P picture, which is limited to the vectors of L0, there is a problem that no candidate other than the zero vector can be added.
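The zero vector addition can be sketched as follows. The function name `add_zero_candidates` and the tuple layout are hypothetical, and the sketch assumes that the reference indexes of L0 and L1 start at 0 and are incremented together without an upper bound, which the description does not specify.

```python
from typing import List, Tuple

Vec = Tuple[int, int]
# (reference index of L0, vector of L0, reference index of L1, vector of L1)
Candidate = Tuple[int, Vec, int, Vec]

ZERO_VECTOR: Vec = (0, 0)


def add_zero_candidates(cands: List[Candidate], target: int) -> List[Candidate]:
    """Pad the list with zero-vector candidates until `target` is reached."""
    out = list(cands)
    ref_idx = 0
    while len(out) < target:
        out.append((ref_idx, ZERO_VECTOR, ref_idx, ZERO_VECTOR))
        ref_idx += 1   # assumed: L0 and L1 reference indexes advance together
    return out


existing = [(0, (4, 0), 0, (-4, 0))]
padded = add_zero_candidates(existing, target=3)
# Two zero-vector candidates are appended, with reference indexes 0 and 1.
```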
[Non-Patent Document 1] ISO/IEC 14496-10 (MPEG-4 Part 10)/ITU-T Rec. H.264
[Non-Patent Document 2] Thomas Wiegand, Woo-Jin Han, Benjamin Bross, Jens-Rainer Ohm, Gary J. Sullivan, "WD4: Working Draft 4 of High-Efficiency Video Coding", JCTVC-F803, JCT-VC 6th Meeting, July 2011