Motion estimation is an effective Inter-frame coding technique that exploits temporal redundancy in video sequences. Motion-compensated Inter-frame coding has been widely used in various video coding standards, such as H.264, HEVC (High Efficiency Video Coding) and AVS2. The motion estimation adopted in these coding standards is often a block-based technique, where motion information such as coding mode and motion vector is determined for each macroblock, coding unit or similar block configuration. In addition, Intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The Inter-predicted or Intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. For Inter prediction, one or more previously coded pictures are used to derive Inter prediction for the current picture. The previously coded pictures are referred to as reference pictures and are stored in the Decoded Picture Buffer (DPB).
For AVS2, various picture types, including I, P, B, F, G, GB and S pictures, have been used. The use of I, P and B pictures is similar to that for H.264 and HEVC, where I refers to an Intra coded picture, P refers to a forward predicted picture based on one reference picture and B refers to a bi-directionally predicted picture using one picture before and one picture after the current picture in the display order. In AVS2, F refers to a forward predicted picture based on one or two reference pictures. G and GB refer to Intra predicted scene pictures, where the G picture is output and the GB picture is not output. S refers to an Intra predicted picture or a forward predicted picture based on one reference picture. Furthermore, an S picture only uses the most recently decoded G or GB picture as the reference picture.
AVS2 also uses block-based coding, where an image or a slice is partitioned into blocks and the coding process is applied to each block. Furthermore, the block partition process often starts with a largest coding unit (LCU) and the LCU is partitioned into one or more coding units (CUs) using quadtree partition or binary-tree partition. Other similar image units such as super block (SB) or coding tree block (CTB) are also used. In the sequence header for AVS2, a syntax element indicating the LCU size is included. For AVS2, a syntax element, progressive_sequence, is included to indicate whether the images associated with this sequence header are progressive or not. progressive_sequence equal to 1 indicates that all pictures in the sequence are frame pictures, and progressive_sequence equal to 0 indicates that the pictures in the sequence are frame pictures or field pictures. Similarly, a syntax element, field_coded_sequence, is included in the sequence header, where field_coded_sequence equal to 1 indicates that all pictures in the sequence are field pictures and field_coded_sequence equal to 0 indicates that all pictures in the sequence are frame pictures. If progressive_sequence is equal to 1, field_coded_sequence should be 0. Also, a syntax element, bitdepth, is included in the sequence header to indicate the bit depth of the pixel data of images associated with the sequence header. Furthermore, a syntax element, chroma_format, is included in the sequence header to indicate the chroma format being used for images associated with the sequence header. For example, the chroma format may correspond to 4:0:0, 4:2:0, 4:2:2 or 4:4:4 format.
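The constraint between progressive_sequence and field_coded_sequence described above can be illustrated with a minimal Python sketch. The SequenceFlags container and the flags_are_consistent helper are hypothetical names introduced only for illustration; they are not part of the AVS2 syntax, and the actual bitstream parsing is not shown.

```python
from dataclasses import dataclass


@dataclass
class SequenceFlags:
    """Hypothetical holder for two sequence-header flags named in the text."""
    progressive_sequence: int   # 1: frame pictures only; 0: frame or field pictures
    field_coded_sequence: int   # 1: field pictures only; 0: frame pictures only


def flags_are_consistent(flags: SequenceFlags) -> bool:
    """Return False when progressive_sequence is 1 but field_coded_sequence is not 0."""
    if flags.progressive_sequence == 1:
        return flags.field_coded_sequence == 0
    # Assumption: with progressive_sequence equal to 0, either value is allowed.
    return flags.field_coded_sequence in (0, 1)
```

For instance, a header with progressive_sequence equal to 1 and field_coded_sequence equal to 1 would be rejected by this check.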
For Inter prediction modes, the temporally reconstructed reference frames can be used to generate motion compensated predictions. The motion vector (MV) associated with a current PU needs to be signaled so that a reference block can be located. In order to improve the coding efficiency of MV coding, the MV is coded predictively using a Motion Vector Predictor (MVP). Therefore, when a PU is coded using an MVP, the motion vector difference (MVD) between the current MV and the MVP is derived and signaled in the video bitstream. At the decoder side, the MV is reconstructed according to: MV=MVP+MVD.
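The relation MV=MVP+MVD can be sketched as follows. This is an illustrative Python example only; the MotionVector type and the two helper functions are hypothetical names, and a real codec would operate on fixed-precision MV units that are not modeled here.

```python
from typing import NamedTuple


class MotionVector(NamedTuple):
    """Hypothetical two-component motion vector (horizontal, vertical)."""
    x: int
    y: int


def derive_mvd(mv: MotionVector, mvp: MotionVector) -> MotionVector:
    """Encoder side: MVD = MV - MVP, the quantity signaled in the bitstream."""
    return MotionVector(mv.x - mvp.x, mv.y - mvp.y)


def reconstruct_mv(mvp: MotionVector, mvd: MotionVector) -> MotionVector:
    """Decoder side: MV = MVP + MVD."""
    return MotionVector(mvp.x + mvd.x, mvp.y + mvd.y)
```

When the MV equals the MVP, derive_mvd returns (0, 0), so the decoder can reconstruct the MV from the MVP alone.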
The MVP is determined by the encoder from an MVP candidate list generated from previously coded spatial and temporal neighboring blocks. The decoder maintains an identical copy of the MVP candidate list. Furthermore, the encoder selects an MVP as a predictor for the current MV based on characteristics of neighboring PUs, and the same selection process is performed at the decoder side. Therefore, there is no need to signal the MVP selection information (e.g. an MVP index). When the MVP is used to predict the MV of a current PU, the MV of the current PU may be the same as the MVP. In this case, the MVD is zero and there is no need to transmit the MVD. The motion information (e.g. motion vector, prediction direction and reference picture number) of the current PU can then be inherited from the motion information of the MVP. Because the MVP can be determined at the decoder side without the need for an MVP index, the MVP is a very efficient coding tool for MV coding.
The residues for a current PU are derived as the differences between the current PU and a reference block at the encoder side. The residues can be signaled by the encoder. When the MV is the same as the MVP, only the residues are signaled without the need to signal the MVD. This coding mode is referred to as a “Direct” mode since the motion information of the MVP can be used directly to reconstruct the current MV. Furthermore, the differences between the current PU and the reference PU can be very small or zero (i.e., the reference PU being close to or the same as the current PU). In this case, there is no need to signal the residues and the case is referred to as “Skip” mode. In other words, a Skip-mode coded block can be reconstructed using the derived MVP information at the decoder side without the need for signaling any residue or the MVD.
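The distinction between the Skip mode, the Direct mode and the remaining Inter modes can be summarized by what is signaled in each case. The Python sketch below is a simplification for illustration only: the function name and the two boolean inputs are hypothetical, and an actual encoder chooses the mode through its own decision process rather than through this test.

```python
from typing import List, Tuple


def signaled_elements(mvd_is_zero: bool, residues_are_zero: bool) -> Tuple[str, List[str]]:
    """Map the two conditions described in the text to a mode label and
    the list of elements that still have to be signaled."""
    if mvd_is_zero and residues_are_zero:
        # Skip: neither the MVD nor the residues are signaled.
        return ("Skip", [])
    if mvd_is_zero:
        # Direct: motion is inherited from the MVP, only residues are signaled.
        return ("Direct", ["residues"])
    # Otherwise both the MVD and the residues are signaled.
    return ("non-Direct/non-Skip", ["MVD", "residues"])
```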
According to AVS2, an MVP candidate set is derived for non-Direct/Direct CUs and non-Skip/Skip mode CUs. For non-Direct and non-Skip mode CUs, the MVP candidates are derived from MVs of spatial neighboring blocks at L (141), U (130), UR (132) and UL (140) as shown in FIG. 1, where block 112 in the current picture 110 corresponds to the current PU. The MVP candidate set for Direct CUs and Skip CUs is derived from MVs of spatial neighboring blocks at L (141), U (130), UR (132), UL (140), L1 (142) and U1 (131) as shown in FIG. 1. Furthermore, the MVP candidate set for Direct CUs and Skip CUs also includes a temporal MVP candidate from the MV of a top-left block Col-T (150) in a collocated PU 122 in reference picture 120.
The derivation process of the MVP for non-Direct CUs and non-Skip CUs is described as follows. The MVP derivation is based on the MVs associated with the four spatial neighboring blocks (i.e., L (141), U (130), UR (132) and UL (140) in FIG. 1). The MVs at L (141), U (130), UR (132) and UL (140) are referred to as MVA, MVB, MVC and MVD respectively. The block distance (referred to as BlockDistance) between a block in the current picture and the reference block in a reference picture pointed to by the MV of that block is derived for the current block and all candidate blocks (i.e., blocks L, U, UL and UR). As mentioned above, in AVS2, a syntax element, field_coded_sequence, is used to indicate whether the current sequence is a field coded sequence. If field_coded_sequence has a value of 1, the current sequence is a field coded sequence; otherwise, it is not. Two variables, delta1 and delta2, are used in the MVP derivation, where the values of delta1 and delta2 are determined from field_coded_sequence. If field_coded_sequence has a value of 0, both delta1 and delta2 are zero. Otherwise, delta1 and delta2 may have values belonging to {−2, 0, 2} depending on whether mvX (X corresponding to A, B or C) is associated with a block in a top field or a bottom field and whether mvX points to a top field or a bottom field.
The MVs of neighboring blocks have to be scaled properly according to the block distance of the current block (i.e., BlockDistanceE) and the block distance of a neighboring block (i.e., BlockDistanceX):
        MVA_x=Clip3(−32768,32767,Sign(mvA_x)×((Abs(mvA_x)×BlockDistanceE×(16384/BlockDistanceA)+8192)>>14))
        MVA_y=Clip3(−32768,32767,Sign(mvA_y+delta1)×((Abs(mvA_y+delta1)×BlockDistanceE×(16384/BlockDistanceA)+8192)>>14)−delta2)
        MVB_x=Clip3(−32768,32767,Sign(mvB_x)×((Abs(mvB_x)×BlockDistanceE×(16384/BlockDistanceB)+8192)>>14))
        MVB_y=Clip3(−32768,32767,Sign(mvB_y+delta1)×((Abs(mvB_y+delta1)×BlockDistanceE×(16384/BlockDistanceB)+8192)>>14)−delta2)
        MVC_x=Clip3(−32768,32767,Sign(mvC_x)×((Abs(mvC_x)×BlockDistanceE×(16384/BlockDistanceC)+8192)>>14))
        MVC_y=Clip3(−32768,32767,Sign(mvC_y+delta1)×((Abs(mvC_y+delta1)×BlockDistanceE×(16384/BlockDistanceC)+8192)>>14)−delta2)
In the above equations, Clip3(i,j,x) is a clipping function that clips variable x to the range between i and j; Abs(x) is an absolute value function; and Sign(x) is a sign function that outputs 1 for non-negative x and −1 for negative x. As shown above, the block distance plays an important role in MVP scaling. The block distance is derived as follows:
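The scaling of a neighboring MV can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not a normative implementation: it assumes that the rounding offset 8192 is added before the right shift by 14 in both the horizontal and vertical components (as in the vertical-component equations), and that 16384/BlockDistanceX is an integer division. The function names are hypothetical.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def sign(x: int) -> int:
    """Return 1 for non-negative x and -1 for negative x."""
    return 1 if x >= 0 else -1


def scale_mv_x(mv_x: int, bd_e: int, bd_x: int) -> int:
    """Scale the horizontal component of a neighboring MV.
    bd_e: BlockDistanceE (current block); bd_x: BlockDistanceX (neighbor).
    Assumption: 16384 / bd_x is an integer (floor) division."""
    scaled = (abs(mv_x) * bd_e * (16384 // bd_x) + 8192) >> 14
    return clip3(-32768, 32767, sign(mv_x) * scaled)


def scale_mv_y(mv_y: int, bd_e: int, bd_x: int, delta1: int = 0, delta2: int = 0) -> int:
    """Scale the vertical component; delta1 and delta2 are the field offsets
    (both zero when field_coded_sequence is 0)."""
    v = mv_y + delta1
    scaled = (abs(v) * bd_e * (16384 // bd_x) + 8192) >> 14
    return clip3(-32768, 32767, sign(v) * scaled - delta2)
```

For example, when BlockDistanceE is twice BlockDistanceX, the neighboring MV is roughly doubled; when BlockDistanceX is undefined (or zero), the division has no valid result, which is why a well-defined block distance matters.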
If the reference picture is before the current picture:
        BlockDistance=[(DistanceIndexCur−DistanceIndexRef)+512]% 512
where DistanceIndexCur is the DistanceIndex of the current picture and DistanceIndexRef is the DistanceIndex of the reference picture.
If the reference picture is after the current picture:
        BlockDistance=[(DistanceIndexRef−DistanceIndexCur)+512]% 512
DistanceIndex is derived from syntax of the bitstream as follows:
        If field_coded_sequence=0, DistanceIndex=POI×2+1,
        If field_coded_sequence=1, DistanceIndex=POI×2,
where POI (picture order index) is derived from syntax of the bitstream.
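The DistanceIndex and BlockDistance derivations above can be written as a short Python sketch. The function names and the ref_is_before parameter are hypothetical and introduced only for illustration; how POI is parsed from the bitstream is not shown.

```python
def distance_index(poi: int, field_coded_sequence: int) -> int:
    """DistanceIndex = POI*2+1 when field_coded_sequence is 0, else POI*2,
    following the rules as stated in the text."""
    if field_coded_sequence == 0:
        return poi * 2 + 1
    return poi * 2


def block_distance(index_cur: int, index_ref: int, ref_is_before: bool) -> int:
    """BlockDistance with modulo-512 wrap-around.
    ref_is_before: True if the reference picture precedes the current picture."""
    if ref_is_before:
        return ((index_cur - index_ref) + 512) % 512
    return ((index_ref - index_cur) + 512) % 512
```

The addition of 512 before the modulo keeps the result non-negative when the DistanceIndex values wrap around.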
When the reference picture of a neighboring block is a G or GB picture, the BlockDistance of the neighboring block is not defined in AVS2 if the current block also points to a G or GB reference picture. The neighboring blocks include L (141), U (130), UR (132) and UL (140) as shown in FIG. 2. According to AVS2, if SceneReferenceEnableFlag is equal to 1 for the current picture (i.e., the current picture being allowed to use a G or GB reference picture), and only one of the current MV and mvX points to the reference picture located at (RefPicNum−1) of the reference picture buffer (i.e., the location for storing a G or GB picture), then BlockDistanceX is defined as 1.
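The cases described above can be enumerated with a small Python sketch. It is an illustration under stated assumptions: the function name, the string labels and the use of reference indices to identify the picture at position (RefPicNum−1) are all hypothetical, and the sketch only classifies the cases rather than resolving the undefined one.

```python
def classify_block_distance_x(scene_reference_enable_flag: int,
                              cur_ref_idx: int,
                              nbr_ref_idx: int,
                              ref_pic_num: int) -> str:
    """Classify how BlockDistanceX is obtained for a neighboring block.
    cur_ref_idx / nbr_ref_idx: positions in the reference picture buffer
    pointed to by the current MV and by mvX (hypothetical representation)."""
    scene_idx = ref_pic_num - 1  # location storing the G or GB picture
    if scene_reference_enable_flag != 1:
        return "regular"          # scene reference not allowed for this picture
    cur_is_scene = cur_ref_idx == scene_idx
    nbr_is_scene = nbr_ref_idx == scene_idx
    if cur_is_scene != nbr_is_scene:
        return "fixed_to_1"       # only one of the two points to the G/GB picture
    if cur_is_scene and nbr_is_scene:
        return "undefined"        # both point to the G/GB picture
    return "regular"              # neither points to the G/GB picture
```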
When the BlockDistanceX is undefined, it may cause a serious issue in the coding process. As mentioned above, the BlockDistanceX is used to scale the MVP to take into account the difference in block distance associated with the MV of a neighboring block and the block distance of the current MV. In order to overcome the undefined BlockDistanceX issue, the present invention discloses a process to ensure the BlockDistanceX associated with the MV of a neighboring block pointing to a G or GB picture is properly defined.