High-Efficiency Video Coding (HEVC) is a new international video coding standard that is being developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed Coding Unit (CU), is a 2N×2N square block, and each CU can be recursively split into four smaller CUs until a predefined minimum size is reached. Each CU contains one or multiple Prediction Units (PUs). The PU sizes can be 2N×2N, 2N×N, 2N×nU, 2N×nD, N×2N, nL×2N, nR×2N, or N×N, where 2N×N, 2N×nU, 2N×nD and N×2N, nL×2N, nR×2N correspond to horizontal and vertical partition of a 2N×2N PU with symmetric or asymmetric PU size division respectively.
To further increase the coding efficiency of motion vector coding in HEVC, the motion vector competition (MVC) based scheme is applied to select one motion vector predictor (MVP) among a given MVP candidate set which includes spatial and temporal MVPs. There are three inter-prediction modes including Inter, Skip, and Merge in the HEVC test model version 3.0 (HM-3.0). The Inter mode performs motion-compensated prediction with transmitted Motion Vector Differences (MVDs) that can be used together with MVPs for deriving motion vectors (MVs). The Skip and Merge modes utilize motion inference methods (MV=MVP+MVD where MVD is zero) to obtain the motion information from spatial neighboring blocks (spatial candidates) or temporal blocks (temporal candidates) located in a co-located picture. The co-located picture is the first reference picture in list 0 or list 1, which is signaled in the slice header.
When a PU is coded in either Skip or Merge mode, no motion information is transmitted except for the index of the selected candidate. In the case of a Skip PU, the residual signal is also omitted. For the Inter mode in HM-3.0, the Advanced Motion Vector Prediction (AMVP) scheme is used to select a motion vector predictor among an AMVP candidate set including two spatial MVPs and one temporal MVP. In this disclosure, MVP may refer to motion vector predictor or motion vector prediction. As for the Merge and Skip modes in HM-3.0, the Merge scheme is used to select a motion vector predictor among a Merge candidate set containing four spatial MVPs and one temporal MVP.
For the Inter mode, the reference picture index is explicitly transmitted to the decoder. The MVP is then selected among the candidate set for a given reference picture index. FIG. 1 illustrates the MVP candidate set for the Inter mode according to HM-3.0, where the MVP candidate set includes two spatial MVPs and one temporal MVP:
1. Left predictor (the first available MV from A0 and A1),
2. Top predictor (the first available MV from B0, B1, and Bn+1), and
3. Temporal predictor (the first available MV from TBR and TCTR).
A temporal predictor is derived from a block (TBR or TCTR) in a co-located picture, where the co-located picture is the first reference picture in list 0 or list 1. The block associated with the temporal MVP may have two MVs: one MV from list 0 and one MV from list 1. The temporal MVP is derived from the MV from list 0 or list 1 according to the following rule:
1. The MV that crosses the current picture is chosen first, and
2. If both MVs cross the current picture or both do not cross, the MV with the same reference list as the current list will be chosen.
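The selection rule above can be sketched in Python as follows. This is a minimal illustration, not the HM source: the `ColMV` record and the function names are hypothetical, and "crossing the current picture" is interpreted, as an assumption, to mean that the current picture lies strictly between the co-located picture and the MV's reference picture in POC order. The text does not say what happens when only one MV exists; the sketch simply uses that MV.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ColMV:
    """One MV of the co-located block (hypothetical record)."""
    mv: Tuple[int, int]  # (horizontal, vertical) MV components
    ref_poc: int         # POC of the reference picture this MV points to


def crosses_current(col_poc: int, ref_poc: int, curr_poc: int) -> bool:
    # Assumption: the MV crosses the current picture when the current picture
    # lies strictly between the co-located picture and the MV's reference
    # picture in POC order.
    return (col_poc - curr_poc) * (ref_poc - curr_poc) < 0


def select_temporal_mv(mv_l0: Optional[ColMV], mv_l1: Optional[ColMV],
                       col_poc: int, curr_poc: int,
                       curr_list: int) -> Optional[ColMV]:
    available: Dict[int, ColMV] = {
        lst: mv for lst, mv in ((0, mv_l0), (1, mv_l1)) if mv is not None
    }
    if not available:
        return None
    if len(available) == 1:
        # Assumption: a single available MV is used directly.
        return next(iter(available.values()))
    c0 = crosses_current(col_poc, available[0].ref_poc, curr_poc)
    c1 = crosses_current(col_poc, available[1].ref_poc, curr_poc)
    if c0 != c1:
        # Rule 1: the MV that crosses the current picture is chosen first.
        return available[0] if c0 else available[1]
    # Rule 2: both or neither cross, so take the MV from the current list.
    return available[curr_list]
```

For example, with the current picture at POC 4 and the co-located picture at POC 8, a list 0 MV pointing to POC 0 crosses the current picture while a list 1 MV pointing to POC 16 does not, so the list 0 MV is chosen even when the current list is list 1.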
In HM-3.0, if a particular block is encoded in the Merge mode, an MVP index is signaled to indicate which MVP among the MVP candidate set is used for this block to be merged. To follow the essence of motion information sharing, each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate. It is noted that if the selected MVP is a temporal MVP, the reference picture index is always set to the first reference picture. FIG. 2 illustrates the MVP candidate set for the Merge mode according to HM-3.0, where the MVP candidate set includes four spatial MVPs and one temporal MVP:
1. Left predictor (Am),
2. Top predictor (Bn),
3. Temporal predictor (the first available MV from TBR or TCTR),
4. Above-right predictor (B0), and
5. Below-left predictor (A0).
In HM-3.0, a process is utilized in both Inter and Merge modes to avoid an empty candidate set. The process adds a candidate with a zero MV to the candidate set when no candidate can be inferred in the Inter, Skip or Merge mode.
Based on the rate-distortion optimization (RDO) decision, the encoder selects one final MVP for Inter, Skip, or Merge modes from the given MVP list and transmits the index of the selected MVP to the decoder after removing redundant MVPs in the list. However, because the temporal MVP is included in the MVP list, any transmission error may cause parsing errors at the decoder side and the error may propagate. When an MV of a previous picture is decoded incorrectly, a mismatch between the MVP list at the encoder side and the MVP list at the decoder side may occur. Therefore, subsequent MV decoding may also be impacted and the condition may persist for multiple subsequent pictures.
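The parsing hazard can be illustrated with a small Python sketch. It assumes, for illustration only, that the index is binarized with a truncated unary code whose maximum value depends on the pruned list size; the function names and MV values are hypothetical. When the decoder reconstructs a wrong temporal MV, its pruned list has a different size from the encoder's, so it reads a different number of bits for the same index and every later syntax element is misaligned.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]


def prune(cands: Sequence[MV]) -> List[MV]:
    """Remove redundant MVPs, keeping the first occurrence of each MV."""
    out: List[MV] = []
    for mv in cands:
        if mv not in out:
            out.append(mv)
    return out


def encode_index(index: int, list_size: int) -> List[int]:
    """Truncated unary bins; the terminating 0 is omitted for the last index."""
    c_max = list_size - 1
    return [1] * index + ([0] if index < c_max else [])


def parse_index(bits: Sequence[int], list_size: int) -> Tuple[int, int]:
    """Decode a truncated unary index; returns (index, bits_consumed)."""
    c_max = list_size - 1
    index = 0
    while index < c_max and bits[index] == 1:
        index += 1
    return index, (index if index == c_max else index + 1)


spatial = [(1, 0), (2, 0)]
enc_list = prune(spatial + [(1, 0)])  # correct temporal MV duplicates a spatial one
dec_list = prune(spatial + [(3, 0)])  # corrupted temporal MV survives pruning
stream = encode_index(1, len(enc_list)) + [0, 1]  # index bins + next syntax bits
```

Here the encoder's list has two entries and it writes a single bin for index 1, whereas the decoder's list has three entries, so `parse_index(stream, len(dec_list))` consumes one extra bit that belongs to the next syntax element.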
In HM-4.0, in order to solve the parsing problem related to Merge/AMVP in HM-3.0, a fixed MVP list size is used to decouple MVP list construction from MVP index parsing. Furthermore, in order to compensate for the coding performance loss caused by the fixed MVP list size, additional MVPs are assigned to the empty positions in the MVP list. In this process, the Merge index is coded using truncated unary codes for a fixed list length of 5 or less, and the AMVP index is coded for a fixed list length of 2 or less.
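A minimal Python sketch of the fixed-size approach follows. It assumes the Merge and AMVP lists hold five and two entries, that each index uses a truncated unary code with maximum value (list size − 1), and it uses zero MVs as a hypothetical stand-in for the additional MVPs that fill empty positions. Because the maximum value no longer depends on the list contents, the decoder can parse the index without first building the candidate list.

```python
from typing import List, Sequence, Tuple

MV = Tuple[int, int]
MERGE_LIST_SIZE = 5  # Merge list size assumed from the text
AMVP_LIST_SIZE = 2   # AMVP list size assumed from the text


def truncated_unary(index: int, c_max: int) -> str:
    """`index` ones followed by a terminating zero, unless index == c_max."""
    if not 0 <= index <= c_max:
        raise ValueError("index out of range")
    return "1" * index + ("0" if index < c_max else "")


def fill_list(cands: Sequence[MV], size: int) -> List[MV]:
    """Pad a pruned list to a fixed size.

    Hypothetical filler: zero MVs stand in for HM-4.0's additional MVP types.
    """
    out = list(cands[:size])
    out.extend([(0, 0)] * (size - len(out)))
    return out


merge_codes = [truncated_unary(i, MERGE_LIST_SIZE - 1) for i in range(MERGE_LIST_SIZE)]
amvp_codes = [truncated_unary(i, AMVP_LIST_SIZE - 1) for i in range(AMVP_LIST_SIZE)]
```

The resulting Merge codewords are `0`, `10`, `110`, `1110`, and `1111`, and the AMVP codewords are `0` and `1`, regardless of how many distinct candidates survive pruning.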
Another change in HM-4.0 is the unification of MVP positions. Both Merge and Skip use the same positions shown in FIG. 3. For Merge mode in HM-4.0, up to four spatial MVPs are derived from A0, A1, B0, and B1, and one temporal MVP is derived from TBR or TCTR. For the temporal MVP, TBR is used first. If TBR is not available, TCTR is used instead. If any of the four spatial MVPs is not available, the block position B2 is then used to derive an MVP as a replacement. After the derivation process of the four spatial MVPs and one temporal MVP, the process of removing redundant MVPs is applied. If the number of available MVPs is smaller than five after redundant MVP removal, three types of additional MVPs are derived and added to the MVP list.
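The HM-4.0 Merge derivation steps above can be sketched in Python as below. The check order A0, A1, B0, B1 simply follows the order in which the text lists the positions and may differ from the HM software. The three types of additional MVPs are not specified here, so the sketch stops after redundancy removal and reports how many more entries are needed. Names and the dictionary-based input are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MV = Tuple[int, int]
MERGE_LIST_SIZE = 5


def build_merge_list(spatial: Dict[str, Optional[MV]], t_br: Optional[MV],
                     t_ctr: Optional[MV]) -> Tuple[List[MV], int]:
    """Returns (pruned candidate list, number of additional MVPs still needed)."""
    # Up to four spatial MVPs from A0, A1, B0 and B1 (assumed check order).
    cands = [spatial[p] for p in ("A0", "A1", "B0", "B1")
             if spatial.get(p) is not None]
    # B2 serves as a replacement when any of the four is unavailable.
    if len(cands) < 4 and spatial.get("B2") is not None:
        cands.append(spatial["B2"])
    # One temporal MVP: TBR first, TCTR if TBR is unavailable.
    temporal = t_br if t_br is not None else t_ctr
    if temporal is not None:
        cands.append(temporal)
    # Redundant MVP removal, keeping the first occurrence.
    pruned: List[MV] = []
    for mv in cands:
        if mv not in pruned:
            pruned.append(mv)
    return pruned, max(0, MERGE_LIST_SIZE - len(pruned))
```

For instance, if A1 is unavailable and A0 and B0 carry the same MV, B2 is pulled in as a replacement, TCTR substitutes for a missing TBR, and after pruning one additional MVP is still needed to reach five entries.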
In the derivation of the spatial and temporal MVPs, the MVP can be derived from an MV pointing to the same reference picture as the target reference picture. Alternatively, the MVP can be derived from a candidate MV pointing to a different reference picture. FIG. 4 illustrates an example of deriving a spatial MVP based on various types of motion vectors associated with spatial neighboring candidate blocks, where the candidate blocks comprise spatial neighboring blocks A0, A1, B0, B1 and B2, and temporal co-located blocks TBR or TCTR. The circled numbers refer to the search order for determining an MVP from the respective candidates. The highest search priority corresponds to an MV pointing to the target reference picture within the given reference list. The second highest search priority corresponds to an MV pointing to the target reference picture within the other reference list. The third and fourth search priorities correspond to an MV pointing to a different reference picture within the given and the other reference lists, respectively. In the particular example of FIG. 4, the availability of motion vectors 1 and 2 is checked together, and the availability of motion vectors 3 and 4 is checked together. The availability of motion vectors 1 and 2 is checked from candidate blocks A0 through A1 and then from B0 through B2. If none of these MVs exists, the search checks the availability of motion vectors 3 and 4 through all the blocks. When the MVP is derived from an MV pointing to a different reference picture, or from an MV belonging to the co-located picture, the MV may have to be scaled to account for the different picture distances. The exemplary search patterns for MVP derivation shown in FIG. 4 shall not be construed as limitations to the present invention as described in this application. For example, the availability of motion vectors 1 through 4 for each block can be checked together.
In another example, motion vector 1 can be checked first in the order from A0 to A1 and then from B0 to B2. If none of the MVs exists, the search moves to check the availability of motion vector 2 from A0 to A1 and then from B0 to B2. The process will continue for motion vector 3 and motion vector 4 if the spatial MVP has not been derived.
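The search patterns of FIG. 4 and of the alternative example can be expressed as one routine parameterized by how the four priorities are grouped. In this Python sketch (hypothetical names and data layout), each neighboring MV is a tuple of its reference list, the POC of its reference picture, and the MV itself; "same reference picture" is tested by comparing POCs, which is an assumption. The block order is also a parameter, so a caller could scan A0 and A1 for a left MVP and B0 through B2 for a top MVP separately. A result found at priority 3 or 4 points to a different reference picture and is flagged for scaling.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (reference list, POC of the reference picture, (mv_x, mv_y)); hypothetical layout
NeighborMV = Tuple[int, int, Tuple[int, int]]

FIG4_GROUPS = ((1, 2), (3, 4))            # 1 and 2 together, then 3 and 4
ONE_AT_A_TIME = ((1,), (2,), (3,), (4,))  # the alternative example in the text
ALL_TOGETHER = ((1, 2, 3, 4),)            # all four checked per block


def priority(cand: NeighborMV, target_list: int, target_ref_poc: int) -> int:
    lst, ref_poc, _ = cand
    same_list = lst == target_list
    if ref_poc == target_ref_poc:    # points to the target reference picture
        return 1 if same_list else 2
    return 3 if same_list else 4     # points to a different reference picture


def search_mvp(blocks: Dict[str, List[NeighborMV]], target_list: int,
               target_ref_poc: int,
               groups: Sequence[Sequence[int]] = FIG4_GROUPS,
               order: Sequence[str] = ("A0", "A1", "B0", "B1", "B2"),
               ) -> Optional[Tuple[NeighborMV, bool]]:
    """Returns (selected MV, needs_scaling) or None if nothing is found."""
    for group in groups:
        for name in order:
            for p in group:          # within a block, lower priority numbers first
                for cand in blocks.get(name, []):
                    if priority(cand, target_list, target_ref_poc) == p:
                        return cand, p >= 3
    return None
```

With the FIG. 4 grouping, a priority-2 MV in A0 is accepted before a priority-1 MV in B0, because A0 is scanned first; with one priority at a time, the priority-1 MV in B0 wins instead.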
In the derivation process for the spatial and temporal MVPs, a division operation is required to scale the motion vector. The scaling factor is calculated based on the picture distance ratio. For example, the MVP may be derived based on the MV of a co-located block. The picture distance scaling factor, DistScaleFactor, is computed according to:
$$\mathrm{DistScaleFactor} = \frac{\mathrm{POC}_{\mathrm{curr}} - \mathrm{POC}_{\mathrm{ref}}}{\mathrm{POC}_{\mathrm{temp}} - \mathrm{POC}_{\mathrm{temp\_ref}}} \tag{1}$$

where POCcurr and POCref represent the picture order counts (POCs) of the current picture and of the target reference picture, respectively, and POCtemp and POCtemp_ref represent the POC of the co-located picture and the POC of the reference picture pointed to by the MV of the co-located block, respectively. While the MV of a co-located block is used to illustrate the derivation of the picture distance scaling factor, the MV of a spatial neighboring block may also be used to derive the MVP, and the corresponding picture distance scaling factor can be derived similarly.
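For reference, equation (1) can be evaluated exactly with Python's `fractions.Fraction`, which gives a baseline against which the integer approximation used in HM-4.0 can be compared. The function names are hypothetical. For a spatial neighbor, the sketch treats POCtemp as the POC of the picture containing the candidate block (an assumption); with the FIG. 5 configuration (current picture 10, target reference picture 4, and a neighbor in picture 10 whose MV points to picture 9), the exact factor is 6.

```python
from fractions import Fraction
from typing import Tuple


def dist_scale_factor_exact(poc_curr: int, poc_ref: int,
                            poc_temp: int, poc_temp_ref: int) -> Fraction:
    """Equation (1) as an exact rational number.

    Raises ZeroDivisionError if poc_temp == poc_temp_ref.
    """
    return Fraction(poc_curr - poc_ref, poc_temp - poc_temp_ref)


def scale_mv_exact(mv: Tuple[int, int], factor: Fraction) -> Tuple[Fraction, Fraction]:
    """Scale both MV components by the exact distance ratio."""
    return mv[0] * factor, mv[1] * factor
```

Exact rationals avoid any rounding, which is precisely what a hardware or fixed-point implementation cannot afford; they are useful only as a reference.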
In an implementation according to HM-4.0, the POC distance between the current picture and the target reference picture, and the POC distance between the co-located picture and the reference picture pointed to by the MV of the co-located block, are first constrained within a given range, i.e.:

$$\mathrm{DiffPOC}_{\mathrm{curr}} = \mathrm{clip}(-128, 127, \mathrm{POC}_{\mathrm{curr}} - \mathrm{POC}_{\mathrm{ref}}),$$

$$\mathrm{DiffPOC}_{\mathrm{temp}} = \mathrm{clip}(-128, 127, \mathrm{POC}_{\mathrm{temp}} - \mathrm{POC}_{\mathrm{temp\_ref}}).$$
The scaling factor is then calculated according to the following equations:
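$$X = \frac{2^{14} + \left|\mathrm{DiffPOC}_{\mathrm{temp}}/2\right|}{\mathrm{DiffPOC}_{\mathrm{temp}}}, \text{ and} \tag{2}$$

$$\mathrm{DistScaleFactor} = \mathrm{clip}\left(-1024, 1023, (\mathrm{DiffPOC}_{\mathrm{curr}} \times X + 32) \gg 6\right) \tag{3}$$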
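The scaled MVP is derived by multiplying the MV by the distance scaling factor, i.e.:

$$\mathrm{ScaledMV} = \mathrm{sign}(\mathrm{DistScaleFactor} \times \mathrm{MV}) \times \left((|\mathrm{DistScaleFactor} \times \mathrm{MV}| + 127) \gg 8\right) \tag{4}$$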
The picture scaling factor in the form of equation (1) requires a division operation, which is more complicated for hardware implementation or consumes more CPU time in a software-based implementation. The essence of the implementation based on HM-4.0 is to pre-multiply the distance ratio by a multiplication factor (2^14 in equation (2)) so that the distance ratio becomes an integer. In equation (2), an offset term, |DiffPOCtemp/2|, is added to 2^14 to take care of data conversion with rounding. Similarly, offsets 32 and 127 are added in equations (3) and (4) for data conversion. The multiplication factor can be compensated by simple right-shift operations. Since X in equation (2) depends only on DiffPOCtemp, which is clipped to [−128, 127], its value can be precomputed, so the computation of the scaled MV, ScaledMV, in equations (3) and (4) does not require any division operation. Therefore, the implementation associated with equations (2) through (4) is a preferred approach to realizing equation (1).
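Equations (2) through (4) can be sketched in Python as below. Python's `//` rounds toward negative infinity while C's `/` truncates toward zero, so a small helper reproduces the truncating division assumed for equation (2); Python's `>>` on negative integers is an arithmetic shift, which is assumed here for equation (3). Function names are hypothetical, and this illustrates the equations rather than reproducing the HM-4.0 source.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))


def trunc_div(a: int, b: int) -> int:
    """Integer division rounding toward zero, like C's `/`."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def dist_scale_factor(poc_curr: int, poc_ref: int,
                      poc_temp: int, poc_temp_ref: int) -> int:
    """DistScaleFactor in units of 1/256 (256 means a ratio of 1).

    Raises ZeroDivisionError if poc_temp == poc_temp_ref.
    """
    diff_curr = clip3(-128, 127, poc_curr - poc_ref)
    diff_temp = clip3(-128, 127, poc_temp - poc_temp_ref)
    # abs(diff_temp) // 2 equals |DiffPOCtemp / 2| under C truncation.
    x = trunc_div(2**14 + abs(diff_temp) // 2, diff_temp)   # equation (2)
    return clip3(-1024, 1023, (diff_curr * x + 32) >> 6)    # equation (3)


def scale_mv_component(dsf: int, mv: int) -> int:
    """Equation (4): the sign is applied after the rounded right shift."""
    prod = dsf * mv
    sign = -1 if prod < 0 else 1
    return sign * ((abs(prod) + 127) >> 8)
```

Equal distances produce a factor of 256, which leaves an MV component unchanged, while a distance ratio of 1/2 produces 128 and halves it. Applying the sign after the shift makes positive and negative components round symmetrically.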
In HM-4.0, the scaling factor (DistScaleFactor) is clipped to the range [−1024, 1023], as shown in equation (3). The scaled product is right-shifted by eight bits, as shown in equation (4), which implies that the effective scaling range is limited to [−4, 4).
A reference picture selection method for a low-delay coding system is disclosed by Li et al. to improve coding efficiency ("Encoding optimization to improve coding efficiency for low delay cases", by Li et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 6th Meeting: Torino, IT, 14-22 Jul. 2011, Document: JCTVC-F701). According to the method disclosed by Li et al., one nearest picture and three high-quality pictures (pictures with low QPs) are used as reference pictures for low-delay cases. FIG. 5 illustrates an exemplary reference frame configuration for the low-delay system. Picture 10 is the current picture and picture 9 is the nearest picture. Pictures 0, 4 and 8 are the three high-quality reference pictures. Block 525 corresponds to a current block and block 515 corresponds to a neighboring block. The neighboring block 515 has an associated MV 510 that is used to derive the MVP for the current MV 520 of the current block 525. The picture distance associated with the current MV is 6, while the picture distance associated with the candidate MV is 1. The picture scaling factor for this example is therefore 6, which exceeds the supported scaling factor range. Consequently, the effective scaling range [−4, 4) is not sufficient for some applications.
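The FIG. 5 example can be checked numerically. The Python sketch below (hypothetical names) applies equations (2) through (4) for positive POC distances only, where Python's `//` matches C's truncating division, and reports DistScaleFactor before and after the clip in equation (3). With a current distance of 6 and a candidate distance of 1, the unclipped factor is 1536 (6.0 in units of 1/256), but it is clipped to 1023 (about 3.996), so an MV component of 16 is scaled to 64 instead of 96.

```python
from typing import Tuple


def scale_factor_fixed_point(diff_curr: int, diff_temp: int) -> Tuple[int, int]:
    """Returns (unclipped, clipped) DistScaleFactor in units of 1/256.

    Restricted to positive POC distances, where `//` agrees with C's `/`.
    """
    if diff_curr <= 0 or diff_temp <= 0:
        raise ValueError("sketch handles positive POC distances only")
    x = (2**14 + diff_temp // 2) // diff_temp        # equation (2)
    unclipped = (diff_curr * x + 32) >> 6            # equation (3) before clipping
    return unclipped, max(-1024, min(1023, unclipped))


def scale_positive_mv(dsf: int, mv: int) -> int:
    """Equation (4) for a non-negative product dsf * mv."""
    return (dsf * mv + 127) >> 8


unclipped, clipped = scale_factor_fixed_point(diff_curr=10 - 4, diff_temp=10 - 9)
print(unclipped, clipped)                # 1536 1023
print(scale_positive_mv(unclipped, 16))  # 96: the exact 6x scaling
print(scale_positive_mv(clipped, 16))    # 64: limited to roughly 4x
```

Even a ratio of exactly 4 yields an unclipped value of 1024, which is clipped to 1023; this is why the effective range is the half-open interval [−4, 4).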
Accordingly, it is desirable to develop a scheme to increase the effective scaling ratio for MV scaling. A system incorporating the increased scaling factor may achieve improved performance.