Inter prediction is a technique that exploits the temporal redundancy in a video sequence. Motion estimation/motion compensation is used to take object movement between frames into account. The motion estimation process identifies one (uni-prediction) or two (bi-prediction) best reference blocks in one or two reference pictures. Each best reference block is located according to a corresponding motion vector (MV). The coding system usually derives the difference (i.e., the prediction residuals) between an underlying block and a corresponding reference block and encodes the residuals. When bi-prediction is used, the two reference blocks are combined, for example by averaging, to form a predictor for the underlying block.
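The relationship between a motion vector, its reference block, and the prediction residuals can be illustrated with a minimal sketch. The function names `fetch_block` and `residual`, the integer-sample motion vector, and the absence of picture-boundary handling are all simplifying assumptions made for illustration; they are not taken from any particular codec.

```python
def fetch_block(picture, x, y, mv, size):
    """Return the size x size reference block that MV (mvx, mvy) points to.

    Assumes an integer-sample MV and a block lying fully inside the picture.
    """
    mvx, mvy = mv
    rows = picture[y + mvy : y + mvy + size]
    return [row[x + mvx : x + mvx + size] for row in rows]


def residual(current, predictor):
    """Sample-wise difference between the underlying block and its predictor."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predictor)]


# Hypothetical 4x4 reference picture whose sample value is 10*row + column.
reference = [[10 * r + c for c in range(4)] for r in range(4)]
current_block = [[12, 12], [21, 25]]          # 2x2 block located at (0, 0)

predictor = fetch_block(reference, x=0, y=0, mv=(1, 1), size=2)
residues = residual(current_block, predictor)
print(predictor)  # [[11, 12], [21, 22]]
print(residues)   # [[1, 0], [0, 3]]
```

Only the motion vector and the (typically small) residues need to be coded, which is where the bit saving comes from.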
Inter prediction is often used with Intra prediction in various video coding systems, such as the High Efficiency Video Coding (HEVC) standard developed in recent years. In HEVC, the fixed-size macroblock of H.264/AVC is replaced by a flexible block structure. The basic unit for compression is termed the coding tree unit (CTU). Each CTU may contain one coding unit (CU) or may be recursively split into four smaller CUs until the predefined minimum CU size is reached. Each CU (also named leaf CU) contains one or multiple prediction units (PUs) and a tree of transform units (TUs).
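The recursive CTU partitioning described above can be sketched as a quadtree traversal. The function `split_ctu` and the caller-supplied `should_split` decision are hypothetical names; in a real encoder the split decision would come from rate-distortion optimisation, and in a decoder from parsed split flags.

```python
def split_ctu(x, y, size, min_cu_size, should_split):
    """Return the leaf CUs of a CTU as (x, y, size) tuples.

    A block is split into four equal quadrants while should_split() says so
    and the block is still larger than the minimum CU size.
    """
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                split_ctu(x + dx, y + dy, half, min_cu_size, should_split))
    return leaves


# Example: keep splitting only the quadrant anchored at the top-left corner.
leaves = split_ctu(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)
print(len(leaves))   # 10
print(leaves[:4])    # [(0, 0, 8), (8, 0, 8), (0, 8, 8), (8, 8, 8)]
```

The leaf CUs always tile the CTU exactly, whatever split decisions are made.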
In general, a CTU consists of one luma coding tree block (CTB) and two corresponding chroma CTBs, a CU consists of one luma coding block (CB) and two corresponding chroma CBs, a PU consists of one luma prediction block (PB) and two corresponding chroma PBs, and a TU consists of one luma transform block (TB) and two corresponding chroma TBs. However, exceptions can occur because the minimum TB size is 4×4 for both luma and chroma (i.e., no 2×2 chroma TB supported for 4:2:0 colour format) and each Intra chroma CB always has only one Intra chroma PB regardless of the number of Intra luma PBs in the corresponding Intra luma CB.
For an Intra CU, the luma CB can be predicted by one or four luma PBs, and each of the two chroma CBs is always predicted by one chroma PB, where each luma PB has one Intra luma prediction mode and the two chroma PBs share one Intra chroma prediction mode. Moreover, for the Intra CU, the TB size cannot be larger than the PB size. In each PB, the Intra prediction is applied to predict samples of each TB inside the PB from neighbouring reconstructed samples of the TB. For each PB, in addition to 33 directional Intra prediction modes, DC and planar modes are also supported to predict flat regions and gradually varying regions, respectively.
For each Inter PU, one of three prediction modes, namely Inter, Skip, and Merge, can be selected. For each of the three Inter prediction modes, a motion vector competition (MVC) scheme is used to select a motion candidate from a given candidate set that includes spatial and temporal motion candidates. Multiple references for motion estimation allow using the best reference in two possible reconstructed reference picture lists (namely List 0 and List 1). The reference picture list may be simply referred to as a list or a List in this disclosure. For the Inter mode (unofficially termed AMVP (Advanced Motion Vector Prediction) mode), Inter prediction indicators (List 0, List 1, or bi-directional prediction), reference indices, motion candidate indices, motion vector differences (MVDs) and prediction residuals are transmitted. As for the Skip mode and the Merge mode, only Merge indices are transmitted, and the current PU inherits the Inter prediction indicator, reference indices, and motion vectors from a neighbouring PU referred to by the coded Merge index. In the case of a Skip coded CU, the residual signal is also omitted. Quantization, entropy coding, and the deblocking filter (DF) are also in the coding loop of HEVC.
FIG. 1 illustrates an exemplary adaptive Inter/Intra video coding system incorporating loop processing. For Inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from another picture or pictures. Switch 114 selects Intra Prediction 110 or Inter-prediction data and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, coding modes, and other information associated with the image area. The side information may also be compressed by entropy coding to reduce the required bandwidth. Accordingly, the data associated with the side information are provided to Entropy Encoder 122 as shown in FIG. 1. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
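The reason the encoder must run its own inverse quantization and reconstruction can be seen in a deliberately simplified sketch. Here the transform stage is omitted and a uniform scalar quantizer with a hypothetical step size stands in for Q 120 and IQ 124; the function names are illustrative assumptions only.

```python
def quantize(residues, step):
    """Uniform scalar quantization with rounding to the nearest level."""
    return [((abs(r) + step // 2) // step) * (1 if r >= 0 else -1)
            for r in residues]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


original = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]
step = 4  # hypothetical quantization step

residues = [o - p for o, p in zip(original, prediction)]      # Adder
levels = quantize(residues, step)                             # Q (sent in bitstream)
recovered = dequantize(levels, step)                          # IQ
reconstructed = [p + r for p, r in zip(prediction, recovered)]  # REC

print(levels)         # [1, 1, 0, 2]
print(reconstructed)  # [54, 54, 60, 68]
```

The decoder only ever sees `levels`, so it can reproduce `reconstructed` but not `original`. Predicting later frames from `reconstructed` at the encoder keeps both sides in step.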
As shown in FIG. 1, incoming video data undergoes a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this processing. Therefore, In-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, the deblocking filter (DF) and Sample Adaptive Offset (SAO) have been used in the High Efficiency Video Coding (HEVC) standard. The in-loop filter information may have to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, in-loop filter information is provided to Entropy Encoder 122 for incorporation into the bitstream. In FIG. 1, in-loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in FIG. 1 is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9 or H.264.
FIG. 2 illustrates a system block diagram of a corresponding video decoder for the encoder system in FIG. 1. Since the encoder also contains a local decoder for reconstructing the video data, most decoder components, except for the entropy decoder 210, are already used in the encoder. Furthermore, only motion compensation 220 is required for the decoder side. The switch 146 selects Intra-prediction or Inter-prediction and the selected prediction data are supplied to reconstruction (REC) 128 to be combined with recovered residues. Besides performing entropy decoding on compressed residues, entropy decoding 210 is also responsible for entropy decoding of side information and provides the side information to respective blocks. For example, Intra mode information is provided to Intra-prediction 110, Inter mode information is provided to motion compensation 220, loop filter information is provided to loop filter 130 and residues are provided to inverse quantization 124. The residues are processed by IQ 124, IT 126 and the subsequent reconstruction process to reconstruct the video data. Again, reconstructed video data from REC 128 undergo a series of processing steps including IQ 124 and IT 126 as shown in FIG. 2 and are subject to coding artefacts. The reconstructed video data are further processed by In-loop filter 130 before the reconstructed pictures are stored in the reference picture buffer 134.
As mentioned earlier, motion vector prediction is widely used in recent advanced video coding as a coding tool to reduce the bits required for motion information coding. The motion vector prediction process includes generating a motion vector candidate list and pruning the candidate list to remove redundancy. A brief description of the generation process and the pruning process is given as follows.
Competitive Spatial-Temporal Motion Candidate in Inter Prediction
There are three prediction modes for the Inter prediction in HEVC, including the Inter mode, Skip mode and Merge mode. For all the three modes, a motion vector competition (MVC) scheme is applied to increase the coding efficiency of the MV prediction and MV coding. The MVC process generates a list of ordered candidates and selects one motion candidate among a given candidate list. The candidate list contains spatial and temporal motion candidates.
For the Inter mode, an Inter prediction indicator is transmitted to denote list 0 prediction, list 1 prediction, or bi-prediction. Next, one or two reference indices are transmitted to indicate the reference picture(s) when there are multiple reference pictures in a given list. An index is transmitted for each prediction direction to select one motion candidate from the candidate list. FIG. 3 illustrates an example of candidate list for the Inter mode according to HEVC. The candidate list includes two spatial motion candidates and one temporal motion candidate:
1. Left candidate (the first available from A0, A1)
2. Top candidate (the first available from B0, B1, B2)
3. Temporal candidate (the first available from TBR and TCT)
The left spatial motion candidate is searched from the below left to the left (i.e., A0 and A1) and the first available one is selected as the left candidate. The top spatial motion candidate is searched from the above right to the above left (i.e., B0, B1, and B2) and the first available one is selected as the top candidate. A temporal motion candidate is derived from a block (TBR or TCT) located in a reference picture, which is termed temporal collocated picture. The temporal collocated picture is indicated by transmitting a flag in slice header to specify the reference picture list and a reference index in slice header to indicate the reference picture in the reference list used as the collocated reference picture. After the index is transmitted, one or two corresponding motion vector differences (MVDs) are transmitted, where the MVD corresponds to the difference between a MV being coded and its MV predictor.
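The "first available" search over the left, top, and temporal positions can be sketched as follows. The dictionary `neighbours`, which maps a position label to a motion vector or to `None` when the position is unavailable, and the function names are hypothetical; motion vector scaling and other normative details are deliberately left out.

```python
def first_available(neighbours, positions):
    """Return the MV of the first available position, or None if none is."""
    for pos in positions:
        mv = neighbours.get(pos)
        if mv is not None:
            return mv
    return None


def build_amvp_candidates(neighbours):
    """Collect the left, top, and temporal candidates in that order."""
    search_orders = [
        ("A0", "A1"),        # left: below left, then left
        ("B0", "B1", "B2"),  # top: above right, above, then above left
        ("TBR", "TCT"),      # temporal: collocated picture positions
    ]
    candidates = []
    for positions in search_orders:
        mv = first_available(neighbours, positions)
        if mv is not None:
            candidates.append(mv)
    return candidates


neighbours = {"A0": None, "A1": (3, -1), "B0": (4, 0), "B1": (4, 0),
              "B2": None, "TBR": None, "TCT": (2, 2)}
candidates = build_amvp_candidates(neighbours)
print(candidates)  # [(3, -1), (4, 0), (2, 2)]
```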
For the Skip mode and Merge mode, a Merge index is signalled to indicate the selected candidate in the merging candidate list. No Inter prediction indicator, reference index, or MVD is transmitted. Each PU coded in the Skip or Merge mode reuses the Inter prediction indicator, reference index (or indices), and motion vector(s) of the selected candidate. It is noted that if the selected candidate is a temporal motion candidate, the reference index is always set to 0. As shown in FIG. 3, the merging candidate list for the Skip mode and the Merge mode includes four spatial motion candidates and one temporal motion candidate:
1. Left candidate (A1)
2. Top candidate (B1)
3. Above right candidate (B0)
4. Below left candidate (A0)
5. Above left candidate (B2), used only when any of the above spatial candidates is not available
6. Temporal candidate (the first available from TBR and TCT)
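The ordering of the merging candidate list, including the conditional use of B2, can be sketched as follows. The function name and the `neighbours` mapping (position label to motion information, or `None` when unavailable) are hypothetical, the motion information is represented by opaque labels, and no pruning is applied at this stage.

```python
def build_merge_candidates(neighbours):
    """Assemble spatial candidates A1, B1, B0, A0, optionally B2, then temporal."""
    first_four = ("A1", "B1", "B0", "A0")
    candidates = [neighbours[pos] for pos in first_four
                  if neighbours.get(pos) is not None]

    # B2 is considered only when at least one of the first four is missing.
    if len(candidates) < len(first_four) and neighbours.get("B2") is not None:
        candidates.append(neighbours["B2"])

    # Temporal candidate: the first available of TBR and TCT.
    for pos in ("TBR", "TCT"):
        if neighbours.get(pos) is not None:
            candidates.append(neighbours[pos])
            break
    return candidates


neighbours = {"A1": "mA1", "B1": "mB1", "B0": "mB0", "A0": None,
              "B2": "mB2", "TBR": "mTBR", "TCT": "mTCT"}
merge_list = build_merge_candidates(neighbours)
print(merge_list)  # ['mA1', 'mB1', 'mB0', 'mB2', 'mTBR']
```

If A0 were available in this example, B2 would be skipped and the list would still hold four spatial candidates followed by the temporal one.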
Redundancy Removal and Additional Motion Candidates
For the Inter mode, Skip mode, and Merge mode, after deriving the spatial motion candidates, a pruning process is performed to check the redundancy among the spatial candidates.
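A minimal sketch of such a pruning step is given below. It assumes, purely for illustration, that every candidate is compared against all candidates already kept; a practical codec may restrict the comparisons to selected pairs to limit complexity. The function name `prune` is hypothetical.

```python
def prune(candidates):
    """Drop unavailable (None) and duplicate candidates, preserving order."""
    kept = []
    for cand in candidates:
        if cand is not None and cand not in kept:
            kept.append(cand)
    return kept


pruned = prune([(3, -1), (4, 0), None, (4, 0), (2, 2)])
print(pruned)  # [(3, -1), (4, 0), (2, 2)]
```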
After removing redundant or unavailable candidates, the size of the candidate list could be adjusted dynamically at both the encoder and decoder sides so that the truncated unary binarization can be beneficial for entropy coding of the index. Although the dynamic size of the candidate list could improve coding gains, it also introduces a potential parsing problem. Since the temporal motion candidate is included in the candidate list, a mismatch between the candidate list on the encoder side and that on the decoder side may occur when one MV of a previous picture cannot be decoded correctly. This will result in a parsing error of the candidate index. This parsing error may propagate and cause the rest of the current picture to be improperly parsed or decoded. The parsing error could even affect subsequent Inter pictures that also allow temporal motion candidates. Therefore, a small decoding error of an MV may cause parsing failures in many subsequent pictures.
In HEVC, in order to solve the parsing problem mentioned above, a fixed candidate list size is used to decouple the candidate list construction from the parsing of the index. Moreover, in order to compensate for the coding performance loss caused by the fixed list size, additional candidates are assigned to the empty positions in the candidate list. In this process, the index is coded in truncated unary codes of a maximum length, where the maximum length is transmitted in the slice header for the Skip mode and Merge mode and fixed to 2 for the Inter mode.
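The decoupling can be illustrated with a sketch of truncated unary binarization. The functions below and the parameter `max_index` (the largest index value that can be coded, known to the decoder without building any candidate list) are assumptions for illustration; the bits are shown as a string rather than as entropy-coded bins.

```python
def truncated_unary_encode(index, max_index):
    """Encode index as `index` ones, followed by a zero unless index == max_index."""
    if not 0 <= index <= max_index:
        raise ValueError("index out of range")
    bits = "1" * index
    if index < max_index:
        bits += "0"  # terminating zero is dropped for the largest index
    return bits


def truncated_unary_decode(bits, max_index):
    """Return (index, number of bits consumed) from the start of `bits`."""
    index = 0
    while index < max_index and bits[index] == "1":
        index += 1
    consumed = index + (1 if index < max_index else 0)
    return index, consumed


codes = [truncated_unary_encode(i, 4) for i in range(5)]
print(codes)                                   # ['0', '10', '110', '1110', '1111']
print(truncated_unary_decode("110" + "0", 4))  # (2, 3)
```

Because `max_index` is fixed, the decoder knows how many bits belong to the index even when a corrupted earlier picture would have changed which candidates are in the list.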
For the Inter mode, a zero vector motion candidate is added to fill the empty positions in the AMVP candidate list after deriving and pruning the candidate list containing the two spatial motion candidates and the one temporal motion candidate. As for the Skip mode and Merge mode, after deriving and pruning the candidate list containing the four spatial motion candidates and the one temporal motion candidate, additional candidates are derived and added to fill the empty positions in the merging candidate list if the number of available candidates is smaller than the fixed candidate list size.
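For the Inter mode, the filling step can be sketched as below. The function name `fill_amvp_list` is hypothetical, and both the default list size of 2 and the truncation of surplus candidates are assumptions chosen to be consistent with the fixed size mentioned above.

```python
def fill_amvp_list(candidates, list_size=2):
    """Truncate to list_size and pad empty positions with zero MVs."""
    filled = list(candidates[:list_size])
    while len(filled) < list_size:
        filled.append((0, 0))  # zero vector motion candidate
    return filled


print(fill_amvp_list([(3, -1)]))                  # [(3, -1), (0, 0)]
print(fill_amvp_list([(3, -1), (4, 0), (2, 2)]))  # [(3, -1), (4, 0)]
```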
Two types of additional candidates are used to fill the merging candidate list: the combined bi-predictive motion candidate and the zero vector motion candidate. The combined bi-predictive motion candidates are created by combining two original motion candidates according to a predefined order. FIG. 4 illustrates an example of generating a bi-predictive motion candidate 444 by combining two original motion candidates 440 and 442. Candidate list 410 corresponds to an original list containing two candidates: mvL0_A with ref0 and mvL1_B with ref0. Motion vector mvL0_A points from the current block in the current picture 430 to a reference block in a reference picture L0R0 432 in list 0. Motion vector mvL1_B points from the current block in the current picture 430 to a reference block in a reference picture L1R0 434 in list 1. The updated candidate list 420 includes this combined bi-predictive motion candidate. After adding the combined bi-predictive motion candidates, zero vector motion candidates can be added to the remaining positions if the merging candidate list still has empty position(s).
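The filling of the merging candidate list can be sketched as follows, using the FIG. 4 situation as the example. Everything about the data layout is a hypothetical choice made for illustration: each candidate is a dictionary with `"L0"` and `"L1"` entries holding a `(motion vector, reference index)` pair or `None`, the pair order produced by `itertools.permutations` stands in for the predefined order, and the zero candidate is assumed to be bi-predictive with reference index 0.

```python
from itertools import permutations


def fill_merge_list(candidates, list_size):
    """Pad a merging candidate list with combined bi-predictive and zero candidates."""
    merged = list(candidates[:list_size])
    originals = list(merged)

    # Combine the list 0 motion of one original with the list 1 motion of another.
    for a, b in permutations(originals, 2):
        if len(merged) >= list_size:
            break
        if a["L0"] is not None and b["L1"] is not None:
            combined = {"L0": a["L0"], "L1": b["L1"]}
            if combined not in merged:
                merged.append(combined)

    # Any remaining positions receive zero vector candidates.
    while len(merged) < list_size:
        merged.append({"L0": ((0, 0), 0), "L1": ((0, 0), 0)})
    return merged


mvL0_A = ((5, -2), 0)   # list 0 motion with ref0 (values are made up)
mvL1_B = ((-3, 1), 0)   # list 1 motion with ref0 (values are made up)
original_list = [{"L0": mvL0_A, "L1": None}, {"L0": None, "L1": mvL1_B}]

updated_list = fill_merge_list(original_list, list_size=4)
print(updated_list[2])  # {'L0': ((5, -2), 0), 'L1': ((-3, 1), 0)}
print(updated_list[3])  # {'L0': ((0, 0), 0), 'L1': ((0, 0), 0)}
```

In this sketch repeated zero candidates would be identical; a real codec may vary them, for instance by reference index.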
For a bi-predictive motion candidate, each of the two motion vectors points to a reference block. The prediction is formed by averaging the two reference blocks pointed to by the two motion vectors.
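The averaging can be sketched for integer sample values as below. The function name `average_blocks` is hypothetical, and the rounding offset of 1 before the right shift is an assumption; the description above only states that the two blocks are averaged.

```python
def average_blocks(block0, block1):
    """Sample-wise average of two equally sized reference blocks, rounded up at .5."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]


ref_block_l0 = [[100, 102], [50, 51]]   # block pointed to by the list 0 MV
ref_block_l1 = [[104, 103], [52, 54]]   # block pointed to by the list 1 MV
bi_predictor = average_blocks(ref_block_l0, ref_block_l1)
print(bi_predictor)  # [[102, 103], [51, 53]]
```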