High-Efficiency Video Coding (HEVC) is a new international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed a coding unit (CU), is a 2N×2N square block, and each CU can be recursively split into four smaller CUs until a predefined minimum size is reached. Each CU contains one or more prediction units (PUs).
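The recursive CU splitting described above can be illustrated with a minimal sketch. The function name `split_cu`, the callback `should_split`, and the 64×64 starting size with an 8×8 minimum are assumptions chosen for illustration; in a real encoder the split decision would typically come from a rate-distortion search rather than the toy rule used here.

```python
def split_cu(x, y, size, min_size, should_split):
    """Return the leaf CUs (x, y, size) produced by a recursive quadtree split.

    should_split is a hypothetical callback that stands in for the encoder's
    split decision; it is not part of any real codec API.
    """
    # Stop when the predefined minimum size is reached or no split is wanted.
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(split_cu(x + dx, y + dy, half, min_size, should_split))
    return leaves


def split_top_left(x, y, size):
    """Toy decision rule (assumption): keep splitting only the top-left block."""
    return x == 0 and y == 0


leaves = split_cu(0, 0, 64, 8, split_top_left)
for leaf in leaves:
    print(leaf)
```

With this toy rule the 64×64 block ends up as a mix of 32×32, 16×16 and 8×8 leaves that together tile the original block exactly.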
To achieve the best coding efficiency of the hybrid coding architecture, HEVC provides two kinds of prediction modes for each PU: Intra prediction and Inter prediction. For Intra prediction modes, spatially neighbouring reconstructed pixels are used to generate directional predictions. There are up to 35 Intra prediction modes in HEVC (33 angular directions plus the planar and DC modes). For Inter prediction modes, reconstructed temporal reference frames are used to generate motion-compensated predictions. There are three Inter modes: Skip, Merge and Inter Advanced Motion Vector Prediction (AMVP).
When a PU is coded in Inter AMVP mode, motion-compensated prediction is performed with transmitted motion vector differences (MVDs), which are combined with motion vector predictors (MVPs) to derive the motion vectors (MVs). To decide the MVP in Inter AMVP mode, the advanced motion vector prediction (AMVP) scheme selects one motion vector predictor from an AMVP candidate set that includes two spatial MVPs and one temporal MVP. Therefore, in AMVP mode, the index of the selected MVP and the corresponding MVDs must be encoded and transmitted. In addition, the Inter prediction direction, which specifies either bi-prediction or uni-prediction from list 0 (L0) or list 1 (L1), and the reference frame index for each list must also be encoded and transmitted.
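The relationship MV = MVP + MVD in AMVP mode can be sketched as follows. This is a simplified illustration for a single reference list, not the normative derivation: the names `MV`, `encode_amvp` and `decode_amvp` are hypothetical, the candidate values are made up, and the encoder-side selection rule (smallest sum of absolute MVD components) is an assumed stand-in for a real rate-distortion criterion.

```python
from typing import NamedTuple, Sequence, Tuple


class MV(NamedTuple):
    """A motion vector (or motion vector difference) with integer components."""
    x: int
    y: int


def encode_amvp(candidates: Sequence[MV], mv: MV) -> Tuple[int, MV]:
    """Pick an MVP index and compute the MVD to transmit.

    Assumption: the candidate with the smallest |MVD.x| + |MVD.y| is chosen.
    """
    def cost(idx: int) -> int:
        mvp = candidates[idx]
        return abs(mv.x - mvp.x) + abs(mv.y - mvp.y)

    mvp_idx = min(range(len(candidates)), key=cost)
    mvp = candidates[mvp_idx]
    return mvp_idx, MV(mv.x - mvp.x, mv.y - mvp.y)


def decode_amvp(candidates: Sequence[MV], mvp_idx: int, mvd: MV) -> MV:
    """Reconstruct the motion vector at the decoder: MV = MVP + MVD."""
    mvp = candidates[mvp_idx]
    return MV(mvp.x + mvd.x, mvp.y + mvd.y)


# Illustrative candidate set: two spatial MVPs followed by one temporal MVP.
candidates = [MV(4, -2), MV(3, 0), MV(0, 0)]
target_mv = MV(2, 5)

mvp_idx, mvd = encode_amvp(candidates, target_mv)
decoded_mv = decode_amvp(candidates, mvp_idx, mvd)
print(mvp_idx, mvd, decoded_mv)
```

Both sides must build the same candidate set in the same order, since only the index and the MVD are transmitted.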
When a PU is coded in either Skip or Merge mode, no motion information is transmitted except for the Merge index of the selected candidate. This is because the Skip and Merge modes use motion inference (i.e., MV = MVP + MVD, where the MVD is zero) to obtain the motion information from spatially neighbouring blocks (spatial candidates) or from a temporal block (temporal candidate) located in a co-located picture. The co-located picture is the first reference picture in list 0 or list 1, as signalled in the slice header. In the case of a Skip PU, the residual signal is also omitted. To decide the Merge index for the Skip and Merge modes, the Merge scheme selects a motion vector predictor from a Merge candidate set containing four spatial MVPs and one temporal MVP.
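The motion inference used by Skip and Merge modes can be sketched in the same spirit. The sketch assumes a simplified candidate that carries a single MV, a reference index and a prediction direction (a bi-predicted candidate would carry one MV and one reference index per list); `MotionInfo` and `decode_merge_pu` are hypothetical names, and the candidate values are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class MotionInfo:
    """Simplified motion information of one Merge candidate (assumption)."""
    mv: Tuple[int, int]
    ref_idx: int
    pred_dir: str  # "L0", "L1" or "BI"


def decode_merge_pu(
    mode: str,
    candidates: Sequence[MotionInfo],
    merge_idx: int,
    residual: Optional[list] = None,
) -> Tuple[MotionInfo, Optional[list]]:
    """Infer the motion of a Skip or Merge PU from the signalled Merge index.

    The MVD is implicitly zero, so the PU inherits the candidate's motion
    information unchanged. For a Skip PU the residual is omitted as well.
    """
    if mode not in ("skip", "merge"):
        raise ValueError("this sketch only handles 'skip' and 'merge'")
    motion = candidates[merge_idx]
    if mode == "skip":
        residual = None
    return motion, residual


# Illustrative candidate set: four spatial candidates, then one temporal one.
merge_candidates = [
    MotionInfo((4, -2), 0, "L0"),
    MotionInfo((3, 0), 0, "L0"),
    MotionInfo((-1, 1), 1, "L1"),
    MotionInfo((0, 0), 0, "BI"),
    MotionInfo((2, 2), 0, "L0"),  # temporal candidate
]

merge_motion, merge_residual = decode_merge_pu("merge", merge_candidates, 2, [1, 0, -1])
skip_motion, skip_residual = decode_merge_pu("skip", merge_candidates, 4)
print(merge_motion, merge_residual)
print(skip_motion, skip_residual)
```

Compared with AMVP, the only motion-related syntax element here is the Merge index; the prediction direction and reference index come from the selected candidate.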