In High Efficiency Video Coding (HEVC), a merge mode for inter-picture prediction is introduced. A merge candidate list of candidate motion parameters from neighboring blocks is constructed. Then, an index is signaled that identifies the candidate to be used. Merge mode also allows for temporal prediction by including in the list a candidate obtained from previously coded pictures. In HEVC, the merge candidate list is constructed from the following candidates: up to four spatial merge candidates derived from five spatial neighboring blocks, one temporal merge candidate derived from two temporal co-located blocks, and additional merge candidates including combined bi-predictive candidates and zero motion vector candidates.
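The list construction described above can be sketched as follows. This is a minimal illustration, not the normative HEVC derivation: the function name `build_merge_list`, the `(motion vector, reference index)` tuple layout, and the default list size of 5 are assumptions made for the example. Duplicate removal is done against the whole list for brevity, and the combined bi-predictive candidates are omitted.

```python
from typing import List, Optional, Tuple

MotionVector = Tuple[int, int]
Candidate = Tuple[MotionVector, int]  # (motion vector, reference index); hypothetical layout


def build_merge_list(
    spatial: List[Optional[Candidate]],   # five spatial neighbors, None if unavailable
    temporal: List[Optional[Candidate]],  # two co-located blocks, None if unavailable
    max_candidates: int = 5,              # assumed list size for this sketch
) -> List[Candidate]:
    merge_list: List[Candidate] = []

    # Up to four spatial candidates are taken from the five neighbors.
    for cand in spatial:
        if len(merge_list) == 4:
            break
        if cand is None or cand in merge_list:  # simplified duplicate check
            continue
        merge_list.append(cand)

    # One temporal candidate: the first available of the two co-located blocks.
    for cand in temporal:
        if cand is not None:
            if len(merge_list) < max_candidates:
                merge_list.append(cand)
            break

    # Combined bi-predictive candidates are omitted in this sketch.
    # Fill the remaining slots with zero motion vector candidates.
    ref_idx = 0
    while len(merge_list) < max_candidates:
        merge_list.append(((0, 0), ref_idx))
        ref_idx += 1

    return merge_list
```

The signaled merge index then simply selects one entry of the returned list, e.g. `build_merge_list(spatial, temporal)[merge_idx]`.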
In HEVC, a skip mode is used to indicate for a block that the motion data is inferred instead of explicitly signaled and that the prediction residual is zero, i.e., no transform coefficients are transmitted. In HEVC, at the beginning of each coding unit (CU) in an inter-picture prediction slice, a “skip_flag” is signaled that implies the following: the CU contains only one prediction unit (PU) of size 2N×2N; merge mode is used to derive the motion data; and no residual data is present in the bitstream.
In Joint Exploration Model 7 (JEM 7), which is the test model software studied by the Joint Video Exploration Team (JVET), some new merge candidates were introduced. The sub-CU modes are enabled as additional merge candidates, and no additional syntax element is required to signal the modes. Two additional merge candidates are added to the merge candidate list of each CU to represent the alternative-temporal motion vector prediction (ATMVP) mode and the spatial-temporal motion vector prediction (STMVP) mode. Up to seven merge candidates are used if the sequence parameter set indicates that ATMVP and STMVP are enabled. The encoding logic of the additional merge candidates is the same as for the merge candidates in HEVC, which means that, for each CU in a P or B slice, two more rate-distortion (RD) checks are needed for the two additional merge candidates. In the JEM, the order of the inserted merge candidates is A, B, C, D, ATMVP, STMVP, E (when the list contains fewer than six merge candidates), temporal motion vector prediction (TMVP), combined bi-predictive candidates, and zero motion vector candidates.
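The insertion order above can be illustrated with a short sketch that works on candidate names only. The function name `jem_merge_order`, the use of a set of available names, and the `"ZERO"` placeholder are assumptions for illustration. Combined bi-predictive candidates are left out, so the remaining slots are filled directly with zero motion vector candidates.

```python
from typing import List, Set


def jem_merge_order(available: Set[str], max_candidates: int = 7) -> List[str]:
    """Return candidate names in the insertion order described in the text."""
    order: List[str] = []

    # A, B, C, D, then the two sub-CU candidates.
    for name in ("A", "B", "C", "D", "ATMVP", "STMVP"):
        if name in available:
            order.append(name)

    # E is only inserted while the list holds fewer than six candidates.
    if "E" in available and len(order) < 6:
        order.append("E")

    # Temporal motion vector prediction candidate.
    if "TMVP" in available and len(order) < max_candidates:
        order.append("TMVP")

    # Combined bi-predictive candidates are omitted in this sketch;
    # pad with zero motion vector candidates.
    while len(order) < max_candidates:
        order.append("ZERO")

    return order
```

For example, when all eight named candidates are available, E is skipped because six candidates are already in the list when it is reached, and TMVP takes the seventh slot.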
In the JEM, all bins of the merge index are context coded by context-adaptive binary arithmetic coding (CABAC), whereas in HEVC only the first bin is context coded and the remaining bins are bypass coded. In the JEM, the maximum number of merge candidates is 7.
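The difference in bin coding can be made concrete with a small sketch. It assumes a truncated unary binarization of the merge index, which the text above does not state, and it only labels each bin with its coding mode rather than running an arithmetic coder. The function name `binarize_merge_index` and the `all_context` flag are hypothetical.

```python
from typing import List, Tuple


def binarize_merge_index(
    idx: int, max_candidates: int, all_context: bool
) -> List[Tuple[int, str]]:
    """Return (bin value, coding mode) pairs for a merge index.

    Assumes truncated unary binarization: idx ones followed by a
    terminating zero, which is dropped for the largest index.
    """
    c_max = max_candidates - 1
    if not 0 <= idx <= c_max:
        raise ValueError("merge index out of range")

    bins = [1] * idx + ([0] if idx < c_max else [])

    # all_context=True models the JEM behavior (every bin context coded);
    # all_context=False models HEVC (first bin context coded, rest bypass).
    return [
        (b, "context" if (all_context or i == 0) else "bypass")
        for i, b in enumerate(bins)
    ]
```

Under these assumptions, index 2 in a five-candidate HEVC-style list yields one context-coded bin followed by two bypass-coded bins, while the same index in JEM-style coding yields three context-coded bins.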
Some techniques search for candidate motion vectors among previously coded blocks, with a step size of one 8×8 block. The nearest spatial neighbors, i.e., the immediate top row, the left column, and the top-right corner, are defined as category 1. The outer regions (at most three 8×8 blocks away from the current block boundary) and the collocated blocks in the previously coded frame are classified as category 2. Neighboring blocks that are predicted from different reference frames or are intra coded are pruned from the list. Each remaining reference block is then assigned a weight that is related to its distance to the current block.
The left, above, bottom-left, above-right, and top-left candidates that are not immediately next to the current block are checked. For example, a top-left corner of the reference block has an offset of (−96, −96) to the current block. Each candidate B(i, j) or C(i, j) has an offset of 16 in the vertical direction compared to its previous B or C candidates. Each candidate A(i, j) or D(i, j) has an offset of 16 in the horizontal direction compared to its previous A or D candidates. Each candidate E(i, j) has an offset of 16 in both the horizontal and vertical directions compared to its previous E candidates. The candidates are checked from the inside to the outside. The order of the candidates is A(i, j), B(i, j), C(i, j), D(i, j), and E(i, j).
In the above, the extended neighboring positions are determined relative to the current block or relative to the current picture. Instead of fetching from those fixed locations, some techniques use the motion information of n previously coded blocks, stored in a designated table (buffer), to obtain more motion vector prediction candidates. This is referred to as history-based motion vector prediction (HMVP). The buffer with multiple HMVP candidates is maintained during the encoding/decoding process. The buffer operates on a first-in-first-out principle, so that the most recently coded motion information is considered first when the buffer is used for motion vector prediction in merge mode or advanced motion vector prediction (AMVP) mode.
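A first-in-first-out table of this kind can be sketched with a bounded double-ended queue. The class name `HmvpTable`, its methods, and the opaque motion entries are hypothetical. The sketch models only the FIFO eviction and leaves the traversal order as an explicit argument.

```python
from collections import deque
from typing import Any, List


class HmvpTable:
    """Bounded first-in-first-out table of previously coded motion information."""

    def __init__(self, size: int) -> None:
        # deque(maxlen=size) drops the oldest entry automatically once full.
        self._entries: deque = deque(maxlen=size)

    def push(self, motion: Any) -> None:
        """Store the motion information of a block that was just coded."""
        self._entries.append(motion)

    def candidates(self, newest_first: bool) -> List[Any]:
        """List the stored entries in the requested traversal order."""
        if newest_first:
            return list(reversed(self._entries))
        return list(self._entries)
```

With a table of size 3, pushing four entries evicts the first one, and `candidates(newest_first=True)` returns the remaining three starting from the last entry pushed.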