Motion estimation/compensation is a powerful coding tool that has been used in various coding standards such as MPEG-2, H.264 and the emerging HEVC (High Efficiency Video Coding) standard. The motion information derived at the encoder side has to be transmitted to the decoder side, which may consume sizeable bandwidth. In order to improve the coding efficiency for motion information, motion vector prediction (MVP), which codes a current motion vector (MV) predictively, has been developed.
Merge Mode and AMVP Mode
For each Inter PU (prediction unit), one or two motion vectors (MVs) are determined using motion estimation. In order to increase the coding efficiency of MV coding, HEVC uses motion vector prediction (MVP) to encode MVs predictively. In particular, HEVC supports the Skip and Merge modes for MVP coding. For Skip and Merge modes, a set of candidates is derived based on the motion information of spatially neighbouring blocks (spatial candidates) or a temporal co-located block (temporal candidate). When a PU is coded using the Skip or Merge mode, no motion information is signalled. Instead, only the index of the selected candidate is coded. For the Skip mode, the residual signal is forced to be zero and not coded. In other words, no information is signalled for the residuals. Each merged PU reuses the MV, prediction direction, and reference picture index of the selected candidate.
For Merge mode in HEVC, up to four spatial MV candidates are derived from neighbouring blocks A0, A1, B0 and B1, and one temporal MV candidate is derived from the bottom-right block TBR or the centre block TCT, as shown in FIG. 1. For the temporal candidate, TBR is used first. If TBR is not available, TCT is used instead. Note that if any of the four spatial MV candidates is not available, block B2 is used to derive an MV candidate as a replacement. After the derivation of the four spatial MV candidates and one temporal MV candidate, redundancy removal (pruning) is applied to remove any redundant MV candidate. If, after pruning, the number of available MV candidates is smaller than five, three types of additional candidates are derived and added to the candidate set (candidate list). The encoder selects one final candidate within the candidate set for Skip or Merge mode based on the rate-distortion optimization (RDO) decision, and transmits the index to the decoder.
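The candidate-list construction described above can be sketched as follows. This is a minimal illustration, not the normative HEVC process: the `MotionInfo` tuple, the function name and its parameters are hypothetical, pruning is done by a full duplicate check, and the three types of additional candidates are replaced by simple zero-MV fillers as a stated simplification.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical motion record: (mv_x, mv_y, ref_idx). Not a structure from the text.
MotionInfo = Tuple[int, int, int]

MAX_MERGE_CANDS = 5  # four spatial candidates plus one temporal candidate


def build_merge_list(
    spatial: Sequence[Optional[MotionInfo]],  # A0, A1, B0, B1 (None if unavailable)
    t_br: Optional[MotionInfo],               # temporal candidate from TBR
    t_ct: Optional[MotionInfo],               # temporal candidate from TCT
    b2: Optional[MotionInfo] = None,          # replacement spatial candidate
) -> List[MotionInfo]:
    cands = [c for c in spatial if c is not None]
    # B2 is consulted only when one of the four spatial candidates is missing.
    if len(cands) < 4 and b2 is not None:
        cands.append(b2)
    # Temporal candidate: TBR first, TCT as the fallback.
    temporal = t_br if t_br is not None else t_ct
    if temporal is not None:
        cands.append(temporal)
    # Pruning: drop any candidate identical to one already kept.
    pruned: List[MotionInfo] = []
    for c in cands:
        if c not in pruned:
            pruned.append(c)
    # Simplification (assumption): fill with zero MVs of increasing ref_idx
    # instead of the three types of additional candidates.
    ref_idx = 0
    while len(pruned) < MAX_MERGE_CANDS:
        filler = (0, 0, ref_idx)
        if filler not in pruned:
            pruned.append(filler)
        ref_idx += 1
    return pruned[:MAX_MERGE_CANDS]
```

The encoder would then pick one entry of the returned list by its RDO decision and signal only the index of that entry.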
Since the derivations of Skip and Merge candidates are similar, the “Merge” mode referred to hereafter may correspond to “Merge” mode as well as “Skip” mode for convenience.
The MVP technique is also applied to code a motion vector predictively, which is referred to as AMVP (Advanced Motion Vector Prediction). When a PU is coded in Inter AMVP mode, motion-compensated prediction is performed with transmitted motion vector differences (MVDs) that can be used together with Motion Vector Predictors (MVPs) for deriving motion vectors (MVs). To decide the MVP in Inter AMVP mode, the AMVP scheme is used to select a motion vector predictor among an AMVP candidate set including two spatial MVPs and one temporal MVP. Therefore, an AMVP index for the MVP and the corresponding MVDs need to be encoded and transmitted for an AMVP-coded block. In addition, the Inter prediction direction, which specifies the prediction direction among bi-prediction and uni-prediction (i.e., list 0 (L0) and/or list 1 (L1)), together with the reference frame index for each list, should also be encoded and transmitted.
When a PU is coded in either Skip or Merge mode, no motion information is transmitted except the Merge index of the selected candidate, since the Skip and Merge modes utilize motion inference methods (i.e., MV=MVP+MVD where the MVD is zero) to obtain the motion information from the selected Merge/Skip candidate.
In AMVP, the left MVP is the first available one from A0 and A1, the top MVP is the first available one from B0, B1 and B2, and the temporal MVP is the first available one from TBR or TCT (TBR is used first; if TBR is not available, TCT is used instead). If the left MVP is not available and the top MVP is not a scaled MVP, a second top MVP can be derived if there is a scaled MVP among B0, B1, and B2. The list size of MVPs of AMVP is 2 in HEVC. Therefore, after the derivation process of the two spatial MVPs and one temporal MVP, only the first two MVPs can be included in the MVP list. If, after removing redundancy, the number of available MVPs is less than two, zero vector candidates are added to the candidate list.
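As a rough sketch of the AMVP list rules above, the following helper picks the first available predictor per group, removes duplicates, keeps at most two entries and pads with zero vectors. The function names and the `(x, y)` tuple representation are assumptions for illustration; the scaled-MVP handling for the second top MVP is deliberately left out.

```python
from typing import List, Optional, Tuple

MV = Tuple[int, int]
AMVP_LIST_SIZE = 2  # list size of MVPs of AMVP in HEVC, per the text


def first_available(*mvs: Optional[MV]) -> Optional[MV]:
    """Return the first MV that is not None, or None if all are unavailable."""
    return next((mv for mv in mvs if mv is not None), None)


def build_amvp_list(left: Optional[MV], top: Optional[MV],
                    temporal: Optional[MV]) -> List[MV]:
    cands: List[MV] = []
    for mvp in (left, top, temporal):
        # Skip unavailable predictors and remove redundancy.
        if mvp is not None and mvp not in cands:
            cands.append(mvp)
    cands = cands[:AMVP_LIST_SIZE]      # only the first two MVPs are kept
    while len(cands) < AMVP_LIST_SIZE:  # pad with zero vector candidates
        cands.append((0, 0))
    return cands


# Example: left from (A0, A1), top from (B0, B1, B2), temporal from (TBR, TCT).
left = first_available(None, (3, -1))
top = first_available((3, -1), None, None)
temporal = first_available(None, (5, 5))
amvp = build_amvp_list(left, top, temporal)
```

In this example the left and top predictors are identical, so one is pruned and the temporal MVP takes the second slot.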
Bilateral Template MV Refinement
Bilateral Template MV Refinement (BTMVR) is also referred to as Decoder-side MV refinement (DMVR) in some literature. For example, in JVET-D0029 (Xu Chen, et al., “Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 4th Meeting: Chengdu, CN, 15-21 Oct. 2016, Document: JVET-D0029), Decoder-Side Motion Vector Refinement (DMVR) based on bilateral template matching is disclosed. The process of BTMVR is shown in FIG. 2, where block 210 is a current block. Initial motion vectors MV0 220a and MV1 220b for current block 210 are determined.
BTMVR (i.e., DMVR) uses a two-stage search to refine the MVs of the current block. For a current block, the cost of the current MV candidate is first evaluated. In the first stage, an integer-pixel search is performed around the current pixel location, and eight candidates are evaluated. The horizontal distance, vertical distance or both between two adjacent circles or between the square symbol and the adjacent circle is one pixel. The candidate with the lowest cost is selected as the best MV candidate of the first stage. In the second stage, a half-pixel square search is performed around the best MV candidate of the first stage. The MV candidate with the lowest cost is selected as the final MV for the final motion compensation.
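The two-stage search can be illustrated with the sketch below. It assumes MVs are stored in quarter-sample units (so an integer-pixel step is 4 and a half-pixel step is 2) and takes a caller-supplied `cost` function; both the units and the function names are assumptions made for illustration, and the bilateral-template cost itself is not modelled.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]

# The eight neighbours of a square search pattern around a centre point.
SQUARE = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dx, dy) != (0, 0)]


def square_search(center: MV, step: int, cost: Callable[[MV], float]) -> MV:
    """Evaluate the centre and its eight neighbours; return the lowest-cost MV."""
    best, best_cost = center, cost(center)
    for dx, dy in SQUARE:
        cand = (center[0] + dx * step, center[1] + dy * step)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best


def refine_mv(init_mv: MV, cost: Callable[[MV], float]) -> MV:
    stage1 = square_search(init_mv, 4, cost)  # integer-pixel stage
    return square_search(stage1, 2, cost)     # half-pixel stage


# Toy cost with its minimum at (6, -4) in quarter-sample units.
toy_cost = lambda mv: abs(mv[0] - 6) + abs(mv[1] + 4)
refined = refine_mv((0, 0), toy_cost)
```

With this toy cost, the first stage moves from (0, 0) to (4, -4) and the second stage reaches (6, -4).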
When the underlying video data correspond to colour video, the Bilateral Template MV Refinement process may also be applied to the colour components. As is known in the field, colour video may be in various colour formats, such as YUV or YCrCb colour components with different colour sampling formats such as 4:4:4, 4:2:2 and 4:2:0. For example, when the colour video is in the YCrCb420 format, the Cr and Cb chrominance components have half the vertical resolution and half the horizontal resolution of the Y luminance component. A motion vector derived based on the Y component needs to be scaled before the motion vector is used for the chrominance components.
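One way to picture the scaling is to convert a luma MV into a displacement measured in chroma samples. The sketch below assumes the luma MV is expressed in quarter-luma-sample units; the table and function names are hypothetical, and a real codec may instead keep the MV value and reinterpret its precision.

```python
from fractions import Fraction
from typing import Tuple

# (horizontal, vertical) chroma subsampling factors for each sampling format.
SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def chroma_displacement(luma_mv_qpel: Tuple[int, int],
                        sampling: str = "4:2:0") -> Tuple[Fraction, Fraction]:
    """Displacement in chroma samples for a luma MV given in quarter-luma samples."""
    sx, sy = SUBSAMPLING[sampling]
    return (Fraction(luma_mv_qpel[0], 4 * sx), Fraction(luma_mv_qpel[1], 4 * sy))


# A luma MV of (8, -4) quarter samples is (2, -1) luma samples.
d420 = chroma_displacement((8, -4), "4:2:0")
d444 = chroma_displacement((8, -4), "4:4:4")
```

In 4:2:0 the same MV covers half as many chroma samples as luma samples in each direction, while in 4:4:4 the displacement is unchanged.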
Pattern-Based MV Derivation (PMVD)
FIG. 3 illustrates an example of FRUC (Frame Rate Up Conversion) bilateral matching mode, where the motion information for a current block 310 is derived based on two reference pictures. The motion information of the current block is derived by finding the best match between two reference blocks (320 and 330) along the motion trajectory 340 of the current block in two different reference pictures (i.e., Ref0 and Ref1). Under the assumption of continuous motion trajectory, the motion vectors MV0 associated with Ref0 and MV1 associated with Ref1 pointing to the two reference blocks shall be proportional to the temporal distances, i.e., TD0 and TD1, between the current picture (i.e., Cur pic) and the two reference pictures Ref0 and Ref1.
FIG. 4 illustrates an example of template matching FRUC (Frame Rate Up Conversion) mode. The neighbouring areas (420a and 420b) of the current block 410 in a current picture (i.e., Cur pic) are used as a template to match with a corresponding template (430a and 430b) in a reference picture (i.e., Ref0 in FIG. 4). The best match between template 420a/420b and template 430a/430b will determine a decoder derived motion vector 440. While Ref0 is shown in FIG. 4, Ref1 can also be used as a reference picture.
According to VCEG-AZ07, a FRUC_mrg_flag is signalled when the merge_flag or skip_flag is true. If the FRUC_mrg_flag is 1, then FRUC_merge_mode is signalled to indicate whether the bilateral matching merge mode or template matching merge mode is selected. If the FRUC_mrg_flag is 0, it implies that regular merge mode is used and a merge index is signalled in this case. In video coding, in order to improve coding efficiency, the motion vector for a block may be predicted using motion vector prediction (MVP), where a candidate list is generated. A merge candidate list may be used for coding a block in a merge mode. When the merge mode is used to code a block, the motion information (e.g. motion vector) of the block can be represented by one of the candidate MVs in the merge MV list. Therefore, instead of transmitting the motion information of the block directly, a merge index is transmitted to the decoder side. The decoder maintains the same merge list and uses the merge index to retrieve the merge candidate as signalled by the merge index. Typically, the merge candidate list consists of a small number of candidates and transmitting the merge index is much more efficient than transmitting the motion information. When a block is coded in a merge mode, the motion information is “merged” with that of a neighbouring block by signalling a merge index instead of being explicitly transmitted. However, the prediction residuals are still transmitted. In the case that the prediction residuals are zero or very small, the prediction residuals are “skipped” (i.e., the skip mode) and the block is coded by the skip mode with a merge index to identify the merge MV in the merge list.
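The signalling order described at the start of the paragraph above can be sketched as a small parsing routine. Entropy decoding is abstracted away: the `symbols` iterator of already-decoded values and the function name are hypothetical, and only the conditional order of the syntax elements follows the text.

```python
from typing import Dict, Iterator


def parse_merge_syntax(symbols: Iterator[int], merge_or_skip: bool) -> Dict[str, int]:
    """Read FRUC_mrg_flag and the element that follows it, in signalling order."""
    parsed: Dict[str, int] = {}
    if not merge_or_skip:
        # FRUC_mrg_flag is only signalled when merge_flag or skip_flag is true.
        return parsed
    parsed["FRUC_mrg_flag"] = next(symbols)
    if parsed["FRUC_mrg_flag"] == 1:
        # Selects bilateral matching or template matching merge mode.
        parsed["FRUC_merge_mode"] = next(symbols)
    else:
        # Regular merge mode: a merge index follows.
        parsed["merge_index"] = next(symbols)
    return parsed


fruc_block = parse_merge_syntax(iter([1, 0]), merge_or_skip=True)
regular_block = parse_merge_syntax(iter([0, 3]), merge_or_skip=True)
```

For the first block no merge index is read, since the decoder derives the motion information itself.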
While the term FRUC refers to motion vector derivation for Frame Rate Up-Conversion, the underlying techniques are intended for a decoder to derive one or more merge MV candidates without the need for explicitly transmitting motion information. Accordingly, FRUC is also called decoder derived motion information in this disclosure. Since the template matching method is a pattern-based MV derivation technique, the template matching method of FRUC is also referred to as Pattern-based MV Derivation (PMVD) in this disclosure.
In the decoder side MV derivation method, a new temporal MVP called the temporal derived MVP is derived by scanning all MVs in all reference pictures. To derive the LIST_0 temporal derived MVP, for each LIST_0 MV in the LIST_0 reference pictures, the MV is scaled to point to the current frame. The 4×4 block pointed to by this scaled MV in the current frame is the target current block. The MV is further scaled to point to the reference picture whose refIdx is equal to 0 in LIST_0 for the target current block. The further scaled MV is stored in the LIST_0 MV field for the target current block. FIG. 5A and FIG. 5B illustrate examples for deriving the temporal derived MVPs for LIST_0 and LIST_1 respectively. In FIG. 5A and FIG. 5B, each small square block corresponds to a 4×4 block. The temporal derived MVP process scans all the MVs in all 4×4 blocks in all reference pictures to generate the temporal derived LIST_0 and LIST_1 MVPs of the current frame. For example, in FIG. 5A, blocks 510, blocks 512 and blocks 514 correspond to 4×4 blocks of the current picture (Cur. pic), the LIST_0 reference picture with index equal to 0 (i.e., refidx=0) and the LIST_0 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 520 and 530 for two blocks in the LIST_0 reference picture with index equal to 1 are known. Then, temporal derived MVPs 522 and 532 can be derived by scaling motion vectors 520 and 530 respectively. Each scaled MVP is then assigned to a corresponding block. Similarly, in FIG. 5B, blocks 540, blocks 542 and blocks 544 correspond to 4×4 blocks of the current picture (Cur. pic), the LIST_1 reference picture with index equal to 0 (i.e., refidx=0) and the LIST_1 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 550 and 560 for two blocks in the LIST_1 reference picture with index equal to 1 are known. Then, temporal derived MVPs 552 and 562 can be derived by scaling motion vectors 550 and 560 respectively.
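The two scaling steps for one MV can be sketched as below. This is one possible reading of the description, under several assumptions: temporal distances are measured as picture order count (POC) differences, MVs are in full-sample units, scaling is plain linear scaling with rounding, and all names are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]


def scale_mv(mv: MV, td_src: int, td_dst: int) -> MV:
    """Linearly scale an MV from temporal distance td_src to td_dst."""
    return (round(mv[0] * td_dst / td_src), round(mv[1] * td_dst / td_src))


def derive_temporal_mvp(block_pos: Tuple[int, int], mv: MV, poc_ref: int,
                        poc_ref_target: int, poc_cur: int,
                        poc_ref0: int) -> Tuple[Tuple[int, int], MV]:
    """block_pos: sample position of a 4x4 block in the reference picture poc_ref.
    mv: MV of that block, pointing to the picture poc_ref_target."""
    td_src = poc_ref_target - poc_ref
    # Step 1: scale the MV so that it points to the current frame.
    to_cur = scale_mv(mv, td_src, poc_cur - poc_ref)
    # The 4x4 block hit in the current frame is the target current block.
    target = ((block_pos[0] + to_cur[0]) // 4, (block_pos[1] + to_cur[1]) // 4)
    # Step 2: scale again so that it points from the current frame to refIdx 0.
    mvp = scale_mv(mv, td_src, poc_ref0 - poc_cur)
    return target, mvp


# Reference picture at POC 4 holds an MV pointing to POC 0; current POC is 8
# and the LIST_0 picture with refIdx 0 is at POC 6 (illustrative values).
target_block, mvp = derive_temporal_mvp((16, 8), (-8, 4), 4, 0, 8, 6)
```

The returned `mvp` would then be stored in the LIST_0 MV field at the 4×4 block index `target_block`.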
For the bilateral matching merge mode and template matching merge mode, two-stage matching is applied. The first stage is PU-level matching, and the second stage is sub-PU-level matching. In the PU-level matching, multiple initial MVs in LIST_0 and LIST_1 are selected respectively. These MVs include the MVs from merge candidates (i.e., the conventional merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for the two lists. For each MV in one list, an MV pair is generated consisting of this MV and the mirrored MV that is derived by scaling the MV to the other list. For each MV pair, two reference blocks are compensated by using this MV pair. The sum of absolute differences (SAD) of these two blocks is calculated. The MV pair with the smallest SAD is selected as the best MV pair.
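A minimal sketch of the PU-level MV pair selection follows. It assumes signed temporal distances (reference POC minus current POC) so that mirroring is a single linear scaling, and it takes hypothetical `fetch0`/`fetch1` callables that return the motion-compensated reference block for a given MV; none of these names come from the text.

```python
from typing import Callable, Iterable, List, Tuple

MV = Tuple[int, int]
Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def mirror_mv(mv: MV, td_src: int, td_dst: int) -> MV:
    """Scale an MV to the other list along the same motion trajectory."""
    return (round(mv[0] * td_dst / td_src), round(mv[1] * td_dst / td_src))


def best_mv_pair(start_mvs: Iterable[MV], td0: int, td1: int,
                 fetch0: Callable[[MV], Block],
                 fetch1: Callable[[MV], Block]) -> Tuple[int, MV, MV]:
    best = None
    for mv0 in start_mvs:
        mv1 = mirror_mv(mv0, td0, td1)          # mirrored MV in the other list
        cost = sad(fetch0(mv0), fetch1(mv1))    # compare the two reference blocks
        if best is None or cost < best[0]:
            best = (cost, mv0, mv1)
    return best


# Toy one-sample "blocks": Ref0 is one picture before, Ref1 one picture after.
fetch0 = lambda mv: [[10 + mv[0]]]
fetch1 = lambda mv: [[14 + mv[0]]]
result = best_mv_pair([(0, 0), (2, 0), (4, 0)], -1, 1, fetch0, fetch1)
```

Here the starting MV (2, 0) yields the mirrored MV (-2, 0) and a SAD of zero, so that pair is selected.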
Local Illumination Compensation (LIC)
Local Illumination Compensation (LIC) is a method to perform Inter prediction using neighbour samples of the current block and a reference block. It is based on a linear model using a scaling factor a and an offset b. The method derives the scaling factor a and the offset b by referring to the neighbour samples of the current block and the reference block. Moreover, the LIC process can be enabled or disabled adaptively for each CU.
More details regarding LIC can be found in JVET-C1001 (Xu Chen, et al., “Algorithm Description of Joint Exploration Test Model 3”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Geneva, CH, 26 May-1 Jun. 2016, Document: JVET-C1001).
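To make the linear model concrete, the sketch below fits the scaling factor a and the offset b so that a·ref + b approximates the current samples over the neighbouring samples. The use of an ordinary least-squares fit is an assumption made here for illustration, as are the function names; the text only states that a and b are derived from the neighbouring samples.

```python
from typing import Sequence, Tuple


def derive_lic_params(cur_nb: Sequence[float],
                      ref_nb: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of cur ~= a * ref + b over neighbouring samples (assumption)."""
    n = len(cur_nb)
    sx, sy = sum(ref_nb), sum(cur_nb)
    sxx = sum(x * x for x in ref_nb)
    sxy = sum(x * y for x, y in zip(ref_nb, cur_nb))
    denom = n * sxx - sx * sx
    if denom == 0:
        # Flat reference neighbourhood: fall back to a pure offset model.
        return 1.0, (sy - sx) / n
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def apply_lic(ref_sample: float, a: float, b: float) -> float:
    """Illumination-compensated prediction for one reference sample."""
    return a * ref_sample + b


# Neighbouring samples that follow cur = 2 * ref + 5 exactly.
a, b = derive_lic_params([25, 45, 65, 85], [10, 20, 30, 40])
```

With these neighbouring samples the fit recovers a = 2 and b = 5, which would then be applied to every sample of the reference block.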
Advanced Motion Vector Resolution (AMVR)
To improve the coding gain, Advanced Motion Vector Resolution (AMVR) has also been introduced recently. AMVR can adaptively switch the resolution of the Motion Vector Difference (MVD). The MVD (between a current MV and the MV predictor of a PU) can be coded with either quarter-pel resolution or integer-pel resolution. The switching is controlled at the coding unit (CU) level and an integer MVD resolution flag is (conditionally) signalled.
More details regarding AMVR can be found in JVET-C1001 (Xu Chen, et al., “Algorithm Description of Joint Exploration Test Model 3”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Geneva, CH, 26 May-1 Jun. 2016, Document: JVET-C1001).
In JVET-E0076 (Chen et al., “EE5EE4: Enhanced Motion Vector Difference Coding”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, the 5th JVET meeting, January 2017, Geneva, Document: JVET-E0076), a modified MVD coding method has been adopted. The modified MVD coding method includes two elements: a) 4-pel accuracy for MVD signalling (in addition to ¼-pel and integer-pel MVD accuracy), and b) switchable binarization and context model selection. According to JVET-E0076, a first flag is signalled to indicate whether ¼-pel MV precision for the luma signal is used in a CU. When the first flag indicates that ¼-pel MV precision for the luma signal is not used, another flag is signalled to indicate whether integer luma sample or four luma samples MV precision is used.
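The two-flag selection of MVD precision can be sketched as follows. The flag polarities, the parameter names and the representation of a coarse MVD as a coded value multiplied by a step are all assumptions for illustration; only the three precision levels and the order of the two flags come from the text.

```python
from typing import Optional, Tuple


def mvd_step_qpel(quarter_pel_used: bool, four_pel_used: Optional[bool] = None) -> int:
    """Return the MVD step size in quarter-luma-sample units."""
    if quarter_pel_used:
        return 1                      # 1/4-pel precision, second flag not signalled
    if four_pel_used is None:
        raise ValueError("second flag is required when 1/4-pel precision is not used")
    return 16 if four_pel_used else 4  # four luma samples or one integer luma sample


def reconstruct_mvd(coded: Tuple[int, int], step: int) -> Tuple[int, int]:
    """Assumed reconstruction: coded value times step, in quarter-sample units."""
    return (coded[0] * step, coded[1] * step)


# A coded MVD of (3, -1) at four-luma-sample precision.
mvd = reconstruct_mvd((3, -1), mvd_step_qpel(False, True))
```

In this example the reconstructed MVD is (48, -16) quarter samples, i.e. (12, -4) luma samples.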
In the following disclosure, various techniques are described that improve the coding efficiency by utilizing the template of a current block and the template of one or more reference blocks, or previously coded data or information.