In a typical video coding system utilizing motion-compensated Inter prediction, motion information is usually transmitted from an encoder side to a decoder so that the decoder can perform the motion-compensated Inter prediction correctly. In such systems, the motion information consumes some coded bits. In order to improve coding efficiency, a decoder-side motion vector derivation method is disclosed in VCEG-AZ07 (Jianle Chen, et al., Further improvements to HMKTA-1.0, ITU-Telecommunications Standardization Sector, Study Group 16 Question 6, Video Coding Experts Group (VCEG), 52nd Meeting: 19-26 Jun. 2015, Warsaw, Poland). According to VCEG-AZ07, the decoder-side motion vector derivation method uses two Frame Rate Up-Conversion (FRUC) modes. One of the FRUC modes is referred to as bilateral matching for B-slice and the other of the FRUC modes is referred to as template matching for P-slice or B-slice.
FIG. 1 illustrates an example of FRUC bilateral matching mode, where the motion information for a current block 110 is derived based on two reference pictures. The motion information of the current block is derived by finding the best match between two blocks (120 and 130) along the motion trajectory 140 of the current block in two different reference pictures (i.e., Ref0 and Ref1). Under the assumption of continuous motion trajectory, the motion vectors MV0 associated with Ref0 and MV1 associated with Ref1 pointing to the two reference blocks shall be proportional to the temporal distances, i.e., TD0 and TD1, between the current picture (i.e., Cur pic) and the two reference pictures Ref0 and Ref1.
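The proportionality between MV0, MV1 and the temporal distances TD0, TD1 can be illustrated with a short sketch. The function name `mirror_mv`, the integer MV units and the assumption that Ref0 and Ref1 lie on opposite temporal sides of the current picture (hence the sign flip) are illustrative choices and are not taken from VCEG-AZ07.

```python
from fractions import Fraction


def mirror_mv(mv0, td0, td1):
    """Derive MV1 from MV0 under a continuous (linear) motion trajectory.

    mv0 is an (x, y) pair in integer MV units pointing from the current
    block to Ref0. td0 and td1 are positive temporal distances from the
    current picture to Ref0 and Ref1. Assumption: the two reference
    pictures are on opposite sides of the current picture, so MV1 points
    in the opposite direction of MV0.
    """
    scale = Fraction(-td1, td0)
    # Exact rational scaling followed by rounding to integer MV units.
    return (round(mv0[0] * scale), round(mv0[1] * scale))
```

With equal temporal distances the mirrored MV is simply the negated MV0; with TD1 half of TD0 it is the negated MV0 at half magnitude.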
FIG. 2 illustrates an example of template matching FRUC mode. The neighboring areas (220a and 220b) of the current block 210 in a current picture (i.e., Cur pic) are used as a template to match with a corresponding template (230a and 230b) in a reference picture (i.e., Ref0 in FIG. 2). The best match between template 220a/220b and template 230a/230b will determine a decoder derived motion vector 240. While Ref0 is shown in FIG. 2, Ref1 can also be used as a reference picture.
According to VCEG-AZ07, a FRUC_mrg_flag is signaled when the merge_flag or skip_flag is true. If the FRUC_mrg_flag is 1, then FRUC_merge_mode is signaled to indicate whether the bilateral matching merge mode or template matching merge mode is selected. If the FRUC_mrg_flag is 0, it implies that regular merge mode is used and a merge index is signaled in this case. In video coding, in order to improve coding efficiency, the motion vector for a block may be predicted using motion vector prediction (MVP), where a candidate list is generated. A merge candidate list may be used for coding a block in a merge mode. When the merge mode is used to code a block, the motion information (e.g. motion vector) of the block can be represented by one of the candidate MVs in the merge MV list. Therefore, instead of transmitting the motion information of the block directly, a merge index is transmitted to a decoder side. The decoder maintains the same merge list and uses the merge index to retrieve the merge candidate as signaled by the merge index. Typically, the merge candidate list consists of a small number of candidates and transmitting the merge index is much more efficient than transmitting the motion information. When a block is coded in a merge mode, the motion information is “merged” with that of a neighboring block by signaling a merge index instead of the motion information being explicitly transmitted. However, the prediction residuals are still transmitted. In the case that the prediction residuals are zero or very small, the prediction residuals are “skipped” (i.e., the skip mode) and the block is coded by the skip mode with a merge index to identify the merge MV in the merge list.
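The signaling order described above can be sketched as a small parsing routine. This is only an illustration: the function name, the use of a plain list of already-decoded symbols in place of an entropy decoder, and the returned dictionary layout are all hypothetical. The mapping of FRUC_merge_mode values to bilateral or template matching is not given here, so the value is returned as-is.

```python
def parse_merge_signaling(symbols, merge_or_skip):
    """Read the merge-related syntax elements in the order described.

    symbols: list of already-decoded syntax element values (hypothetical
    stand-in for a bitstream reader).
    merge_or_skip: True when merge_flag or skip_flag is true.
    """
    if not merge_or_skip:
        # FRUC_mrg_flag is only present for merge or skip blocks.
        return {"mode": "non_merge"}
    reader = iter(symbols)
    fruc_mrg_flag = next(reader)
    if fruc_mrg_flag == 1:
        # FRUC_merge_mode selects bilateral or template matching.
        return {"mode": "fruc", "fruc_merge_mode": next(reader)}
    # Regular merge mode: a merge index follows.
    return {"mode": "regular_merge", "merge_index": next(reader)}
```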
While the term FRUC refers to motion vector derivation for Frame Rate Up-Conversion, the underlying techniques are intended for a decoder to derive one or more merge MV candidates without the need for explicitly transmitting motion information. Accordingly, FRUC is also called decoder derived motion information in this disclosure. Since the template matching method is a pattern-based MV derivation technique, the template matching method of FRUC is also referred to as Pattern-based MV Derivation (PMVD) in this disclosure.
In the decoder side MV derivation method, a new temporal MVP called temporal derived MVP is derived by scanning all MVs in all reference frames. To derive the LIST_0 temporal derived MVP, for each LIST_0 MV in the LIST_0 reference frames, the MV is scaled to point to the current frame. The 4×4 block that is pointed to by this scaled MV in the current frame is the target current block. The MV is further scaled to point to the reference picture whose refIdx is equal to 0 in LIST_0 for the target current block. The further scaled MV is stored in the LIST_0 MV field for the target current block. FIG. 3A and FIG. 3B illustrate examples for deriving the temporal derived MVPs for LIST_0 and LIST_1 respectively. In FIG. 3A and FIG. 3B, each small square block corresponds to a 4×4 block. The temporal derived MVP process scans all the MVs in all 4×4 blocks in all reference pictures to generate the temporal derived LIST_0 and LIST_1 MVPs of the current frame. For example, in FIG. 3A, blocks 310, blocks 312 and blocks 314 correspond to 4×4 blocks of the current picture (Cur. pic), LIST_0 reference picture with index equal to 0 (i.e., refidx=0) and LIST_0 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 320 and 330 for two blocks in LIST_0 reference picture with index equal to 1 are known. Then, temporal derived MVP 322 and 332 can be derived by scaling motion vectors 320 and 330 respectively. The scaled MVP is then assigned to a corresponding block. Similarly, in FIG. 3B, blocks 340, blocks 342 and blocks 344 correspond to 4×4 blocks of the current picture (Cur. pic), LIST_1 reference picture with index equal to 0 (i.e., refidx=0) and LIST_1 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 350 and 360 for two blocks in LIST_1 reference picture with index equal to 1 are known. Then, temporal derived MVP 352 and 362 can be derived by scaling motion vectors 350 and 360 respectively.
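The two scaling steps for one MV can be sketched as follows. This is a rough illustration under stated assumptions rather than the procedure of VCEG-AZ07: pictures are identified by picture order count (POC) values, MVs are in integer pixel units, motion is assumed linear in time, and all function and parameter names are hypothetical.

```python
from fractions import Fraction


def scale_mv(mv, numerator, denominator):
    """Scale an (x, y) MV by the ratio numerator / denominator."""
    ratio = Fraction(numerator, denominator)
    return (round(mv[0] * ratio), round(mv[1] * ratio))


def temporal_derived_mvp(block_pos, mv, poc_src, poc_src_ref,
                         poc_cur, poc_target_ref, block_size=4):
    """Project one MV of a reference frame onto the current frame.

    block_pos: top-left (x, y) of the 4x4 block in the reference frame.
    mv: MV of that block, pointing from poc_src to poc_src_ref.
    Returns the top-left position of the target 4x4 block in the current
    frame and the MV further scaled to point from the current frame to
    the picture at poc_target_ref (e.g. the refIdx 0 picture).
    """
    span = poc_src_ref - poc_src  # temporal span of the original MV
    # Step 1: scale the MV so that it points to the current frame.
    to_cur = scale_mv(mv, poc_cur - poc_src, span)
    hit_x = block_pos[0] + to_cur[0]
    hit_y = block_pos[1] + to_cur[1]
    target = ((hit_x // block_size) * block_size,
              (hit_y // block_size) * block_size)
    # Step 2: scale again to point from the current frame to the target.
    mvp = scale_mv(mv, poc_target_ref - poc_cur, span)
    return target, mvp
```

A full scan would call such a routine for every 4×4 block of every reference picture and store each result in the MV field of the corresponding target block.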
For the bilateral matching merge mode and template matching merge mode, two-stage matching is applied. The first stage is PU-level matching, and the second stage is the sub-PU-level matching. In the PU-level matching, multiple initial MVs in LIST_0 and LIST_1 are selected respectively. These MVs include the MVs from merge candidates (i.e., the conventional merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for two lists. For each MV in one list, a MV pair is formed from this MV and the mirrored MV that is derived by scaling the MV to the other list. For each MV pair, two reference blocks are compensated by using this MV pair. The sum of absolute differences (SAD) of these two blocks is calculated. The MV pair with the smallest SAD is selected as the best MV pair.
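The selection of the best MV pair by SAD can be sketched as below. The sketch assumes integer-pel MVs and pictures stored as lists of rows, which avoids interpolation; the helper names are hypothetical and the MV pairs are assumed to have been formed already.

```python
def fetch_block(picture, x, y, width, height):
    """Return the width x height block whose top-left sample is (x, y)."""
    return [row[x:x + width] for row in picture[y:y + height]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_best_mv_pair(mv_pairs, ref0, ref1, x, y, width, height):
    """Pick the (MV0, MV1) pair whose two reference blocks match best."""
    best_pair, best_cost = None, None
    for mv0, mv1 in mv_pairs:
        block0 = fetch_block(ref0, x + mv0[0], y + mv0[1], width, height)
        block1 = fetch_block(ref1, x + mv1[0], y + mv1[1], width, height)
        cost = sad(block0, block1)
        if best_cost is None or cost < best_cost:
            best_pair, best_cost = (mv0, mv1), cost
    return best_pair, best_cost
```

Note that the cost compares the two reference blocks with each other; the samples of the current block are not involved, which is what allows the decoder to repeat the search.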
After a best MV is derived for a PU, the diamond search is performed to refine the MV pair. The refinement precision is ⅛-pel. The refinement search range is restricted within ±1 pixel. The final MV pair is the PU-level derived MV pair. The diamond search is a fast block matching motion estimation algorithm that is well known in the field of video coding. Therefore, the details of diamond search algorithm are not repeated here.
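For orientation only, a range-limited refinement can be sketched as a greedy small-diamond search. This is a simplified stand-in and not the specific diamond search pattern used in VCEG-AZ07. MVs are assumed to be stored in ⅛-pel units, so a range of 8 units corresponds to ±1 pixel, and `cost_fn` is a hypothetical callback returning the matching cost of a candidate MV.

```python
def refine_mv(start_mv, cost_fn, search_range=8):
    """Greedy small-diamond refinement around start_mv.

    start_mv: (x, y) in 1/8-pel units.
    cost_fn: callable mapping an MV to its matching cost (e.g. SAD).
    search_range: maximum offset from start_mv per component, in
    1/8-pel units (8 units = 1 pixel).
    """
    best_mv = start_mv
    best_cost = cost_fn(start_mv)
    diamond = [(0, -1), (-1, 0), (1, 0), (0, 1)]
    improved = True
    while improved:
        improved = False
        for dx, dy in diamond:
            cand = (best_mv[0] + dx, best_mv[1] + dy)
            # Stay within the allowed window around the starting MV.
            if (abs(cand[0] - start_mv[0]) > search_range
                    or abs(cand[1] - start_mv[1]) > search_range):
                continue
            cost = cost_fn(cand)
            if cost < best_cost:
                best_mv, best_cost, improved = cand, cost, True
    return best_mv, best_cost
```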
For the second-stage sub-PU-level searching, the current PU is divided into sub-PUs. The depth (e.g. 3) of sub-PU is signaled in sequence parameter set (SPS). Minimum sub-PU size is 4×4 block. For each sub-PU, multiple starting MVs in LIST_0 and LIST_1 are selected, which include the MV of PU-level derived MV, zero MV, HEVC collocated TMVP of current sub-PU and bottom-right block, temporal derived MVP of current sub-PU, and MVs of left and above PU/sub-PU. By using the similar mechanism as the PU-level searching, the best MV pair for the sub-PU is determined. The diamond search is performed to refine the MV pair. The motion compensation for this sub-PU is performed to generate the predictor for this sub-PU.
For the template matching merge mode, the reconstructed pixels of the 4 rows above and the 4 columns to the left of the current block are used to form a template. The template matching is performed to find the best matched template with its corresponding MV. Two-stage matching is also applied for template matching. In the PU-level matching, multiple starting MVs in LIST_0 and LIST_1 are selected respectively. These MVs include the MVs from merge candidates (i.e., the conventional merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for two lists. For each MV in one list, the SAD cost of the template with the MV is calculated. The MV with the smallest cost is the best MV. The diamond search is then performed to refine the MV. The refinement precision is ⅛-pel. The refinement search range is restricted within ±1 pixel. The final MV is the PU-level derived MV. The MVs in LIST_0 and LIST_1 are generated independently.
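The template cost for one list can be sketched as follows, assuming integer-pel MVs, pictures stored as lists of rows, and a template that excludes the above-left corner region. These choices and all function names are illustrative assumptions.

```python
def template_pixels(picture, x, y, width, height, thickness=4):
    """Collect the samples above and to the left of the block at (x, y)."""
    above = [picture[r][c]
             for r in range(y - thickness, y)
             for c in range(x, x + width)]
    left = [picture[r][c]
            for r in range(y, y + height)
            for c in range(x - thickness, x)]
    return above + left


def template_sad(cur_pic, ref_pic, x, y, width, height, mv, thickness=4):
    """SAD between the current template and the template displaced by mv."""
    cur = template_pixels(cur_pic, x, y, width, height, thickness)
    ref = template_pixels(ref_pic, x + mv[0], y + mv[1],
                          width, height, thickness)
    return sum(abs(a - b) for a, b in zip(cur, ref))


def best_template_mv(cur_pic, ref_pic, x, y, width, height, candidates):
    """Return the candidate MV with the smallest template SAD."""
    return min(candidates,
               key=lambda mv: template_sad(cur_pic, ref_pic, x, y,
                                           width, height, mv))
```

Because the template consists of already reconstructed samples of the current picture, the same search can be carried out at the decoder. Running it once per reference list reflects the independent derivation of the LIST_0 and LIST_1 MVs.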
For the second-stage sub-PU-level searching, the current PU is divided into sub-PUs. The depth (e.g. 3) of sub-PU is signaled in SPS. Minimum sub-PU size is 4×4 block. For each sub-PU at left or top PU boundaries, multiple starting MVs in LIST_0 and LIST_1 are selected, which include the MV of PU-level derived MV, zero MV, HEVC collocated TMVP of current sub-PU and bottom-right block, temporal derived MVP of current sub-PU, and MVs of left and above PU/sub-PU. By using a mechanism similar to the PU-level searching, the best MV pair for the sub-PU is determined. The diamond search is performed to refine the MV pair. The motion compensation for this sub-PU is performed to generate the predictor for this sub-PU. For those sub-PUs that are not at left or top PU boundaries, the second-stage sub-PU-level searching is not applied, and the corresponding MVs are set equal to the MVs in the first stage.
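The sub-PU partition and the boundary rule can be sketched as below. The way the sub-PU size is derived from the depth (shorter PU side divided by 2 to the power of the depth, floored at 4) is an assumption made for illustration, as are the function name and the returned tuple layout.

```python
def sub_pu_layout(pu_width, pu_height, depth=3, min_size=4):
    """List the sub-PUs of a PU for template matching.

    Returns tuples ((x, y), size, on_boundary), where (x, y) is the
    sub-PU offset inside the PU and on_boundary marks sub-PUs touching
    the left or top PU boundary. Only those would get a second-stage
    search; the others keep the PU-level MVs.
    """
    size = max(min_size, min(pu_width, pu_height) >> depth)
    layout = []
    for y in range(0, pu_height, size):
        for x in range(0, pu_width, size):
            on_boundary = (x == 0 or y == 0)
            layout.append(((x, y), size, on_boundary))
    return layout
```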
In this decoder MV derivation method, the template matching is also used to generate a MVP for Inter mode coding. When a reference picture is selected, the template matching is performed to find a best template on the selected reference picture. Its corresponding MV is the derived MVP. This MVP is inserted into the first position in AMVP. AMVP represents advanced MV prediction and AMVP is a coding tool for coding the motion vector(s) of the current block in Inter coding mode. According to AMVP, a current MV is coded predictively using a motion vector predictor selected from a candidate list. The MV difference between the current MV and a selected MV candidate in the candidate list is coded.
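A minimal sketch of the predictive MV coding follows, with the template-derived MVP placed first in the candidate list. The list size of two, the duplicate pruning and the encoder's choice of the predictor with the smallest difference are assumptions for illustration; the function names are hypothetical.

```python
def build_amvp_list(template_mvp, other_candidates, max_candidates=2):
    """Put the template-derived MVP first, then the remaining candidates."""
    candidates = [template_mvp]
    for mv in other_candidates:
        if mv not in candidates:  # assumed pruning of duplicates
            candidates.append(mv)
    return candidates[:max_candidates]


def encode_mv(mv, candidates):
    """Return (predictor index, MV difference) for the current MV."""
    index = min(range(len(candidates)),
                key=lambda i: abs(mv[0] - candidates[i][0])
                + abs(mv[1] - candidates[i][1]))
    mvp = candidates[index]
    return index, (mv[0] - mvp[0], mv[1] - mvp[1])


def decode_mv(index, mvd, candidates):
    """Reconstruct the MV from the predictor index and the MV difference."""
    mvp = candidates[index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

The decoder builds the same candidate list, so only the index and the MV difference need to be transmitted.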
In JVET-D0029 (Xu Chen, et al., “Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 4th Meeting: Chengdu, CN, 15-21 Oct. 2016, Document: JVET-D0029), Decoder-Side Motion Vector Refinement (DMVR) based on bilateral template matching is disclosed. A template is generated by using the bi-prediction from the reference blocks (410 and 420) of MV0 and MV1, as shown in FIG. 4. In other words, the template for DMVR is formed by combining two reference blocks (i.e., 410 and 420), which is different from the template generated for template matching of PMVD in FIG. 2. The template 500 is then used as a new current block, and motion estimation is performed to find better matching blocks (510 and 520) in Ref. Picture 0 and Ref. Picture 1, respectively, as shown in FIG. 5. The refined MVs are referred to as MV0′ and MV1′. The refined MVs (MV0′ and MV1′) are then used to generate a final bi-predicted prediction block for the current block.
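The two DMVR steps, forming the bilateral template and searching each reference picture against it, can be sketched as follows. The rounded average used for the template, the integer-pel search within one sample of the initial MV and the helper names are illustrative assumptions rather than details taken from JVET-D0029.

```python
def fetch_block(picture, x, y, width, height):
    """Return the width x height block whose top-left sample is (x, y)."""
    return [row[x:x + width] for row in picture[y:y + height]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def bilateral_template(block0, block1):
    """Combine the two reference blocks by a rounded average."""
    return [[(a + b + 1) >> 1 for a, b in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]


def refine_against_template(ref_pic, template, x, y, mv, radius=1):
    """Search around mv for the block in ref_pic closest to the template."""
    height, width = len(template), len(template[0])
    best_mv, best_cost = None, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (mv[0] + dx, mv[1] + dy)
            block = fetch_block(ref_pic, x + cand[0], y + cand[1],
                                width, height)
            cost = sad(template, block)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv, best_cost
```

Calling `refine_against_template` once with Ref. Picture 0 and MV0 and once with Ref. Picture 1 and MV1 would yield the refined MV0′ and MV1′.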
In JVET-C0047 (Chun-Chi Chen, et al., “Generalized bi-prediction for inter coding”, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Geneva, CH, 26 May-1 Jun. 2016, Document: JVET-C0047), generalized bi-prediction (GBi) for Inter coding is disclosed. Traditionally, the weighting factors for two predictors in the bi-prediction are equal. However, GBi supports different weighting factors for these two predictors. For example, a larger weighting factor can be used for L0 predictor and a smaller weighting factor can be used for L1 predictor. Alternatively, a larger weighting factor can be used for L1 predictor and a smaller weighting factor can be used for L0 predictor. The set of supported weighting factors can be pre-defined and the selected weighting factor is signaled for bi-prediction Inter CUs or derived based on Merge candidate selection. In JVET-C0047, the set can be {3, 4, 5}, {2, 3, 4, 5, 6} or {−2, 2, 3, 4, 5, 6, 10}.
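Unequal weighting of the two predictors can be sketched as below. The sketch assumes that the listed weighting factors are expressed in units of 1/8, that the signaled value applies to the L1 predictor, and that the L0 weight is the complement so the two weights sum to one; these are assumptions made for illustration. Clipping to the valid sample range is omitted.

```python
def gbi_predict(pred_l0, pred_l1, w1, shift=3):
    """Weighted bi-prediction with weights in units of 1 / 2**shift.

    pred_l0, pred_l1: predictor blocks as lists of rows.
    w1: weight for the L1 predictor (e.g. 3, 4 or 5 when shift is 3).
    The L0 weight is chosen so that both weights sum to 2**shift.
    """
    total = 1 << shift
    w0 = total - w1
    offset = total >> 1  # rounding offset
    return [[(w0 * a + w1 * b + offset) >> shift
             for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred_l0, pred_l1)]
```

With `w1` equal to 4 the result reduces to the conventional equal-weight average.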
In JVET-C0047, GBi does not support the decoder-side motion derivation Inter coding tools, such as DMVR or PMVD. These decoder-side motion derivation Inter coding tools have been shown to be useful for improving coding performance. Therefore, it is desirable to develop techniques to expand the GBi methods by supporting the decoder-side motion derivation Inter coding tools to further improve GBi coding performance.