In a typical video coding system utilizing motion-compensated Inter prediction, motion information is usually transmitted from an encoder side to a decoder so that the decoder can perform the motion-compensated Inter prediction correctly. In such systems, the motion information consumes some of the coded bits. In order to improve coding efficiency, a decoder-side motion vector derivation method is disclosed in VCEG-AZ07 (Jianle Chen, et al., Further improvements to HMKTA-1.0, ITU-Telecommunications Standardization Sector, Study Group 16 Question 6, Video Coding Experts Group (VCEG), 52nd Meeting: 19-26 Jun. 2015, Warsaw, Poland). According to VCEG-AZ07, the decoder-side motion vector derivation method uses two Frame Rate Up-Conversion (FRUC) modes. One of the FRUC modes is referred to as bilateral matching for B-slice and the other of the FRUC modes is referred to as template matching for P-slice or B-slice.
FIG. 1 illustrates an example of the FRUC bilateral matching mode, where the motion information for a current block 110 is derived based on two reference pictures. The motion information of the current block is derived by finding the best match between two blocks (120 and 130) along the motion trajectory 140 of the current block in two different reference pictures (i.e., Ref0 and Ref1). Under the assumption of a continuous motion trajectory, the motion vectors MV0 associated with Ref0 and MV1 associated with Ref1 pointing to the two reference blocks shall be proportional to the temporal distances, i.e., TD0 and TD1, between the current picture (i.e., Cur pic) and the two reference pictures.
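The proportionality between MV0 and MV1 can be illustrated with a minimal sketch. It assumes that Ref0 and Ref1 lie on opposite temporal sides of the current picture and that TD0 and TD1 are given as positive distances, so that MV1 = -MV0 × TD1/TD0. The function name mirror_mv and the use of exact fractions instead of a codec-specific rounding rule are assumptions made for illustration only.

```python
from fractions import Fraction

def mirror_mv(mv0, td0, td1):
    """Derive MV1 (towards Ref1) from MV0 (towards Ref0).

    Assumption: td0 and td1 are positive temporal distances and the two
    reference pictures are on opposite sides of the current picture, so the
    mirrored vector points in the opposite direction.
    """
    scale = Fraction(-td1, td0)
    return (mv0[0] * scale, mv0[1] * scale)

# Example: Ref0 is two pictures away, Ref1 is one picture away.
mv1 = mirror_mv((8, -4), td0=2, td1=1)
```

A practical codec would replace the exact fraction with fixed-point scaling and rounding; that detail is not specified here.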
FIG. 2 illustrates an example of the template matching FRUC mode. The neighbouring areas (220a and 220b) of the current block 210 in a current picture (i.e., Cur pic) are used as a template to match with a corresponding template (230a and 230b) in a reference picture (i.e., Ref0). The best match between template 220a/220b and template 230a/230b determines a decoder-derived motion vector 240. While Ref0 is shown in FIG. 2, Ref1 can also be used as a reference picture.
According to VCEG-AZ07, a FRUC_mrg_flag is signalled when the merge_flag or skip_flag is true. If the FRUC_mrg_flag is 1, then FRUC_merge_mode is signalled to indicate whether the bilateral matching merge mode or the template matching merge mode is selected. If the FRUC_mrg_flag is 0, it implies that the regular merge mode is used and a merge index is signalled in this case.
In video coding, in order to improve coding efficiency, the motion vector for a block may be predicted using motion vector prediction (MVP), where a candidate list is generated. A merge candidate list may be used for coding a block in a merge mode. When the merge mode is used to code a block, the motion information (e.g. motion vector) of the block can be represented by one of the candidate MVs in the merge MV list. Therefore, instead of transmitting the motion information of the block directly, a merge index is transmitted to the decoder side. The decoder maintains the same merge list and uses the merge index to retrieve the merge candidate as signalled by the merge index. Typically, the merge candidate list consists of a small number of candidates, and transmitting the merge index is much more efficient than transmitting the motion information. When a block is coded in a merge mode, the motion information is "merged" with that of a neighbouring block by signalling a merge index instead of being explicitly transmitted. However, the prediction residuals are still transmitted. In the case that the prediction residuals are zero or very small, the prediction residuals are "skipped" (i.e., the skip mode) and the block is coded by the skip mode with a merge index to identify the merge MV in the merge list.
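The signalling order described above can be sketched as a small decision function operating on syntax elements that have already been decoded. This is an illustrative sketch only: the dictionary key merge_idx, the returned mode labels, and the mapping of FRUC_merge_mode values (0 for bilateral matching, 1 for template matching) are assumptions that are not stated in the description above.

```python
def select_merge_mode(syntax):
    """Map already-decoded syntax elements (a dict) to a coding mode.

    Returns (mode_label, merge_index_or_None).
    """
    # FRUC_mrg_flag is only present when merge_flag or skip_flag is true.
    if not (syntax.get("merge_flag") or syntax.get("skip_flag")):
        return ("non_merge", None)
    if syntax.get("FRUC_mrg_flag") == 1:
        # Assumed value mapping: 0 -> bilateral, 1 -> template matching.
        mode = "template" if syntax["FRUC_merge_mode"] == 1 else "bilateral"
        return ("fruc_" + mode, None)
    # FRUC_mrg_flag == 0: regular merge mode, a merge index is signalled.
    return ("regular_merge", syntax["merge_idx"])
```

In the regular merge case, the returned index would then be used to look up the candidate in the merge list that the decoder builds in the same way as the encoder.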
While the term FRUC refers to motion vector derivation for Frame Rate Up-Conversion, the underlying techniques are intended for a decoder to derive one or more merge MV candidates without the need for explicitly transmitting motion information. Accordingly, FRUC is also called decoder-derived motion information in this disclosure. Since the template matching method is a pattern-based MV derivation technique, the FRUC technique is also referred to as Pattern-based MV Derivation (PMVD) in this disclosure.
In the decoder-side MV derivation method, a new temporal MVP called the temporal derived MVP is derived by scanning all MVs in all reference frames. To derive the LIST_0 temporal derived MVP, for each LIST_0 MV in the LIST_0 reference frames, the MV is scaled to point to the current frame. The 4×4 block that is pointed to by this scaled MV in the current frame is the target current block. The MV is further scaled to point to the reference picture whose refIdx is equal to 0 in LIST_0 for the target current block. The further scaled MV is stored in the LIST_0 MV field for the target current block. FIG. 3A and FIG. 3B illustrate examples of deriving the temporal derived MVPs for List_0 and List_1 respectively. In FIG. 3A and FIG. 3B, each small square block corresponds to a 4×4 block. The temporal derived MVP process scans all the MVs in all 4×4 blocks in all reference pictures to generate the temporal derived LIST_0 and LIST_1 MVPs of the current frame. For example, in FIG. 3A, blocks 310, blocks 312 and blocks 314 correspond to 4×4 blocks of the current picture, the List_0 reference picture with index equal to 0 (i.e., refidx=0) and the List_0 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 320 and 330 for two blocks in the List_0 reference picture with index equal to 1 are known. Then, temporal derived MVPs 322 and 332 can be derived by scaling motion vectors 320 and 330 respectively. Each scaled MVP is then assigned to a corresponding block. Similarly, in FIG. 3B, blocks 340, blocks 342 and blocks 344 correspond to 4×4 blocks of the current picture, the List_1 reference picture with index equal to 0 (i.e., refidx=0) and the List_1 reference picture with index equal to 1 (i.e., refidx=1) respectively. Motion vectors 350 and 360 for two blocks in the List_1 reference picture with index equal to 1 are known. Then, temporal derived MVPs 352 and 362 can be derived by scaling motion vectors 350 and 360 respectively.
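The two scaling steps for one stored MV can be sketched as follows. The sketch assumes that temporal positions are expressed as picture order counts (POCs), that MVs are in integer pixels, and that scaling is linear in the POC distance without codec-specific rounding. The function name and all parameter names are hypothetical.

```python
def derive_temporal_mvp(block_pos, mv, poc_ref, poc_mv_target,
                        poc_cur, poc_ref0, block=4):
    """Project one MV stored in a reference picture onto the current picture.

    block_pos     : (x, y) of the 4x4 block in the reference picture (poc_ref)
    mv            : MV of that block, pointing from poc_ref to poc_mv_target
    poc_cur       : POC of the current picture
    poc_ref0      : POC of the reference picture with refIdx equal to 0
    Returns (target_block_pos, temporal_derived_mvp).
    """
    span = poc_mv_target - poc_ref  # temporal distance covered by mv
    # Step 1: scale the MV so that it points to the current picture.
    to_cur = (mv[0] * (poc_cur - poc_ref) / span,
              mv[1] * (poc_cur - poc_ref) / span)
    # The 4x4 block hit by the scaled MV is the target current block.
    tx = int((block_pos[0] + to_cur[0]) // block) * block
    ty = int((block_pos[1] + to_cur[1]) // block) * block
    # Step 2: scale again so the MV points from the current picture
    # to the refIdx-0 reference picture.
    mvp = (mv[0] * (poc_ref0 - poc_cur) / span,
           mv[1] * (poc_ref0 - poc_cur) / span)
    return (tx, ty), mvp

# A block at (16, 8) in the picture at POC 4 has MV (8, -4) towards POC 0;
# the current picture is at POC 8 and the refIdx-0 picture is at POC 6.
target, mvp = derive_temporal_mvp((16, 8), (8, -4), 4, 0, 8, 6)
```

Running this for every stored MV in every reference picture, and writing each result into the MV field at the target position, would correspond to the scan described above; how collisions at the same target block are resolved is not covered by this sketch.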
For the bilateral matching merge mode and the template matching merge mode, two-stage matching is applied. The first stage is PU-level matching, and the second stage is sub-PU-level matching. In the PU-level matching, multiple initial MVs in LIST_0 and LIST_1 are selected respectively. These MVs include the MVs from merge candidates (i.e., the conventional merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for the two lists. For each MV in one list, an MV pair is generated by composing this MV and the mirrored MV that is derived by scaling the MV to the other list. For each MV pair, two reference blocks are compensated by using this MV pair. The sum of absolute differences (SAD) of these two blocks is calculated. The MV pair with the smallest SAD is selected as the best MV pair.
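The PU-level selection for bilateral matching can be sketched as below. To stay short, the sketch makes several assumptions that are not part of the description above: MVs are integer-pel, pictures are plain lists of rows, only starting MVs towards Ref0 are tried, the mirrored MV is obtained by simple integer scaling with temporal distances td0 and td1, and all fetched blocks lie inside the pictures. All function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q)
               for row_a, row_b in zip(block_a, block_b)
               for p, q in zip(row_a, row_b))

def fetch_block(pic, x, y, w, h):
    # Assumption: the caller keeps (x, y, w, h) inside the picture.
    return [row[x:x + w] for row in pic[y:y + h]]

def best_bilateral_pair(ref0, ref1, x, y, w, h, start_mvs, td0, td1):
    """Return ((mv0, mv1), cost) with the smallest SAD over all starting MVs."""
    best_pair, best_cost = None, None
    for mv0 in start_mvs:
        # Mirrored MV towards the other reference picture.
        mv1 = (-mv0[0] * td1 // td0, -mv0[1] * td1 // td0)
        blk0 = fetch_block(ref0, x + mv0[0], y + mv0[1], w, h)
        blk1 = fetch_block(ref1, x + mv1[0], y + mv1[1], w, h)
        cost = sad(blk0, blk1)
        if best_cost is None or cost < best_cost:
            best_pair, best_cost = (mv0, mv1), cost
    return best_pair, best_cost
```

A full implementation would repeat the loop with the starting MV set of the other list and would use interpolated reference samples for fractional MVs.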
After the best MV pair is derived for a PU, a diamond search is performed to refine the MV pair. The refinement precision is ⅛-pel. The refinement search range is restricted to within ±1 pixel. The final MV pair is the PU-level derived MV pair. The diamond search is a fast block matching motion estimation algorithm that is well known in the field of video coding. Therefore, the details of the diamond search algorithm are not repeated here.
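For orientation only, a generic small-diamond refinement under the stated precision and range limits may look like the sketch below. It is not the specific search pattern of VCEG-AZ07. MVs are expressed in ⅛-pel units, so the ±1 pixel range corresponds to ±8 units, and the matching cost is supplied by the caller as a function; the names diamond_refine, cost and range_units are hypothetical.

```python
def diamond_refine(start, cost, range_units=8):
    """Greedy small-diamond search around `start` (MV in 1/8-pel units).

    cost        : callable taking an (x, y) MV and returning a matching cost
    range_units : maximum offset from `start` per component (8 units = 1 pixel)
    Returns (best_mv, best_cost).
    """
    best, best_cost = start, cost(start)
    improved = True
    while improved:
        improved = False
        cx, cy = best
        # Test the four diamond neighbours of the current centre.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cand = (cx + dx, cy + dy)
            if (abs(cand[0] - start[0]) > range_units or
                    abs(cand[1] - start[1]) > range_units):
                continue  # outside the +/-1 pixel refinement window
            c = cost(cand)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
    return best, best_cost
```

The loop stops when no neighbour inside the window lowers the cost, so the result is a local minimum of the supplied cost function.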
For the second-stage sub-PU-level searching, the current PU is divided into sub-PUs. The depth (e.g. 3) of the sub-PU is signalled in the sequence parameter set (SPS). The minimum sub-PU size is a 4×4 block. For each sub-PU, multiple starting MVs in LIST_0 and LIST_1 are selected, which include the PU-level derived MV, the zero MV, the HEVC collocated TMVP of the current sub-PU and the bottom-right block, the temporal derived MVP of the current sub-PU, and the MVs of the left and above PUs/sub-PUs. By using a mechanism similar to the PU-level searching, the best MV pair for the sub-PU is determined. The diamond search is performed to refine the MV pair. The motion compensation for this sub-PU is then performed to generate the predictor for this sub-PU.
For the template matching merge mode, the reconstructed pixels of the above 4 rows and the left 4 columns are used to form a template. The template matching is performed to find the best matched template with its corresponding MV. Two-stage matching is also applied for template matching. In the PU-level matching, multiple starting MVs in LIST_0 and LIST_1 are selected respectively. These MVs include the MVs from merge candidates (i.e., the conventional merge candidates such as those specified in the HEVC standard) and MVs from temporal derived MVPs. Two different starting MV sets are generated for the two lists. For each MV in one list, the SAD cost of the template with the MV is calculated. The MV with the smallest cost is the best MV. The diamond search is then performed to refine the MV. The refinement precision is ⅛-pel. The refinement search range is restricted to within ±1 pixel. The final MV is the PU-level derived MV. The MVs in LIST_0 and LIST_1 are generated independently.
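The template cost evaluation can be sketched as follows. The sketch assumes integer-pel MVs, pictures stored as lists of rows, a template that excludes the top-left corner area, and block positions that leave at least 4 rows above and 4 columns to the left inside both pictures. The function names are hypothetical.

```python
def template_pixels(pic, x, y, w, h, t=4):
    """Collect the t rows above and the t columns left of the block at (x, y)."""
    above = [pic[r][x:x + w] for r in range(y - t, y)]
    left = [pic[r][x - t:x] for r in range(y, y + h)]
    return [p for row in above + left for p in row]

def template_cost(cur, ref, x, y, w, h, mv, t=4):
    """SAD between the current template and the template displaced by mv."""
    a = template_pixels(cur, x, y, w, h, t)
    b = template_pixels(ref, x + mv[0], y + mv[1], w, h, t)
    return sum(abs(p - q) for p, q in zip(a, b))

def best_template_mv(cur, ref, x, y, w, h, start_mvs, t=4):
    """Pick the starting MV with the smallest template SAD."""
    return min(start_mvs,
               key=lambda mv: template_cost(cur, ref, x, y, w, h, mv, t))
```

Because only reconstructed neighbouring pixels are used, the same search can be run at the decoder without any transmitted motion information. The selection would be run once per reference list, since the MVs in LIST_0 and LIST_1 are generated independently.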
For the second-stage sub-PU-level searching, the current PU is divided into sub-PUs. The depth (e.g. 3) of the sub-PU is signalled in the SPS. The minimum sub-PU size is a 4×4 block. For each sub-PU at the left or top PU boundaries, multiple starting MVs in LIST_0 and LIST_1 are selected, which include the PU-level derived MV, the zero MV, the HEVC collocated TMVP of the current sub-PU and the bottom-right block, the temporal derived MVP of the current sub-PU, and the MVs of the left and above PUs/sub-PUs. By using a mechanism similar to the PU-level searching, the best MV pair for the sub-PU is determined. The diamond search is performed to refine the MV pair. The motion compensation for this sub-PU is then performed to generate the predictor for this sub-PU. For those sub-PUs that are not at the left or top PU boundaries, the second-stage sub-PU-level searching is not applied, and the corresponding MVs are set equal to the MVs in the first stage.
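The partitioning into sub-PUs and the boundary restriction for template matching can be sketched as below. The rule used to obtain the sub-PU size from the signalled depth, namely the smaller PU dimension right-shifted by the depth and floored at 4, is an assumption for illustration; the description above only states that a depth is signalled and that the minimum size is 4×4. The function name sub_pus is hypothetical.

```python
def sub_pus(pu_w, pu_h, depth, min_size=4):
    """List sub-PUs as (x, y, size, refine) tuples, relative to the PU origin.

    refine is True only for sub-PUs touching the left or top PU boundary,
    where the neighbouring template is available for a second-stage search.
    """
    # Assumed size rule: halve the smaller dimension `depth` times, min 4.
    size = max(min_size, min(pu_w, pu_h) >> depth)
    result = []
    for sy in range(0, pu_h, size):
        for sx in range(0, pu_w, size):
            refine = (sx == 0 or sy == 0)
            result.append((sx, sy, size, refine))
    return result

# A 16x16 PU with depth 3 yields sixteen 4x4 sub-PUs under this rule.
parts = sub_pus(16, 16, depth=3)
```

Sub-PUs with refine equal to False would simply inherit the PU-level derived MV, as stated above.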
In this decoder-side MV derivation method, template matching is also used to generate an MVP for Inter mode coding. When a reference picture is selected, the template matching is performed to find the best template in the selected reference picture. Its corresponding MV is the derived MVP. This MVP is inserted into the first position of the AMVP candidate list. AMVP stands for advanced MV prediction, where a current MV is coded predictively using a candidate list. The MV difference between the current MV and a selected MV candidate in the candidate list is coded.
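The insertion and the predictive coding step can be illustrated with a short sketch. The function names are hypothetical, and list pruning or truncation to a maximum candidate count is deliberately left out because it is not described above.

```python
def insert_derived_mvp(derived_mvp, amvp_candidates):
    """Place the template-matching derived MVP at the first list position."""
    return [derived_mvp] + list(amvp_candidates)

def mv_difference(current_mv, candidates, idx):
    """MV difference that is coded when candidate `idx` is the predictor."""
    pred = candidates[idx]
    return (current_mv[0] - pred[0], current_mv[1] - pred[1])

# The derived MVP (5, -3) becomes candidate 0; coding the MV (6, -3)
# against it leaves only a small difference to transmit.
cands = insert_derived_mvp((5, -3), [(4, 0), (0, 0)])
mvd = mv_difference((6, -3), cands, 0)
```

The decoder would rebuild the same candidate list, add the decoded difference to the selected candidate, and thereby recover the current MV.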
While the decoder-derived motion information method can reduce the bitrate associated with signalling the motion information, the method tries out various motion vector candidates for various modes (e.g. FRUC modes, TMVP, AMVP, etc.). Such a process not only causes a high computational load, but also requires high system memory bandwidth due to the need to access reference data for various motion vector candidates and for various modes. Therefore, it is desirable to develop techniques to reduce the memory bandwidth and/or the computational load.