Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. The multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, the multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and the transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such straightforward techniques would result in poor coding performance. In order to improve multi-view video coding efficiency, multi-view video coding typically exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras. A disparity model, such as an affine model, is used to indicate the displacement of an object in two view frames. Furthermore, motion vectors for frames in one view can be derived from the motion vectors for respective frames in another view.
For 3D video, besides the conventional texture data associated with multiple views, depth data is often captured or derived as well. The depth data may be captured for video associated with one view or multiple views. The depth information may also be derived from images of different views. The depth data is usually represented at a lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction.
Some standard development activities for 3D video coding have been undertaken by the Joint Collaborative Team on 3D Video Coding Extension Development within the international standardization organization ITU-T. In the software test model version 5.0 of Advanced Video Coding (AVC)-based 3D video coding (3DV-ATM-5.0), a motion vector prediction (MVP) candidate for Skip/Direct mode is derived based on disparity vectors (DVs) of neighboring blocks according to a predefined derivation order. When a block is coded in Direct mode, the motion information can be inferred from previously coded information without explicit signaling of the motion information. When a block is coded in Skip mode, neither the motion information nor the residual information is signaled. In this case, the residual signals are inferred as zero.
FIG. 1 illustrates an example of priority-based motion vector prediction (MVP) for Skip/Direct mode according to 3DV-ATM-5.0. A disparity vector (114) associated with the central point (112) of the current block (110) in a dependent view is used to find a corresponding point (122) of a corresponding block (120) in the reference view (e.g., a base view). The MV (126) of the block (124) that covers the corresponding point (122) in the reference view is used as the inter-view MVP candidate of the current block. The disparity vector can be derived from the neighboring blocks and the depth value of the central point. The depth information associated with the current texture block (110) is shown in FIG. 1 as block 130, and the central point is shown as a shaded box. If any of the neighboring blocks has a DV (e.g., DVA for block A in FIG. 1), the DV of the neighboring block is used as the disparity vector to locate the corresponding block in the reference picture. Otherwise, the converted disparity, namely the depth-based disparity, is used, where the disparity is converted from the depth value of the central point and camera parameters. Compared to the approach that only uses the depth-based disparity, the approach that uses DVs from spatial neighboring blocks can reduce error propagation in case the depth value of the central point is not available. The terms "disparity" and "disparity vector" are used interchangeably.
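The priority rule described above may be illustrated with a minimal Python sketch. It is not taken from 3DV-ATM-5.0. The function names, the scan order of the neighboring blocks, and the depth-to-disparity formula (a commonly used pinhole-camera conversion for 8-bit depth maps with horizontally aligned cameras) are assumptions introduced only for illustration.

```python
from typing import Optional, Sequence, Tuple

Vector = Tuple[int, int]  # (horizontal, vertical) displacement in pixels


def depth_to_disparity(depth: int, focal_length: float, baseline: float,
                       z_near: float, z_far: float) -> Vector:
    """Convert an 8-bit depth sample to a disparity (assumed camera model)."""
    # Assumption: depth value 255 maps to z_near and depth value 0 to z_far.
    inv_z = (depth / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    # Assumption: horizontally aligned cameras, so the vertical part is zero.
    return (round(focal_length * baseline * inv_z), 0)


def select_disparity(neighbor_dvs: Sequence[Optional[Vector]],
                     center_depth: int, **camera: float) -> Vector:
    """Prefer the first available neighboring DV, else use the depth-based DV."""
    for dv in neighbor_dvs:  # assumed order, e.g. blocks A, B, C
        if dv is not None:
            return dv
    return depth_to_disparity(center_depth, **camera)


# Hypothetical camera parameters, used only for the example calls.
camera = dict(focal_length=1000.0, baseline=0.1, z_near=10.0, z_far=100.0)
dv_from_neighbor = select_disparity([None, (4, 0), None], 255, **camera)
dv_from_depth = select_disparity([None, None, None], 255, **camera)
```

In the first call the DV of the second neighboring block is returned unchanged. In the second call no neighboring DV exists, so the disparity is converted from the depth value of the central point.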
When the corresponding block pointed to by the DV of the neighboring block has no motion information available, the inter-view candidate is regarded as not available, and the process continues to search for a spatial candidate from the neighboring blocks. Alternatively, the inter-view candidate derivation process can be based on the disparity converted from the depth information of the current block. When a corresponding block pointed to by the DV of the neighboring block or by the DV converted from the depth information of the current block is Intra-coded or uses an invalid reference picture for the current picture, the motion information of the corresponding block is considered unavailable. The inter-view candidate and the median of three spatial candidates derived from the neighboring blocks A, B, and C (D is used only when C is unavailable) are shown in FIG. 1. On the decoder side, motion compensation is performed using the motion information of the derived MVP candidate. The motion information includes the prediction direction (uni-direction prediction or bi-direction prediction), the reference picture type (temporal prediction, virtual prediction, or inter-view prediction), and the reference picture index.
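The availability check and the fallback to the spatial median may be sketched as follows. The `Block` record and the function names are hypothetical. Treating an unavailable spatial neighbor as a zero vector is a simplification made only for this example and is not stated in the description above.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Tuple

Vector = Tuple[int, int]


@dataclass
class Block:
    """Hypothetical record for the corresponding block in the reference view."""
    mv: Optional[Vector]
    is_intra: bool = False
    ref_valid: bool = True  # False if its reference picture is invalid here


def inter_view_candidate(corresponding: Block) -> Optional[Vector]:
    """Return the MV of the corresponding block, or None when unavailable."""
    if corresponding.is_intra or not corresponding.ref_valid:
        return None
    return corresponding.mv


def median_mv(a: Vector, b: Vector, c: Vector) -> Vector:
    """Component-wise median of three vectors."""
    return (median([a[0], b[0], c[0]]), median([a[1], b[1], c[1]]))


def skip_direct_mvp(corresponding: Block, a: Optional[Vector],
                    b: Optional[Vector], c: Optional[Vector],
                    d: Optional[Vector]) -> Vector:
    """Use the inter-view candidate first, else the spatial median."""
    candidate = inter_view_candidate(corresponding)
    if candidate is not None:
        return candidate
    third = c if c is not None else d  # D is used only when C is unavailable
    zero = (0, 0)  # simplification: a missing neighbor counts as a zero MV
    return median_mv(a or zero, b or zero, third or zero)


from_inter_view = skip_direct_mvp(Block(mv=(3, 1)), (1, 5), (2, 2), None, (9, 3))
from_spatial = skip_direct_mvp(Block(mv=(3, 1), is_intra=True),
                               (1, 5), (2, 2), None, (9, 3))
```

The second call shows the fallback: the corresponding block is Intra-coded, so the median of the candidates from blocks A, B and D is used instead.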
FIG. 2 illustrates an exemplary flowchart of inter-view MVP derivation according to 3DV-ATM-5.0. The input data to the priority-based MVP candidate derivation process comprises motion data (210) associated with neighboring blocks A, B and C of the texture picture in a dependent view and depth data of the current block (250) in the dependent view. Any disparity information associated with a neighboring block is considered motion information for inter-view prediction. The availability of DVs associated with neighboring blocks is checked in step 220. If the DV for a neighboring block is not available, the DV is replaced by a derived disparity vector as shown in step 230, where the derived disparity vector is converted from the depth data associated with the current block. The disparity data for replacing an unavailable DV may correspond to the maximum disparity of the current block (step 260). The final disparity may be determined based on the median of the MVP candidates (i.e., the DVs associated with blocks A, B and C) as shown in step 240. After the disparity vector is derived for the current block, a block (124) covering the corresponding point (122) in the reference picture can be identified. The motion vector (126) associated with block 124 can be used as the inter-view MVP candidate.
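The flow of steps 220 to 260 may be sketched in Python as follows. The function names are hypothetical, and the depth-to-disparity conversion is passed in as a callable because its exact form is not given here. The sketch also assumes that the largest depth sample yields the maximum disparity and that the corresponding point is found by adding the disparity to the central point.

```python
from statistics import median
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, int]
Point = Tuple[int, int]


def derive_block_disparity(neighbor_dvs: Sequence[Optional[Vector]],
                           depth_block: Sequence[Sequence[int]],
                           depth_to_disp: Callable[[int], Vector]) -> Vector:
    """Median of the neighboring DVs after replacing unavailable ones."""
    # Step 260: maximum disparity of the current block, assuming that the
    # largest depth sample corresponds to the largest disparity.
    fallback = depth_to_disp(max(max(row) for row in depth_block))
    # Steps 220 and 230: check availability and replace unavailable DVs.
    filled = [dv if dv is not None else fallback for dv in neighbor_dvs]
    # Step 240: component-wise median of the candidates for blocks A, B, C.
    return (median(dv[0] for dv in filled), median(dv[1] for dv in filled))


def corresponding_point(center: Point, dv: Vector) -> Point:
    """Locate the corresponding point in the reference view (assumed sign)."""
    return (center[0] + dv[0], center[1] + dv[1])


# Hypothetical conversion and data, used only for the example calls.
toy_conversion = lambda depth: (depth // 16, 0)
disparity = derive_block_disparity([(8, 0), None, (2, 0)],
                                   [[16, 32], [64, 80]], toy_conversion)
point = corresponding_point((40, 24), disparity)
```

Here the DV of block B is unavailable and is replaced by the depth-derived vector (5, 0), which also turns out to be the median. The MV of the reference-view block covering `point` would then serve as the inter-view MVP candidate.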
In 3DV-ATM-5.0, the list0 MV and the list1 MV of the inter-view candidate are inferred independently when using the DVs of neighboring blocks to locate the corresponding point in the reference view. Specifically, the MVP for list0 is derived by first locating a corresponding block in the reference picture based on the list0 DV of neighboring blocks (if available) and then using the MV of the corresponding block as the MVP candidate for list0. Similarly, the MVP for list1 is derived by first locating a corresponding block in the reference picture based on the list1 DV of neighboring blocks (if available) and then using the MV of the corresponding block as the MVP candidate for list1. As shown in FIG. 3, for a current block (310) in a dependent view, the list0 DV and the list1 DV of neighboring blocks of the current block may be different and thus may locate different corresponding blocks (C01 and C02) in the reference view (e.g., a base view). An exemplary flowchart associated with list0 and list1 inter-view candidate derivation is shown in FIG. 4. The flowchart is for list0 if ListX=list0 and for list1 if ListX=list1. The steps (410-460) in FIG. 4 are similar to those of FIG. 2 (210-260). However, the process is performed for list0 and list1 inter-view MVP derivation separately, where ListX corresponds to either list0 or list1. For example, in step 420, only the motion data (e.g., DV) associated with a neighboring block pointing to a reference picture in list0 is considered available if the target reference picture is in list0. The central position (472) of the current block (470) can use the derived disparity vector to locate a corresponding position (482) in a corresponding block (480). The list0 motion vector associated with a block (484) covering the corresponding point (482) in the base view is used as the inter-view MVP candidate for the respective list.
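The independent per-list derivation may be sketched as follows. The motion-field layout (a dictionary keyed by block coordinates), the 16x16 block size, and all function names are assumptions made for illustration. The sketch takes one already-derived DV per list as input and omits the median step of FIG. 4.

```python
from typing import Callable, Dict, Optional, Tuple

Vector = Tuple[int, int]
Point = Tuple[int, int]
Lookup = Callable[[Point, str], Optional[Vector]]


def make_lookup(motion_field: Dict[Point, Dict[str, Vector]],
                block_size: int = 16) -> Lookup:
    """Build a hypothetical reference-view lookup: point and list to MV."""
    def lookup(point: Point, list_x: str) -> Optional[Vector]:
        key = (point[0] // block_size, point[1] // block_size)
        return motion_field.get(key, {}).get(list_x)
    return lookup


def inter_view_mvp_per_list(list_dvs: Dict[str, Optional[Vector]],
                            fallback_dv: Vector, center: Point,
                            ref_view_motion: Lookup
                            ) -> Dict[str, Optional[Vector]]:
    """Derive the inter-view MVP candidate separately for list0 and list1."""
    result: Dict[str, Optional[Vector]] = {}
    for list_x in ("list0", "list1"):
        dv = list_dvs.get(list_x)
        if dv is None:
            dv = fallback_dv  # depth-derived DV replaces an unavailable DV
        point = (center[0] + dv[0], center[1] + dv[1])
        # Each list may land in a different corresponding block.
        result[list_x] = ref_view_motion(point, list_x)
    return result


# Hypothetical reference-view motion field, used only for the example call.
field = {(2, 1): {"list0": (1, -1), "list1": (0, 2)},
         (3, 1): {"list0": (5, 5)}}
mvps = inter_view_mvp_per_list({"list0": (-4, 0), "list1": (10, 0)},
                               (0, 0), (40, 24), make_lookup(field))
```

In this example the list0 DV and the list1 DV point into two different blocks of the reference view. The block reached through the list1 DV has no list1 motion, so no list1 candidate is available.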
When the inter-view MVP candidate is not available, the median of three spatial MVP candidates derived from the neighboring blocks A, B, and C is used as the MVP for Skip or Direct mode according to 3DV-ATM-5.0. The derivation procedures of the Inter MVP candidate in Skip/Direct mode and of the inter-view MVP candidate are shown in FIG. 5A and FIG. 5B, respectively. A target reference index is identified first. In 3DV-ATM-5.0, the target reference index is set to zero in Skip mode. For the Direct mode, the target reference index is derived as the minimum reference index of the neighboring blocks.
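The selection of the target reference index may be sketched as follows. The function name and the mode strings are hypothetical. Returning zero in Direct mode when no neighboring block has a reference index is an assumption, since that case is not addressed above.

```python
from typing import Optional, Sequence


def target_reference_index(mode: str,
                           neighbor_ref_indices: Sequence[Optional[int]]
                           ) -> int:
    """Return the target reference index for Skip or Direct mode."""
    if mode == "skip":
        return 0  # Skip mode: the index is set to zero
    if mode == "direct":
        available = [r for r in neighbor_ref_indices if r is not None]
        # Direct mode: minimum reference index of the neighboring blocks.
        # Assumption: fall back to zero when no neighbor has an index.
        return min(available) if available else 0
    raise ValueError(f"unsupported mode: {mode}")


skip_index = target_reference_index("skip", [2, 1, 3])
direct_index = target_reference_index("direct", [2, 1, 3])
```

The resulting index then determines whether the temporal procedure or the inter-view procedure is applied.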
If the target reference index points to a temporal reference picture, the corresponding MVP derivation procedure is shown in FIG. 5A. The input data for the procedure comprises motion data (510) associated with neighboring blocks A, B and C of the texture picture in a dependent view and depth data of the current block (560) in the dependent view. For each neighboring block, the procedure first checks whether the neighboring block has any MV pointing to the target reference index (520A). If a neighboring block does not have an MV pointing to the target reference index, the MV for the neighboring block is replaced by a derived MV as shown in step 530A. The derived MV is obtained from a corresponding block located in the reference view according to the maximum disparity of the current block as shown in step 570A. The temporal MV (550A) is derived based on the median of the MVs associated with the neighboring blocks (540A). If none of the neighboring blocks has any MV pointing to the target reference picture and the corresponding block of the current block in the reference view does not have any MV pointing to the target reference index, a zero MV is used to represent the MV of the neighboring blocks.
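The temporal procedure of FIG. 5A may be sketched as follows. The function name is hypothetical. The inputs are assumed to be prepared by the caller: each entry of `neighbor_mvs` is the MV of a neighboring block that points to the target reference index, or `None`, and `derived_mv` is the MV of the corresponding block located through the maximum disparity, or `None`. Using a zero MV whenever the derived MV is missing, even if some neighboring blocks do have an MV, is a simplifying assumption.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

Vector = Tuple[int, int]


def temporal_mvp(neighbor_mvs: Sequence[Optional[Vector]],
                 derived_mv: Optional[Vector]) -> Vector:
    """Median of the neighboring MVs after replacing unavailable ones."""
    # Steps 530A and 570A: the replacement comes from the corresponding block
    # in the reference view. A zero MV is used when that block has no MV
    # pointing to the target reference index (simplified, see lead-in).
    replacement = derived_mv if derived_mv is not None else (0, 0)
    # Step 520A: keep only MVs that point to the target reference index.
    filled = [mv if mv is not None else replacement for mv in neighbor_mvs]
    # Steps 540A and 550A: component-wise median gives the temporal MV.
    return (median(mv[0] for mv in filled), median(mv[1] for mv in filled))


with_derived = temporal_mvp([(4, 2), None, (0, 6)], (2, 2))
all_missing = temporal_mvp([None, None, None], None)
```

In the first call the missing MV of block B is replaced by the derived MV before the median is taken. In the second call nothing is available, so the result is the zero MV.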
If the target reference index points to an inter-view reference picture, the corresponding MVP derivation procedure is shown in FIG. 5B. For each neighboring block, the procedure first checks whether the neighboring block has a DV pointing to the target reference index as shown in step 520B. If the neighboring block does not have such a DV, a DV converted from the depth values in the associated depth block (step 570B) is used to replace the unavailable DV of the neighboring block (step 530B). The disparity vector (550B) is derived from the median of the MVP candidates (540B) and is used to perform inter-view motion compensation.
The current practice of DV derivation to locate a corresponding block in the reference view includes two DVs pointing to reference pictures in list0 and list1 respectively. It is desirable to simplify the DV/MV derivation and to improve the coding efficiency of 3D video coding using improved DV/MV derivation.