Three-dimensional (3D) television has been a technology trend in recent years that is intended to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene with multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. Multi-view video with a large number of video sequences associated with the views represents a massive amount of data. Accordingly, multi-view video requires a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a straightforward approach would result in poor coding performance. In order to improve coding efficiency, multi-view video coding exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras. A disparity model, such as an affine model, is used to indicate the displacement of an object between the two view frames. Furthermore, the motion vectors for frames in one view can be derived from the motion vectors of the respective frames in another view.
For 3D video, besides the conventional texture data associated with multiple views, depth data is often captured or derived as well. The depth data may be captured for video associated with one view or multiple views. The depth information may also be derived from images of different views. The depth data may be represented at a lower spatial resolution than the texture data. The depth information is useful for view synthesis and inter-view prediction.
Some standard development activities for 3D video coding have been undertaken by the Joint Collaborative Team on 3D Video Coding Extension Development within the international standardization organization ITU-T. In the software test model version 6.0 of Advanced Video Coding (AVC)-based 3D video coding (3DV-ATM-6.0), a motion vector prediction (MVP) candidate for Skip/Direct mode is derived based on the disparity vectors (DVs) of neighboring blocks according to a predefined derivation order. When a block is coded in Direct mode, its motion information is inferred from previously coded information without explicit signaling. When a block is coded in Skip mode, neither the motion information nor the residual information is signaled; the residual signal is inferred to be zero.
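The difference between the two modes at reconstruction time can be sketched as follows. This is a minimal illustration only, assuming a block is reduced to a flat list of sample values and assuming, as in H.264/AVC, that a Direct-mode block still carries a coded residual; the function name `reconstruct_block` is hypothetical.

```python
from typing import List, Optional


def reconstruct_block(prediction: List[int], mode: str,
                      residual: Optional[List[int]] = None) -> List[int]:
    """Reconstruct a block from its motion-compensated prediction.

    In both modes the motion information is inferred, so `prediction` is
    assumed to have been formed from the inferred motion already.
    """
    if mode == "skip":
        # Skip: no residual is signaled; it is inferred to be zero.
        return list(prediction)
    if mode == "direct":
        # Direct: motion is inferred, but (by assumption) a residual is coded.
        if residual is None or len(residual) != len(prediction):
            raise ValueError("Direct mode expects a residual of matching size")
        return [p + r for p, r in zip(prediction, residual)]
    raise ValueError(f"unsupported mode: {mode}")
```

A Skip-mode block is therefore fully determined by the inferred motion, which is why the quality of the inferred MVP matters so much for these two modes.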
FIG. 1 illustrates an example of priority-based MVP for Skip/Direct mode according to 3DV-ATM-6.0. A disparity vector (114) associated with the central point (112) of the current block (110) in a dependent view is used to find a corresponding point (122) of a corresponding block (120) in the reference view (a base view). The MV (126) of the block (124) that covers the corresponding point (122) in the reference view is used as the inter-view MVP candidate of the current block. Disparity vector 114 can be derived either from the neighboring blocks or from the depth value of central point 112. The depth information associated with the current texture block (110) is shown in FIG. 1 as block 130, and the central point is shown as a shaded box. If any of the neighboring blocks has a DV (e.g., DVA for block A in FIG. 1), the DV of that neighboring block is used as the disparity vector to locate the corresponding block in the reference picture. Otherwise, the converted disparity, namely the depth-based disparity, is used, where the disparity is converted from the depth value of the central point and the camera parameters. Compared to the approach that uses only the depth-based disparity, the approach that uses DVs from spatial neighboring blocks can reduce error propagation in case the depth value of the central point is not available. The terms “disparity” and “disparity vector” are used interchangeably.
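The DV selection described for FIG. 1 can be sketched as below. This is a hedged sketch, not the 3DV-ATM code: it assumes the neighbors are scanned in the order A, B, C, assumes rectified cameras so that the disparity is purely horizontal, and assumes the common 8-bit inverse-depth representation in which a depth sample d maps to 1/Z = (d / 255) * (1/Z_near - 1/Z_far) + 1/Z_far, so that disparity = f * b / Z for focal length f and baseline b. The names `depth_to_disparity` and `select_disparity` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vector = Tuple[float, float]


def depth_to_disparity(depth: int, focal: float, baseline: float,
                       z_near: float, z_far: float) -> Vector:
    """Convert an 8-bit depth sample into a horizontal disparity vector."""
    # Inverse depth, assuming the usual 8-bit inverse-depth quantization.
    inv_z = (depth / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    # For rectified, horizontally aligned cameras, disparity = f * b / Z.
    return (focal * baseline * inv_z, 0.0)


def select_disparity(neighbor_dvs: Sequence[Optional[Vector]],
                     center_depth: int, focal: float, baseline: float,
                     z_near: float, z_far: float) -> Vector:
    """Prefer a neighboring block's DV; otherwise use the depth-based DV."""
    for dv in neighbor_dvs:  # assumed scan order: A, B, C
        if dv is not None:
            return dv
    # No neighbor has a DV: convert the depth of the central point.
    return depth_to_disparity(center_depth, focal, baseline, z_near, z_far)
```

Checking the neighbors first means a missing or unreliable depth value at the central point only matters when none of A, B or C carries a DV.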
When the corresponding block pointed to by the DV of the neighboring block has no motion information available, the inter-view candidate is regarded as not available and the search continues with the spatial candidates from the neighboring blocks. Alternatively, the inter-view candidate derivation process can be based on the disparity converted from the depth of the current block. When a corresponding block pointed to by the DV of the neighboring block, or by the DV converted from the depth of the current block, is Intra-coded or uses a reference picture that is invalid for the current picture, the motion information of the corresponding block is considered unavailable. An exemplary flowchart of inter-view candidate derivation based on the median of three spatial candidates derived from the neighboring blocks A, B, and C (D is used only when C is unavailable) is shown in FIG. 2. On the decoder side, motion compensation is performed using the motion information of the derived MVP candidate. The motion information includes the prediction direction (uni-directional or bi-directional prediction), the reference picture type (temporal prediction, virtual prediction, or inter-view prediction), and the reference picture index.
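The availability rule for the inter-view candidate can be sketched as follows. The sketch is illustrative: it assumes a corresponding block is summarized by an Intra flag, an optional MV and the picture order count (POC) of the reference picture it uses, and it models "valid for the current picture" as membership in a set of POCs usable by the current picture. `CorrespondingBlock` and `inter_view_candidate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

Vector = Tuple[int, int]


@dataclass
class CorrespondingBlock:
    """Motion summary of the block located in the reference view."""
    is_intra: bool
    mv: Optional[Vector] = None
    ref_poc: Optional[int] = None  # POC of the reference picture it uses


def inter_view_candidate(block: CorrespondingBlock,
                         valid_ref_pocs: Set[int]) -> Optional[Vector]:
    """Return the inter-view MVP candidate, or None if it is unavailable."""
    if block.is_intra or block.mv is None:
        # Intra-coded or no motion: no motion information to inherit.
        return None
    if block.ref_poc not in valid_ref_pocs:
        # Its reference picture is not usable by the current picture.
        return None
    return block.mv
```

A `None` result corresponds to the case where the derivation moves on to the spatial candidates.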
FIG. 2 illustrates an exemplary flowchart of inter-view MVP derivation according to 3DV-ATM-6.0. The input data to the priority-based MVP candidate derivation process comprises motion data (210) associated with neighboring blocks A, B and C of the texture picture in a dependent view and depth data of the current block (250) in the dependent view. Any disparity information associated with a neighboring block is considered motion information for inter-view prediction. The availability of the DVs associated with the neighboring blocks is checked in step 220. If the DV of a neighboring block is not available, it is replaced by a derived disparity vector (DV) as shown in step 230, where the derived disparity vector is converted from the depth data associated with the current block. The disparity used to replace an unavailable DV may correspond to the maximum disparity of the current block (step 260). The final disparity may be determined based on the median of the MVP candidates (i.e., the DVs associated with blocks A, B and C) as shown in step 240. After the disparity vector is derived for the current block, a block (124) covering the corresponding point (122) in the reference picture can be identified. The motion vector (126) associated with block 124 can be used as the inter-view MVP candidate.
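The FIG. 2 flow can be sketched as follows, assuming the median is taken component-wise as in H.264/AVC motion vector prediction and that the maximum disparity of the current block is obtained by converting the largest depth sample in its depth block (larger depth samples denote nearer objects under the usual representation). The depth-to-DV conversion is passed in as a callable because the actual camera-parameter conversion is not specified here; all function names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Vector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    """Median of three integers without sorting."""
    return max(min(a, b), min(max(a, b), c))


def median_vector(a: Vector, b: Vector, c: Vector) -> Vector:
    """Component-wise median of three vectors."""
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


def derive_inter_view_dv(neighbor_dvs: Sequence[Optional[Vector]],
                         depth_block: Sequence[Sequence[int]],
                         depth_to_dv: Callable[[int], Vector]) -> Vector:
    """Median of the DVs of A, B and C, with unavailable DVs replaced."""
    # Steps 250/260: the maximum disparity of the current block.
    max_depth = max(max(row) for row in depth_block)
    fallback = depth_to_dv(max_depth)
    # Steps 220/230: replace each unavailable neighbor DV.
    a, b, c = (dv if dv is not None else fallback for dv in neighbor_dvs)
    # Step 240: the final disparity is the median.
    return median_vector(a, b, c)
```

For example, with neighbor DVs (8, 0), unavailable and (2, 0), a 2x2 depth block whose largest sample is 40, and a toy conversion `lambda d: (d // 4, 0)`, the missing DV becomes (10, 0) and the median is (8, 0).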
In 3DV-ATM-6.0, the list 0 MV and list 1 MV of the inter-view candidate are inferred independently when the DVs of neighboring blocks are used to locate the corresponding point in the reference view. Specifically, the MVP for list 0 is derived by first locating a corresponding block in the reference picture based on the list 0 DVs of the neighboring blocks (if available) and then using the MV of the corresponding block as the MVP candidate for list 0. Similarly, the MVP for list 1 is derived by first locating a corresponding block in the reference picture based on the list 1 DVs of the neighboring blocks (if available) and then using the MV of the corresponding block as the MVP candidate for list 1. As shown in FIG. 3, for a current block (310) in a dependent view, the list 0 DV and list 1 DV of the neighboring blocks may be different and thus may locate different corresponding blocks (C01 and C02) in the reference view. An exemplary flowchart associated with list 0 and list 1 inter-view candidate derivation is shown in FIG. 4. The flowchart applies to list 0 if ListX corresponds to list 0, and to list 1 if ListX corresponds to list 1. The steps (410-460) in FIG. 4 are similar to those of FIG. 2 (210-260). However, the process is performed separately for list 0 and list 1 inter-view MVP derivation, where ListX corresponds to either list 0 or list 1. For example, in step 420, only the motion data (e.g., a DV) associated with a neighboring block pointing to a reference picture in list 0 is considered available if the target reference picture is in list 0. The derived disparity vector is applied to the central position (472) of the current block (470) to locate a corresponding position (482) in a corresponding block (480). The motion vector associated with the block (484) covering the corresponding point (482) in the base view is used as the inter-view MVP candidate for the respective list.
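The per-list derivation of FIG. 4 can be sketched as below. It is a simplified sketch, assuming each neighbor stores at most one DV per list (keyed 0 or 1), that a single depth-derived DV replaces any DV missing for ListX, that the median is component-wise, and that the reference-view motion field is exposed through a hypothetical callable `ref_view_motion(x, y, list_idx)`. Because the two lists are handled independently, they may land in different corresponding blocks, as with C01 and C02 in FIG. 3.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Vector = Tuple[int, int]
MotionLookup = Callable[[int, int, int], Optional[Vector]]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def inter_view_mvp_for_list(list_x: int,
                            neighbor_dvs: Sequence[Dict[int, Vector]],
                            depth_dv: Vector,
                            center: Tuple[int, int],
                            ref_view_motion: MotionLookup) -> Optional[Vector]:
    """Inter-view MVP candidate for ListX (0 or 1)."""
    # Step 420: only DVs pointing to a ListX reference picture count;
    # the others are replaced by the depth-derived DV.
    dvs = [n[list_x] if list_x in n else depth_dv for n in neighbor_dvs]
    # Component-wise median over A, B and C.
    dv = (median3(*(v[0] for v in dvs)), median3(*(v[1] for v in dvs)))
    # Shift the central position by the DV to find the corresponding point.
    x, y = center[0] + dv[0], center[1] + dv[1]
    # MV of the reference-view block covering that point, for ListX.
    return ref_view_motion(x, y, list_x)
```

With a 16x16 block grid in the reference view, neighbors `[{0: (8, 0), 1: (-6, 0)}, {0: (6, 0)}, {1: (-2, 0)}]`, depth DV (5, 0) and center (12, 8), list 0 resolves to DV (6, 0) and list 1 to DV (-2, 0), which fall into two different reference-view blocks.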
When the inter-view MVP candidate is not available, the median of three spatial MVP candidates derived from the neighboring blocks A, B, and C is used as the MVP for Skip or Direct mode according to 3DV-ATM-6.0. The derivation procedures for the inter MVP candidate (temporal target reference picture) and the inter-view MVP candidate (inter-view target reference picture) in Skip/Direct mode are shown in FIG. 5A and FIG. 5B, respectively.
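The overall priority for Skip/Direct mode, with D substituting for an unavailable C as noted for FIG. 2, can be sketched as follows. As a simplification, any spatial neighbor that is still unavailable is treated as a zero vector; the reference software may apply additional special-case rules. `skip_direct_mvp` is a hypothetical name.

```python
from typing import Optional, Tuple

Vector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def skip_direct_mvp(inter_view: Optional[Vector],
                    mv_a: Optional[Vector], mv_b: Optional[Vector],
                    mv_c: Optional[Vector], mv_d: Optional[Vector]) -> Vector:
    """Inter-view candidate first, otherwise the spatial median."""
    if inter_view is not None:
        return inter_view
    # D is used only when C is unavailable.
    mv_c = mv_c if mv_c is not None else mv_d
    # Simplification: still-unavailable neighbors count as zero vectors.
    a, b, c = (v if v is not None else (0, 0) for v in (mv_a, mv_b, mv_c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```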
If the target reference picture index points to a temporal reference picture, the corresponding MVP derivation procedure is shown in FIG. 5A. The reference picture index may be abbreviated as the reference index in this disclosure. The input data for the procedure comprises motion data (510) associated with neighboring blocks A, B and C of the current block of the texture picture in a dependent view and depth data of the current block (560) in the dependent view. For each neighboring block, the procedure first checks whether the neighboring block has any MV pointing to the target reference index (step 520A). If a neighboring block does not have an MV pointing to the target reference index, the MV of the neighboring block is replaced by a derived MV as shown in step 530A. The derived MV is obtained from a corresponding block located in the reference view according to the maximum disparity of the current block, as shown in step 570A. The temporal MV (550A) is derived based on the median of the MVs associated with the neighboring blocks (step 540A). If none of the neighboring blocks has any MV pointing to the target reference index and the corresponding block of the current block in the reference view does not have any MV pointing to the target reference index either, a zero MV is used to represent the MV of the neighboring blocks.
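The FIG. 5A procedure can be sketched as below, under stated assumptions: each neighbor's motion is a dictionary from reference index to MV, the corresponding block has already been located in the reference view using the maximum disparity of the current block (step 570A), and the median is component-wise. Applying the zero-MV rule per neighbor is an assumption of this sketch; it reproduces the described zero result when no MV for the target reference index exists anywhere. `temporal_mvp` is a hypothetical name.

```python
from typing import Dict, Sequence, Tuple

Vector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def temporal_mvp(target_ref: int,
                 neighbor_mvs: Sequence[Dict[int, Vector]],
                 corresponding_mvs: Dict[int, Vector]) -> Vector:
    """Temporal MVP for a target reference index (FIG. 5A)."""
    # Steps 570A/530A: derived MV from the corresponding block in the
    # reference view; a zero MV if it has none for the target index.
    derived = corresponding_mvs.get(target_ref, (0, 0))
    # Step 520A: keep a neighbor's MV only if it uses the target index.
    a, b, c = (mvs.get(target_ref, derived) for mvs in neighbor_mvs)
    # Step 540A: component-wise median gives the temporal MV (550A).
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```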
If the target reference index points to an inter-view reference picture, the corresponding MVP derivation procedure is shown in FIG. 5B. For each neighboring block, the procedure first checks whether the neighboring block has a DV pointing to the target reference index, as shown in step 520B. If the neighboring block does not have a DV pointing to the target reference index, a DV converted from the depth values in the associated depth block (step 570B) is used to replace the unavailable DV of the neighboring block (step 530B). The disparity vector (550B) is derived from the median of the MVP candidates (step 540B).
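A corresponding sketch of FIG. 5B follows. Here the depth-to-DV conversion is modeled, as an assumption, by the integer form (scale * d + offset) >> shift, where scale, offset and shift would come from the camera parameters; the sketch also converts the largest depth sample of the block, in line with the maximum-disparity rule used for FIG. 2. Parameter values in the test are arbitrary, and `depth_to_dv` and `disparity_mvp` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple

Vector = Tuple[int, int]


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def depth_to_dv(depth: int, scale: int, offset: int, shift: int) -> Vector:
    """Integer depth-to-disparity conversion (assumed form)."""
    return ((scale * depth + offset) >> shift, 0)


def disparity_mvp(target_ref: int,
                  neighbor_dvs: Sequence[Dict[int, Vector]],
                  depth_block: Sequence[Sequence[int]],
                  scale: int, offset: int, shift: int) -> Vector:
    """Disparity MVP for an inter-view target reference index (FIG. 5B)."""
    # Step 570B: a DV converted from the associated depth block.
    max_depth = max(max(row) for row in depth_block)
    converted = depth_to_dv(max_depth, scale, offset, shift)
    # Steps 520B/530B: replace DVs that do not use the target index.
    a, b, c = (dvs.get(target_ref, converted) for dvs in neighbor_dvs)
    # Step 540B: component-wise median gives the DV (550B).
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```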
The spatial MVP derivations for Direct mode and Skip mode according to 3DV-ATM-6.0 are shown in FIG. 6. The target reference index is selected to be zero for Skip mode. For Direct mode, the target reference index is selected as the minimum reference index of the neighboring blocks, as shown in step 610. After the target reference index is identified, the availability of MVs of blocks A, B and C pointing to the selected target reference index is checked in steps 620A and 620B for Direct mode and Skip mode, respectively. The spatial MVP is then determined based on the median of the MVs of the neighboring blocks, as shown in step 630.
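The target reference index selection of FIG. 6 and the subsequent median can be sketched as follows. The sketch assumes each neighbor is given as a (reference index, MV) pair or None, assumes reference index 0 when no neighbor has a valid index in Direct mode, and treats neighbors whose MV does not use the target index as zero vectors; these fallbacks are simplifications, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vector = Tuple[int, int]
Neighbor = Optional[Tuple[int, Vector]]  # (reference index, MV) or None


def median3(a: int, b: int, c: int) -> int:
    return max(min(a, b), min(max(a, b), c))


def select_target_ref(mode: str, neighbors: Sequence[Neighbor]) -> int:
    """Step 610: zero for Skip, minimum neighbor reference index for Direct."""
    if mode == "skip":
        return 0
    refs = [n[0] for n in neighbors if n is not None and n[0] >= 0]
    return min(refs) if refs else 0  # assumed fallback


def spatial_mvp(mode: str, neighbors: Sequence[Neighbor]) -> Tuple[int, Vector]:
    """Return the target reference index and the spatial MVP (steps 610-630)."""
    target = select_target_ref(mode, neighbors)
    # Steps 620A/620B: keep only MVs pointing to the target reference index.
    a, b, c = (n[1] if n is not None and n[0] == target else (0, 0)
               for n in neighbors)
    # Step 630: component-wise median.
    return target, (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```

For neighbors `[(1, (4, 4)), (2, (2, 0)), (1, (6, 2))]`, Skip mode uses index 0 and finds no matching MV, giving (0, 0), while Direct mode selects index 1 and yields (4, 2). The same neighborhood can thus steer Direct-mode blocks to a different reference picture than Skip-mode blocks.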
The reference picture for Direct mode may change from block to block (e.g., from one coding unit or macroblock to the next). For video encoding, when a block is coded in Direct mode, the motion estimation process may have to access the reference pictures repeatedly during the rate-distortion optimization associated with an MVP. Switching between different reference pictures from block to block causes high latency in deriving the spatial MVP, reduces cache efficiency, and increases the memory bandwidth requirement. Furthermore, the motion estimation process may have to load reference blocks from different reference pictures on a block-by-block basis. FIG. 7 illustrates an example where the MVP derivation for blocks 710 has to switch between two reference frames (idx 0 and idx 1) corresponding to reference picture index 0 and reference picture index 1. It is desirable to simplify the spatial MVP derivation process in order to improve cache efficiency and reduce the memory bandwidth requirement.
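The cost of block-to-block reference switching can be illustrated with a simple count of how often consecutive blocks change reference picture. This is only a proxy; it does not model caches or memory bandwidth, and `count_reference_switches` is a hypothetical helper.

```python
from typing import Sequence


def count_reference_switches(ref_indices: Sequence[int]) -> int:
    """Number of times consecutive blocks use different reference pictures."""
    return sum(1 for prev, cur in zip(ref_indices, ref_indices[1:])
               if prev != cur)
```

An alternating idx 0/idx 1 pattern like the one in FIG. 7 drives this count up with every block, whereas a sequence of blocks that stays on one reference picture gives zero.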