Three-dimensional (3D) television has been a technology trend in recent years that is intended to bring viewers a sensational viewing experience. Multi-view video is a technique to capture and render 3D video. Multi-view video is typically created by capturing a scene with multiple cameras simultaneously, where the cameras are positioned so that each one captures the scene from one viewpoint. Multi-view video, with a large number of video sequences associated with the views, represents a massive amount of data. Accordingly, multi-view video requires a large storage space and/or a high transmission bandwidth. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space and transmission bandwidth. A straightforward approach may simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a straightforward approach would result in poor coding performance.
In order to improve coding efficiency, multi-view video coding exploits inter-view redundancy. The disparity between two views is caused by the locations and angles of the two respective cameras. Since all cameras capture the same scene from different viewpoints, multi-view video data contains a large amount of inter-view redundancy. To exploit this redundancy, coding tools utilizing a disparity vector (DV) have been developed for 3D-HEVC (High Efficiency Video Coding) and 3D-AVC (Advanced Video Coding). For example, the DV is used to derive a temporal inter-view motion vector candidate (TIVC) in advanced motion vector prediction (AMVP) and Merge modes. The DV is also used as a disparity inter-view motion vector candidate (DIVC) in AMVP and Merge modes. Furthermore, the DV is used for inter-view residual prediction (IVRP) and view synthesis prediction (VSP).
The conventional approach to DV derivation is briefly discussed as follows. In the HEVC-based 3D video coding Test Model version 6.0 (3DV-HTM 6.0), for example, the DV derivation process includes the following ordered steps:
1. Derive the neighboring block disparity vector (NBDV) associated with a reference view index.
2. Derive the depth-oriented NBDV (DoNBDV) by using the derived NBDV and the depth map.
The DV derivation based on 3DV-HTM 6.0 may encounter problems under certain conditions. For example, according to 3DV-HTM, the view index of the depth map used for DoNBDV derivation is always 0; in other words, the depth map always belongs to the base view. However, there is no restriction on the reference view index of the NBDV. Therefore, the reference view index of the NBDV and the view index of the depth map may differ.
After the DV (NBDV or DoNBDV) is derived, it may be used for TIVC, DIVC, IVRP and VSP. The problems that the system may encounter are described as follows.
DoNBDV used for TIVC in AMVP and Merge modes: According to 3DV-HTM 6.0, the reference view selected for TIVC is always the one with the smallest view ID included in the reference list, which may differ from the reference view of the DoNBDV.
DoNBDV used for DIVC in AMVP and Merge modes: According to conventional 3D-HEVC, the reference view selected for DIVC, which is that of the reference picture, may differ from the reference view of the DoNBDV.
NBDV used for IVRP: The reference view selected for IVRP, which is the base view in the current test model, may differ from the reference view of the NBDV.
NBDV used for VSP: The VSP process always converts the depth value to a DV with view index 0. The view index of the converted DV may differ from the reference view of VSP.
The derivation process of the neighboring block disparity vector (NBDV) is described as follows. The DV derivation is based on the neighboring blocks of the current block, including the spatial neighboring blocks shown in FIG. 1A and the temporal neighboring blocks shown in FIG. 1B. The spatial neighboring block set includes the location diagonally across from the lower-left corner of the current block (i.e., A0), the location next to the left-bottom side of the current block (i.e., A1), the location diagonally across from the upper-left corner of the current block (i.e., B2), the location diagonally across from the upper-right corner of the current block (i.e., B0), and the location next to the top-right side of the current block (i.e., B1). As shown in FIG. 1B, the temporal neighboring block set includes the location at the center of the current block (i.e., BCTR) and the location diagonally across from the lower-right corner of the current block (i.e., RB) in a temporal reference picture. Temporal block BCTR may be used only if the DV is not available from temporal block RB. This neighboring block configuration is one example of the spatial and temporal neighboring blocks that may be used to derive NBDV; other spatial and temporal neighboring blocks may also be used. For example, for the temporal neighboring set, other locations (e.g., a lower-right block) within the current block in the temporal reference picture may be used instead of the center location. In other words, any block collocated with the current block can be included in the temporal block set. Once a block is identified as having a DV, the checking process is terminated. An exemplary search order for the spatial neighboring blocks in FIG. 1A may be (A1, B1, B0, A0, B2). An exemplary search order for the temporal neighboring blocks in FIG. 1B is (RB, BCTR).
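The NBDV search can be sketched in Python as follows. This is a minimal illustration, not the 3D-HTM implementation: the dictionary input, the name `derive_nbdv`, and the `temporal_first` flag are hypothetical. Because the text fixes only the order within each neighboring set, whether the temporal or the spatial set is checked first is left as an assumption exposed by that flag.

```python
from typing import Dict, Optional, Tuple

DV = Tuple[int, int]           # (horizontal, vertical) disparity
NeighborInfo = Tuple[DV, int]  # (DV, reference view index); hypothetical model

# Exemplary search orders from the text; RB is checked before BCTR.
SPATIAL_ORDER = ("A1", "B1", "B0", "A0", "B2")
TEMPORAL_ORDER = ("RB", "BCTR")


def derive_nbdv(
    neighbors: Dict[str, NeighborInfo],
    temporal_first: bool = True,
) -> Optional[Tuple[str, DV, int]]:
    """Return (block label, DV, reference view index) of the first neighbor
    that carries a DV, or None if no neighbor has one.

    Blocks without a DV are simply absent from `neighbors`. The order of the
    two sets (temporal vs. spatial first) is an assumption.
    """
    if temporal_first:
        groups = (TEMPORAL_ORDER, SPATIAL_ORDER)
    else:
        groups = (SPATIAL_ORDER, TEMPORAL_ORDER)
    for order in groups:
        for name in order:
            info = neighbors.get(name)
            if info is not None:
                dv, ref_view = info
                return name, dv, ref_view  # terminate at the first DV found
    return None


# Example: RB and BCTR carry no DV here, so the spatial search stops at B0.
print(derive_nbdv({"B0": ((5, 0), 1), "A0": ((3, 0), 2)}))  # ('B0', (5, 0), 1)
```

Since RB precedes BCTR in `TEMPORAL_ORDER`, BCTR is consulted only when RB has no DV. The returned reference view index also makes visible that an NBDV may point to a view other than view 0.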
The spatial and temporal neighboring sets may be different for different modes or different coding standards. In the current disclosure, NBDV may refer to the DV derived based on the NBDV process. When there is no ambiguity, NBDV may also refer to the NBDV process.
A method that enhances the NBDV by extracting a more accurate disparity vector from the depth map (referred to as a depth-oriented NBDV (DoNBDV) in this disclosure) is utilized in current 3D-HEVC. A depth block from the coded depth map in the same access unit is first retrieved and used as a virtual depth of the current block. This coding tool for DV derivation is termed DoNBDV derivation. When the texture in view 1 is coded under the common test condition, the depth map in view 0 is already coded and available. Therefore, the coding of the texture in view 1 can benefit from the depth map in view 0. An estimated disparity vector can be extracted from the virtual depth, as shown in FIG. 2. The overall flow is as follows:
1. Use the derived NBDV 240 for the current block 210 to locate the corresponding block 230 in the coded texture view.
2. Use the collocated depth 230′ in the coded view (i.e., the base view according to conventional 3D-HEVC) for the current block (coding unit) as virtual depth 250.
3. Extract the maximum value in the virtual depth retrieved in the previous step, and convert it to a disparity vector, which is named DoNBDV.
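The three DoNBDV steps can be illustrated with the sketch below, assuming depth maps stored as nested lists and a hypothetical linear depth-to-disparity conversion `(scale * depth + offset) >> shift`. The actual 3D-HEVC conversion is driven by camera parameters not described here; the clamping of coordinates at picture boundaries and the zero vertical component (horizontally aligned cameras) are also assumptions.

```python
from typing import List, Tuple

DepthMap = List[List[int]]  # depth samples indexed as [row][column]


def fetch_block(pic: DepthMap, x: int, y: int, w: int, h: int) -> DepthMap:
    """Copy a w x h block at (x, y), clamping coordinates to the picture
    (the clamping rule is an assumption about boundary handling)."""
    rows, cols = len(pic), len(pic[0])
    return [
        [pic[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
         for i in range(w)]
        for j in range(h)
    ]


def depth_to_disparity(depth: int, scale: int, offset: int, shift: int) -> int:
    """Hypothetical linear mapping from a depth value to a horizontal disparity."""
    return (scale * depth + offset) >> shift


def derive_donbdv(depth_map: DepthMap, x: int, y: int, w: int, h: int,
                  nbdv: Tuple[int, int], scale: int, offset: int,
                  shift: int) -> Tuple[int, int]:
    dx, dy = nbdv
    # Steps 1-2: the NBDV locates the corresponding block in the coded view,
    # and its collocated depth serves as the virtual depth of the current block.
    virtual_depth = fetch_block(depth_map, x + dx, y + dy, w, h)
    # Step 3: take the maximum depth value and convert it to a disparity.
    max_depth = max(max(row) for row in virtual_depth)
    # Vertical component set to 0, assuming horizontally aligned cameras.
    return depth_to_disparity(max_depth, scale, offset, shift), 0


# Example: an 8x4 depth map whose samples equal 10 * row + column.
depth = [[10 * r + c for c in range(8)] for r in range(4)]
print(derive_donbdv(depth, 0, 0, 2, 2, (2, 0), scale=4, offset=2, shift=2))  # (13, 0)
```

Note that `depth_map` here is whatever map the caller supplies; in conventional 3D-HEVC it is always the base-view depth map, which is the source of the view-index mismatch discussed above.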
Inter-View Residual Prediction (IVRP) is another coding tool used in 3D-HTM. To share the previously coded residual information (i.e., temporal residual information) of adjacent views, the residual signal of the current prediction block (i.e., PU) can be predicted from the residual signals of the corresponding blocks in the inter-view pictures. The corresponding blocks can be located by respective DVs. According to the existing 3D-HEVC, the DV is derived using NBDV, and the previously coded residual information is always associated with the base view (i.e., view index 0). IVRP is also known as Advanced Residual Prediction (ARP).
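A minimal sketch of the residual prediction idea follows, assuming residuals stored as nested lists of integers: the reference-view residual block located by the DV is subtracted at the encoder and added back at the decoder. The function names are hypothetical, and details of the actual ARP tool (such as weighting of the predicted residual) are not modeled.

```python
from typing import List, Tuple

Block = List[List[int]]  # residual samples indexed as [row][column]


def fetch_block(pic: Block, x: int, y: int, w: int, h: int) -> Block:
    """Copy a w x h block at (x, y), clamping to the picture (assumed)."""
    rows, cols = len(pic), len(pic[0])
    return [
        [pic[min(max(y + j, 0), rows - 1)][min(max(x + i, 0), cols - 1)]
         for i in range(w)]
        for j in range(h)
    ]


def ivrp_encode(cur_residual: Block, ref_residual_pic: Block,
                x: int, y: int, dv: Tuple[int, int]) -> Block:
    """Residual left to code after predicting from the DV-located block."""
    h, w = len(cur_residual), len(cur_residual[0])
    pred = fetch_block(ref_residual_pic, x + dv[0], y + dv[1], w, h)
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur_residual, pred)]


def ivrp_decode(coded: Block, ref_residual_pic: Block,
                x: int, y: int, dv: Tuple[int, int]) -> Block:
    """Reconstruct the current residual by adding back the predicted residual."""
    h, w = len(coded), len(coded[0])
    pred = fetch_block(ref_residual_pic, x + dv[0], y + dv[1], w, h)
    return [[c + p for c, p in zip(cr, pr)] for cr, pr in zip(coded, pred)]


ref_res = [[1, 2, 3, 4], [5, 6, 7, 8]]  # residual of the inter-view picture
cur = [[3, 3], [6, 9]]                  # residual of the current 2x2 PU at (0, 0)
coded = ivrp_encode(cur, ref_res, 0, 0, (1, 0))
print(coded)                                      # [[1, 0], [0, 2]]
print(ivrp_decode(coded, ref_res, 0, 0, (1, 0)) == cur)  # True
```

The sketch is agnostic about which view `ref_residual_pic` comes from; in the existing 3D-HEVC it is always the base view, even when the NBDV refers to another view.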
View synthesis prediction (VSP) is a technique to remove inter-view redundancies among video signals from different viewpoints, in which a synthetic signal is used as a reference to predict a current picture. In the 3D-HEVC Test Model, NBDV is used to derive a disparity vector. The derived disparity vector is then used to fetch a depth block in the depth map of the reference view. According to the existing 3D-HEVC, the depth map used is always associated with the base view (i.e., view index 0). The fetched depth block has the same size as the current prediction unit (PU). A maximum depth value is determined from the depth block and converted to a DV. The converted DV is then used to perform backward warping for the current PU. In addition, the warping operation may be performed at sub-PU level precision, such as 8×4 or 4×8 blocks. In this case, a maximum depth value is picked for each sub-PU block and used for warping all the pixels in that sub-PU block. The VSP technique is applied to texture picture coding. In the current implementation, VSP is also added as a new Merge candidate to signal the use of VSP prediction. In this way, a VSP block may be coded either as a skipped block without sending any residual, or as a Merge block with coded residual information.
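The sub-PU backward warping described above might look like the following sketch. The sub-block size is a parameter (e.g., 8×4 or 4×8); the linear depth-to-disparity conversion, the horizontal-only warping direction, and the boundary clamping are assumptions made for illustration, not the 3D-HEVC specification.

```python
from typing import List

Picture = List[List[int]]  # samples indexed as [row][column]


def vsp_predict(ref_texture: Picture, depth_block: Picture, x0: int, y0: int,
                sub_w: int, sub_h: int, scale: int, offset: int,
                shift: int) -> Picture:
    """Backward-warp the PU at (x0, y0) from the reference-view texture.

    depth_block holds the depth samples fetched for the PU (same size as the
    PU). Each sub_w x sub_h sub-block uses one DV derived from its maximum depth.
    """
    h, w = len(depth_block), len(depth_block[0])
    ref_cols = len(ref_texture[0])
    pred = [[0] * w for _ in range(h)]
    for sy in range(0, h, sub_h):
        for sx in range(0, w, sub_w):
            rows = range(sy, min(sy + sub_h, h))
            cols = range(sx, min(sx + sub_w, w))
            # One maximum depth value per sub-block, converted to a horizontal
            # DV with a hypothetical linear mapping.
            max_depth = max(depth_block[j][i] for j in rows for i in cols)
            dv = (scale * max_depth + offset) >> shift
            for j in rows:
                for i in cols:
                    # Assumed: warp horizontally by +dv, clamping at the edge.
                    rx = min(max(x0 + i + dv, 0), ref_cols - 1)
                    pred[j][i] = ref_texture[y0 + j][rx]
    return pred


# Example: a 16x4 PU split into two 8x4 sub-blocks with max depths 4 and 8.
ref = [list(range(32)) for _ in range(4)]  # each sample equals its column index
depth = [[1] * 7 + [4] + [8] * 8 for _ in range(4)]
pred = vsp_predict(ref, depth, 0, 0, sub_w=8, sub_h=4, scale=1, offset=0, shift=0)
print(pred[0])  # [4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19, 20, 21, 22, 23]
```

The left sub-block is warped by the DV of its single depth-4 sample rather than the depth-1 majority, which shows the effect of picking the maximum per sub-PU.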
As mentioned before, the DV is also used to derive a temporal inter-view motion vector candidate (TIVC) in advanced motion vector prediction (AMVP) and Merge modes. The process of inter-view prediction of motion parameters is illustrated in FIG. 3. To derive the motion parameters of the TIVC for a current PU 310 in a dependent view, a disparity vector 330 is first derived for the current PU 310. By adding the DV to the location of the current PU, a reference sample location 320 is obtained in the inter-view reference view. The prediction block in the already coded picture of the reference view that covers this sample location is used as the reference block. If this reference block is coded using motion compensated prediction (MCP), its associated motion parameters (e.g., MV 322) can be used as the TIVC (e.g., MV 312) for the current PU in the current view. TIVC may also be applied to blocks at the sub-PU level. As mentioned before, the reference picture for TIVC is determined according to the smallest view ID in the reference list, whereas the DV is derived according to DoNBDV. Therefore, the reference view selected for TIVC may not be the same as the reference view associated with the DV derived from DoNBDV. In this disclosure, TIVC may refer to the MV candidate for TIVC. However, when there is no ambiguity, TIVC may also refer to the TIVC process.
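The following sketch mirrors the TIVC steps of FIG. 3 under simplifying assumptions: the PU center stands in for "the location of the current PU", the already coded reference-view picture is modeled as a list of hypothetical `CodedBlock` records, and a block with `mv=None` represents one not coded with MCP. The helper `tivc_reference_view` shows how the view chosen by the smallest-view-ID rule can differ from the view the DoNBDV points to.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MV = Tuple[int, int]


@dataclass
class CodedBlock:
    """A prediction block of the already coded reference-view picture (hypothetical model)."""
    x: int
    y: int
    w: int
    h: int
    mv: Optional[MV]  # motion vector if coded with MCP, otherwise None

    def covers(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def tivc_candidate(pu_x: int, pu_y: int, pu_w: int, pu_h: int,
                   dv: Tuple[int, int],
                   ref_blocks: List[CodedBlock]) -> Optional[MV]:
    """Return the MV of the reference block covering the DV-displaced PU location."""
    # Assumption: the PU center is used as the location of the current PU.
    px = pu_x + pu_w // 2 + dv[0]
    py = pu_y + pu_h // 2 + dv[1]
    for blk in ref_blocks:
        if blk.covers(px, py):
            return blk.mv  # None when the reference block is not MCP-coded
    return None  # displaced location falls outside the modeled picture


def tivc_reference_view(ref_list_view_ids: List[int]) -> int:
    """Reference view for TIVC: the smallest view ID in the reference list."""
    return min(ref_list_view_ids)


blocks = [CodedBlock(0, 0, 8, 8, mv=(3, -1)), CodedBlock(8, 0, 8, 8, mv=None)]
print(tivc_candidate(0, 0, 8, 8, (2, 0), blocks))  # (3, -1)
print(tivc_candidate(0, 0, 8, 8, (6, 0), blocks))  # None
# If the DoNBDV points to view 1, the TIVC reference view below (0) does not match it.
print(tivc_reference_view([1, 0]))  # 0
```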
The DV can also be used as a Merge candidate for disparity compensated prediction (DCP), which is called the disparity inter-view motion vector candidate (DIVC). DIVC includes a current DV and an associated reference view. The current DV is set equal to the DoNBDV. However, according to conventional 3D-HEVC, the reference view is set to that of the first reference picture (in terms of reference index) of the current slice that has the same POC (Picture Order Count) as the current slice. Consequently, the reference picture may have a different view index from that of the DoNBDV, in which case using a DV equal to the DoNBDV is problematic. In this disclosure, DIVC may refer to the inter-view DV candidate. However, when there is no ambiguity, DIVC may also refer to the DIVC process.
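The conventional DIVC reference selection, and the view-index mismatch it can cause, can be sketched as follows. The `RefPic` record and the function names are hypothetical; the sketch encodes only the rule stated above (the first reference picture with the same POC as the current slice) and flags whether that picture's view index matches the view of the DoNBDV.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

DV = Tuple[int, int]


@dataclass(frozen=True)
class RefPic:
    poc: int       # picture order count of the reference picture
    view_idx: int  # view index of the reference picture


def select_divc_ref_idx(ref_list: List[RefPic], current_poc: int) -> Optional[int]:
    """First reference index whose POC equals the current POC (an inter-view reference)."""
    for ref_idx, pic in enumerate(ref_list):
        if pic.poc == current_poc:
            return ref_idx
    return None


def build_divc(ref_list: List[RefPic], current_poc: int, donbdv: DV,
               donbdv_view: int) -> Optional[Tuple[DV, int, bool]]:
    """Return (DV, reference index, views consistent?) for the DIVC."""
    ref_idx = select_divc_ref_idx(ref_list, current_poc)
    if ref_idx is None:
        return None  # no inter-view reference picture in the list
    consistent = ref_list[ref_idx].view_idx == donbdv_view
    return donbdv, ref_idx, consistent


# Reference list of a slice with POC 16: one temporal and two inter-view pictures.
ref_list = [RefPic(8, 2), RefPic(16, 1), RefPic(16, 0)]
print(build_divc(ref_list, 16, (13, 0), 0))  # ((13, 0), 1, False)
```

In the example the DoNBDV was derived toward view 0, but the selected reference picture belongs to view 1, which is exactly the situation in which reusing the DoNBDV as the DIVC becomes problematic.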
The problems mentioned above do not arise under the common test condition used to evaluate the performance of 3D-HEVC coding systems, since inter-view reference pictures other than the base-view reference picture are not allowed. However, they become problems when this constraint is relaxed for the 3D coding tools. Accordingly, it is desirable to develop a DV derivation method that is free of the problems mentioned above.