Three-dimensional (3D) television has been a technology trend in recent years that intends to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3D TV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, in contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers the sensation of realism.
The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. Accordingly, the multiple cameras will capture multiple video sequences corresponding to multiple views. In order to provide more views, more cameras have been used to generate multi-view video with a large number of video sequences associated with the views. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space or the transmission bandwidth.
A straightforward approach is to apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a coding system would be very inefficient. To improve efficiency, multi-view video coding exploits inter-view redundancy. Various 3D coding tools have been developed, or are being developed, by extending existing video coding standards. For example, there are standard development activities to extend H.264/AVC (Advanced Video Coding) and HEVC (High Efficiency Video Coding) to multi-view video coding (MVC) and 3D coding. The corresponding new standards being developed are referred to as 3D-AVC and 3D-HEVC, respectively. Various 3D coding tools developed or being developed for 3D-HEVC and 3D-AVC are reviewed as follows.
To share the previously coded texture information of adjacent views, a technique known as Disparity-Compensated Prediction (DCP) has been included in 3D-HTM (the test model for three-dimensional video coding based on HEVC) as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to an inter-picture prediction that uses previously coded pictures of the same view, while DCP refers to an inter-picture prediction that uses previously coded pictures of other views in the same access unit. FIG. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP. The vector (110) used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. FIG. 1 illustrates three MVs (120, 130 and 140) associated with MCP. Moreover, the DV of a DCP block can also be predicted by the disparity vector predictor (DVP) candidate derived from neighboring blocks or the temporal collocated blocks that also use inter-view reference pictures. In the current 3D-HTM, when deriving an inter-view Merge candidate for Merge/Skip modes, if the motion information of the corresponding block is not available or not valid, the inter-view Merge candidate is replaced by a DV.
Inter-view motion prediction is used to share the previously encoded motion information of reference views. To derive candidate motion parameters for a current block in a dependent view, a DV for the current block is derived first, and then the prediction block in the already coded picture in the reference view is located by adding the DV to the location of the current block. If the prediction block is coded using MCP, the associated motion parameters can be used as candidate motion parameters for the current block in the current view. The derived DV can also be directly used as a candidate DV for DCP.
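The derivation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Block` record, the dictionary `ref_view_blocks` keyed by block-aligned position, and the 8-sample block grid are hypothetical and are not taken from 3D-HTM.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec = Tuple[int, int]


@dataclass
class Block:
    """Hypothetical record for an already coded block in the reference view."""
    mode: str    # "MCP", "DCP" or "INTRA"
    vector: Vec  # motion vector for MCP, disparity vector for DCP


def inter_view_motion_candidate(
    cur_pos: Vec,
    dv: Vec,
    ref_view_blocks: Dict[Vec, Block],
    block_size: int = 8,  # assumed block grid, for illustration only
) -> Tuple[Optional[Vec], Vec]:
    """Return (candidate motion vector or None, candidate DV for DCP)."""
    # Locate the prediction block by adding the DV to the current position.
    x = cur_pos[0] + dv[0]
    y = cur_pos[1] + dv[1]
    # Snap to the top-left corner of the block that covers (x, y).
    key = (x // block_size * block_size, y // block_size * block_size)
    ref = ref_view_blocks.get(key)
    # Motion parameters are reused only if the located block is MCP coded.
    mv = ref.vector if ref is not None and ref.mode == "MCP" else None
    # The derived DV itself can also serve as a candidate DV for DCP.
    return mv, dv
```

For example, with a reference-view block at (8, 16) coded by MCP, a current block at (16, 16) and a DV of (-5, 0) would inherit that block's motion vector.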
Inter-view residual prediction is another coding tool used in 3D-HTM. To share the previously coded residual information of adjacent views, the residual signal of the current prediction block (i.e., PU) can be predicted by the residual signals of the corresponding blocks in the inter-view pictures. The corresponding blocks can be located by respective DVs. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (i.e., V0, V1 and V2). All video pictures and depth maps that belong to the same camera position are associated with the same viewId (i.e., view identifier). The view identifiers are used for specifying the coding order within the access units and detecting missing views in error-prone environments. An access unit includes all video pictures and depth maps corresponding to the same time instant. Inside an access unit, the video picture and, when present, the associated depth map having viewId equal to 0 are coded first, followed by the video picture and depth map having viewId equal to 1, etc. The view with viewId equal to 0 (i.e., V0) is also referred to as the base view or the independent view. The base view video pictures can be coded using a conventional HEVC video coder without dependence on other views.
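The coding order inside an access unit and the detection of missing views can be sketched as follows. The tuple representation `(view_id, kind)` and both function names are hypothetical choices made for this illustration.

```python
from typing import Iterable, List, Tuple

Component = Tuple[int, str]  # (viewId, "texture" or "depth")


def coding_order(components: Iterable[Component]) -> List[Component]:
    """Order the components of one access unit by viewId, texture before depth."""
    rank = {"texture": 0, "depth": 1}
    return sorted(components, key=lambda c: (c[0], rank[c[1]]))


def missing_views(components: Iterable[Component], expected_views: int) -> List[int]:
    """List the viewIds in 0..expected_views-1 that do not appear in the access unit."""
    present = {view_id for view_id, _ in components}
    return [v for v in range(expected_views) if v not in present]
```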
For the current block, a motion vector predictor (MVP) or disparity vector predictor (DVP) can be derived from the inter-view blocks in the inter-view pictures. In the following, inter-view blocks in an inter-view picture may be abbreviated as inter-view blocks. The derived candidates are termed inter-view candidates, which can be inter-view MVPs or DVPs. A coding tool that codes the motion information of a current block (e.g., a current prediction unit, PU) based on previously coded motion information in other views is termed inter-view motion parameter prediction. Furthermore, a corresponding block in a neighboring view is termed an inter-view block, and the inter-view block is located using the disparity vector derived from the depth information of the current block in the current picture.
View Synthesis Prediction (VSP) is a technique to remove inter-view redundancies among video signals from different viewpoints, in which a synthetic signal is used as a reference to predict a current picture. In the 3D-HEVC test model, HTM-7.0, there exists a process to derive a disparity vector predictor, known as NBDV (Neighboring Block Disparity Vector). The derived disparity vector is then used to fetch a depth block in the depth image of the reference view. The procedure to derive the virtual depth can be applied for VSP to locate the corresponding depth block in a coded view. The fetched depth block may have the same size as the current prediction unit (PU), and it is then used to perform backward warping for the current PU. In addition, the warping operation may be performed at a sub-PU level precision, such as 2×2 or 4×4 blocks.
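A minimal sketch of backward warping at sub-PU precision is given below. Several details are assumptions made only for illustration and are not specified in the text above: the linear depth-to-disparity conversion with its `scale`, `offset` and `shift` parameters, the use of the maximum depth of each sub-block as its representative value, the purely horizontal disparity with a positive sign, and the clamping at the picture border.

```python
from typing import List

Plane = List[List[int]]


def depth_to_disparity(depth: int, scale: int = 1, offset: int = 0, shift: int = 4) -> int:
    """Assumed linear conversion; real codecs derive it from camera parameters."""
    return (scale * depth + offset) >> shift


def vsp_backward_warp(ref_texture: Plane, depth_block: Plane,
                      x0: int, y0: int, sub: int = 4) -> Plane:
    """Predict a PU at (x0, y0) by fetching reference samples displaced by disparity.

    The PU is assumed to lie within the rows of ref_texture.
    """
    height, width = len(depth_block), len(depth_block[0])
    ref_width = len(ref_texture[0])
    pred = [[0] * width for _ in range(height)]
    for by in range(0, height, sub):
        for bx in range(0, width, sub):
            rows = range(by, min(by + sub, height))
            cols = range(bx, min(bx + sub, width))
            # One disparity per sub-block (assumption: taken from its maximum depth).
            rep_depth = max(depth_block[y][x] for y in rows for x in cols)
            disparity = depth_to_disparity(rep_depth)
            for y in rows:
                for x in cols:
                    # Clamp the displaced column to the reference picture width.
                    rx = min(max(x0 + x + disparity, 0), ref_width - 1)
                    pred[y][x] = ref_texture[y0 + y][rx]
    return pred
```

With `sub=2` the same routine warps 2×2 sub-blocks instead of 4×4 ones.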
In the current implementation, VSP is only applied for texture component coding. Also, the VSP prediction is added as a new merging candidate to signal the use of VSP prediction. In this way, a VSP block may be a skipped block without any residual, or a Merge block with residual information coded. The VSP-based merging candidate may also be referred to as the VSP merging candidate for convenience in this disclosure.
When a picture is coded as a B picture and the current block is signaled as VSP predicted, the following steps are applied to determine the prediction direction of VSP:
    Obtain the view index refViewIdxNBDV of the derived disparity vector from NBDV;
    Obtain the reference picture list RefPicListNBDV (either RefPicList0 or RefPicList1) that is associated with the reference picture with view index refViewIdxNBDV;
    Check the availability of an inter-view reference picture with view index refViewIdx that is not equal to refViewIdxNBDV in the reference picture list other than RefPicListNBDV;
        If such a different inter-view reference picture is found, bi-direction VSP is applied. The depth block from view index refViewIdxNBDV is used as the current block's depth information (in case of texture-first coding order), and the two different inter-view reference pictures (each from one reference picture list) are accessed via a backward warping process and further weighted to achieve the final backward VSP predictor;
        Otherwise, uni-direction VSP is applied with RefPicListNBDV as the reference picture list for prediction.
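The decision steps above can be sketched as follows. The data layout is an assumption made for illustration: `inter_view_refs` maps the hypothetical list names `"L0"` and `"L1"` to the view indices of the inter-view reference pictures they contain. Returning `None` when no list contains the view pointed to by NBDV is a choice of this sketch, since the steps above do not cover that case. The P-picture branch reflects the uni-direction rule stated for P pictures.

```python
from typing import Dict, List, Optional, Tuple

Decision = Tuple[str, str, Optional[int]]  # (direction, RefPicListNBDV, other view)


def vsp_direction(slice_type: str, ref_view_idx_nbdv: int,
                  inter_view_refs: Dict[str, List[int]]) -> Optional[Decision]:
    # Find RefPicListNBDV: the list holding the picture with view index refViewIdxNBDV.
    nbdv_list = next(
        (name for name in ("L0", "L1")
         if ref_view_idx_nbdv in inter_view_refs.get(name, [])),
        None,
    )
    if nbdv_list is None:
        return None  # not covered by the described steps (assumption of this sketch)
    if slice_type == "P":
        return ("uni", nbdv_list, None)
    # Look in the other list for an inter-view picture from a different view.
    other_list = "L1" if nbdv_list == "L0" else "L0"
    for view_idx in inter_view_refs.get(other_list, []):
        if view_idx != ref_view_idx_nbdv:
            return ("bi", nbdv_list, view_idx)
    return ("uni", nbdv_list, None)
```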
When a picture is coded as a P picture and the current prediction block is using VSP, uni-direction VSP is applied.
It is noted that, when adding the VSP Merge candidate, the VSP flag is always set as true regardless of whether there is an inter-view reference picture with the view index equal to the view index of the inter-view reference picture pointed to by the derived DV.
The DV is critical in 3D video coding for inter-view motion prediction, inter-view residual prediction, disparity-compensated prediction (DCP), view synthesis prediction (VSP) or any other tools which need to indicate the correspondence between inter-view pictures. The DV derivation utilized in the current test model of 3D-HEVC is described as follows.
DV Derivation in 3D-HEVC. Currently, except for the DV for DCP, the DVs used for the other coding tools are derived using either the scheme of neighboring block disparity vector (NBDV) or the scheme of depth oriented neighboring block disparity vector (DoNBDV) as described below.
Neighboring block disparity vector (NBDV). In the current 3D-HEVC, a disparity vector can be used as a DVP candidate for Inter mode or as a Merge candidate for Merge/Skip mode. A derived disparity vector can also be used as an offset vector for inter-view motion prediction and inter-view residual prediction. When used as an offset vector, the DV is derived from spatial and temporal neighboring blocks as shown in FIGS. 2A-2B. Multiple spatial and temporal neighboring blocks are determined, and the DV availability of the spatial and temporal neighboring blocks is checked according to a pre-determined order. This coding tool for DV derivation based on neighboring (spatial and temporal) blocks is termed Neighboring Block DV (NBDV). The temporal neighboring block set, as shown in FIG. 2A, is searched first. The temporal neighboring block set includes the location at the center of the current block (i.e., BCTR) and the location diagonally across from the lower-right corner of the current block (i.e., RB) in a temporal reference picture. The temporal search order starts from RB to BCTR. Once a block is identified as having a DV, the checking process is terminated. The spatial neighboring block set includes the location diagonally across from the lower-left corner of the current block (i.e., A0), the location next to the left-bottom side of the current block (i.e., A1), the location diagonally across from the upper-left corner of the current block (i.e., B2), the location diagonally across from the upper-right corner of the current block (i.e., B0), and the location next to the top-right side of the current block (i.e., B1) as shown in FIG. 2B. The search order for the spatial neighboring blocks is (A1, B1, B0, A0, B2).
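The search order described above, temporal blocks first (RB, then BCTR) and spatial blocks next (A1, B1, B0, A0, B2) with termination at the first DV found, can be sketched as follows. The dictionary inputs, which map a block name to its DV or to `None` when the block has no DV, are a hypothetical representation chosen for this illustration.

```python
from typing import Dict, Optional, Tuple

Vec = Tuple[int, int]

TEMPORAL_ORDER = ("RB", "BCTR")
SPATIAL_ORDER = ("A1", "B1", "B0", "A0", "B2")


def nbdv_search(temporal: Dict[str, Optional[Vec]],
                spatial: Dict[str, Optional[Vec]]) -> Optional[Tuple[str, Vec]]:
    """Return (block name, DV) of the first neighbor that has a DV, else None."""
    # The temporal neighboring block set is searched first.
    for name in TEMPORAL_ORDER:
        dv = temporal.get(name)
        if dv is not None:
            return name, dv  # checking terminates at the first DV found
    for name in SPATIAL_ORDER:
        dv = spatial.get(name)
        if dv is not None:
            return name, dv
    return None
```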
If a DCP coded block is not found in the neighboring block set (i.e., spatial and temporal neighboring blocks as shown in FIGS. 2A and 2B), the disparity information can be obtained from another coding tool, named DV-MCP. In this case, when a spatial neighboring block is an MCP coded block and its motion is predicted by the inter-view motion prediction, as shown in FIG. 3, the disparity vector used for the inter-view motion prediction represents a motion correspondence between the current picture and the inter-view reference picture. This type of motion vector is referred to as an inter-view predicted motion vector and the blocks are referred to as DV-MCP blocks. FIG. 3 illustrates an example of a DV-MCP block, where the motion information of the DV-MCP block (310) is predicted from a corresponding block (320) in the inter-view reference picture. The location of the corresponding block (320) is specified by a disparity vector (330). The motion information (322) of the corresponding block (320) is used to predict motion information (312) of the current block (310) in the current view.
To indicate whether an MCP block is DV-MCP coded and to store the disparity vector for the inter-view motion parameters prediction, two variables are used to represent the motion vector information for each block:
    dvMcpFlag, and
    dvMcpDisparity.
When dvMcpFlag is equal to 1, the dvMcpDisparity is set to indicate that the disparity vector is used for the inter-view motion parameter prediction. In the construction process for the AMVP mode and Merge candidate list, the dvMcpFlag of the candidate is set to 1 if the candidate is generated by inter-view motion parameter prediction and is set to 0 otherwise. If neither DCP coded blocks nor DV-MCP coded blocks are found in the above mentioned spatial and temporal neighboring blocks, then a zero vector can be used as a default disparity vector.
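The fallback chain described above (DCP coded neighbors, then DV-MCP neighbors, then a zero vector) can be sketched with the two per-block variables. The `NeighborBlock` record and the representation of dvMcpDisparity as a two-component vector are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Vec = Tuple[int, int]


@dataclass
class NeighborBlock:
    """Hypothetical per-block record holding the two DV-MCP variables."""
    dcp_dv: Optional[Vec] = None     # DV if the block is DCP coded, else None
    dv_mcp_flag: int = 0             # dvMcpFlag
    dv_mcp_disparity: Vec = (0, 0)   # dvMcpDisparity (vector form is an assumption)


def derive_dv_with_fallback(neighbors: Iterable[NeighborBlock]) -> Vec:
    blocks = list(neighbors)
    # First look for a DCP coded neighbor.
    for block in blocks:
        if block.dcp_dv is not None:
            return block.dcp_dv
    # Then look for a DV-MCP neighbor, i.e. one with dvMcpFlag equal to 1.
    for block in blocks:
        if block.dv_mcp_flag == 1:
            return block.dv_mcp_disparity
    # Neither was found: a zero vector is used as the default disparity vector.
    return (0, 0)
```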
Depth Oriented Neighboring Block Disparity Vector (DoNBDV). A method to enhance the NBDV by extracting a more accurate disparity vector from the depth map is utilized in the current 3D-HEVC. A depth block from the coded depth map in the same access unit is first retrieved and used as a virtual depth of the current block. Specifically, the refined DV is converted from the maximum disparity of the pixel subset in the virtual depth block, which is located by the DV derived using NBDV. This coding tool for DV derivation is termed Depth-oriented NBDV (DoNBDV).
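The refinement can be sketched as follows. The text above speaks only of a pixel subset; this sketch assumes the subset consists of the four corner samples of the virtual depth block. The caller-supplied `to_disparity` conversion, the zero vertical component and the clamping to the depth map bounds are likewise assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vec = Tuple[int, int]


def donbdv_refine(depth_map: List[List[int]], x: int, y: int,
                  width: int, height: int, nbdv: Vec,
                  to_disparity: Callable[[int], int]) -> Vec:
    """Refine an NBDV-derived DV using the virtual depth block it points to."""
    pic_h, pic_w = len(depth_map), len(depth_map[0])

    def clip(value: int, upper: int) -> int:
        return max(0, min(value, upper))

    # The virtual depth block is located by displacing the current block by NBDV.
    left = clip(x + nbdv[0], pic_w - 1)
    top = clip(y + nbdv[1], pic_h - 1)
    right = clip(x + nbdv[0] + width - 1, pic_w - 1)
    bottom = clip(y + nbdv[1] + height - 1, pic_h - 1)
    # Assumed pixel subset: the four corners of the virtual depth block.
    samples = [depth_map[top][left], depth_map[top][right],
               depth_map[bottom][left], depth_map[bottom][right]]
    # The refined DV is taken from the maximum disparity over the subset.
    return (max(to_disparity(d) for d in samples), 0)
```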
In HEVC, two different modes for signaling the motion parameters for a block are specified. In the first mode, which is referred to as advanced motion vector prediction (AMVP) mode, the number of motion hypotheses, the reference indices, the motion vector differences, and indications specifying the used motion vector predictors are coded in the bitstream. The second mode is referred to as Merge mode. For this mode, only an indication is coded, which signals the set of motion parameters that are used for the block. In the current 3D-HEVC, during the process of collecting motion hypotheses for AMVP, if the reference picture type of a spatial neighbor is the same as the reference picture type of the current PU (inter-view or temporal) and the picture order count (POC) of the reference picture of the spatial neighbor is equal to the POC of the reference picture of the current PU, the motion information of the spatial neighbor is directly used as the motion hypothesis of the current PU.
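The condition for directly reusing a spatial neighbor's motion information in AMVP can be sketched as follows. The `MotionInfo` record and the function name are hypothetical. The sketch returns `None` when the condition fails, because the text above does not describe what happens in that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[int, int]


@dataclass
class MotionInfo:
    """Hypothetical motion record of a spatial neighbor."""
    mv: Vec
    ref_is_inter_view: bool  # reference picture type: inter-view or temporal
    ref_poc: int             # POC of the neighbor's reference picture


def amvp_spatial_hypothesis(neighbor: Optional[MotionInfo],
                            cur_ref_is_inter_view: bool,
                            cur_ref_poc: int) -> Optional[Vec]:
    if neighbor is None:
        return None
    same_type = neighbor.ref_is_inter_view == cur_ref_is_inter_view
    same_poc = neighbor.ref_poc == cur_ref_poc
    # Both the reference picture type and the POC must match for direct reuse.
    return neighbor.mv if same_type and same_poc else None
```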
In the conventional scheme, the inter-view reference picture pointed to by the derived DV may not be included in the reference picture lists of the current PU. The VSP mode may nevertheless still be selected (i.e., the VSP flag could be set as true), even though the VSP process cannot be performed when that inter-view reference picture is absent from the reference picture lists of the current PU. In this case, the VSP mode does not have any effective motion information if it does get selected. As a result, a mismatch between encoder and decoder will occur.
Furthermore, in the conventional 3D-HEVC, the Neighboring Block Disparity Vector (NBDV) derivation process checks the availability of disparity vector (DV) associated with spatial and temporal neighboring blocks. If no DV can be derived from the neighboring blocks, a default DV with a zero-valued vector pointing to the base view (with a view index equal to 0) is used. The DV derived by NBDV can be further used by the process of depth-oriented NBDV (DoNBDV) to derive a refined DV. An example of disparity vector derivation process of NBDV (steps 1-2) and DoNBDV (step 3) according to HTM-8.0 is illustrated as follows.
    1. The disparity vector (DV) is set to (0, 0) initially.
    2. The NBDV derivation is performed as follows.
        a) Search the temporal neighboring blocks to determine if the disparity vector can be found in these temporal neighboring blocks. Once a DV is found, the DV found is used as the output of the NBDV process and the process is terminated. In HTM-8.0, two temporal neighboring blocks are used, including a co-located block in co-located picture and a co-located block in RAP (Random Access Point) picture, where the two co-located blocks correspond to a central block in the co-located picture and the RAP picture respectively as shown in FIG. 4A.
        b) Search the spatial neighboring blocks (i.e., blocks A1 and B1 as shown in FIG. 4B) to determine if a disparity vector can be found in these spatial neighboring blocks. Once a DV is found, the DV found is used as the output of the NBDV process and the process is terminated.
        c) Search the spatial neighboring blocks (i.e., blocks A1 and B1 as shown in FIG. 4B) to determine if an intrinsic disparity vector can be found in these spatial neighboring blocks. The intrinsic disparity vector is the disparity information obtained from spatial neighboring DV-MCP blocks whose motion is predicted from a corresponding block in the inter-view reference picture where the location of the corresponding blocks is specified by a disparity vector as shown in FIG. 3. The disparity vector used in the DV-MCP block represents a motion correspondence between the current and inter-view reference pictures. Once an intrinsic DV is found, the found DV is used as the output of the NBDV process and the process is terminated.
        d) If there is still no DV found, a zero vector with a zero view index is used as a default output for the NBDV process.
    3. If a flag (i.e., depth_refinement_flag) indicating whether NBDV is further refined from the depth map, is equal to 1, then a refined NBDV, DVref is derived as follows.
        a) Find the corresponding depth block of the reference view by using NBDV,
        b) Select the representative depth value in the corresponding depth block, and
        c) Convert the representative depth value to the disparity vector.
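Steps 1 to 3 above can be sketched end to end as follows. The input representation is hypothetical: each candidate is either `None` or a pair of a DV and its view index, the temporal candidates are passed in the order (co-located picture, RAP picture), and the spatial blocks are visited as A1 then B1. That visiting order is an assumption, since the steps above name the blocks without fixing an order. The `refine` callback stands in for steps 3a to 3c and is not a real HTM function.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Vec = Tuple[int, int]
DvResult = Tuple[Vec, int]  # (disparity vector, view index)

SPATIAL_BLOCKS = ("A1", "B1")


def nbdv_htm8(temporal: Sequence[Optional[DvResult]],
              spatial_dcp: Dict[str, Optional[DvResult]],
              spatial_intrinsic: Dict[str, Optional[DvResult]]) -> DvResult:
    # Step 1: the DV is initialised to (0, 0); the view index defaults to 0.
    default: DvResult = ((0, 0), 0)
    # Step 2a: temporal neighbors (central blocks of co-located and RAP pictures).
    for candidate in temporal:
        if candidate is not None:
            return candidate
    # Step 2b: DVs of the spatial neighbors A1 and B1.
    for name in SPATIAL_BLOCKS:
        candidate = spatial_dcp.get(name)
        if candidate is not None:
            return candidate
    # Step 2c: intrinsic DVs from spatial DV-MCP neighbors.
    for name in SPATIAL_BLOCKS:
        candidate = spatial_intrinsic.get(name)
        if candidate is not None:
            return candidate
    # Step 2d: zero vector with a zero view index.
    return default


def derive_dv(nbdv: DvResult, depth_refinement_flag: int,
              refine: Callable[[Vec, int], Vec]) -> DvResult:
    # Step 3: refine from the depth map only when the flag is equal to 1.
    if depth_refinement_flag == 1:
        dv, view_idx = nbdv
        return refine(dv, view_idx), view_idx
    return nbdv
```

Note that step 2d always yields view index 0, whether or not a reference picture from that view is available to the current block.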
In the current 3D-HEVC, NBDV is used to derive a DV from the spatial or temporal neighboring blocks based on a predefined order. When no DV can be derived from the neighboring blocks, a default DV with a zero vector pointing to the base view (with a view index equal to 0) is used. However, there may be cases in which the base view reference picture is not included in the reference picture list of a current image unit (e.g., a slice or a largest coding unit). Under this condition, the default DV may point to a non-existing reference picture, and this invalid view index may cause a mismatch between the encoder and the decoder.