Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3D TV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. In contrast, multi-view video is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers the sensation of realism.
The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. Accordingly, the multiple cameras will capture multiple video sequences corresponding to multiple views. In order to provide more views, more cameras have been used to generate multi-view video with a large number of video sequences associated with the views. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space or the transmission bandwidth.
A straightforward approach may be to simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a coding system would be very inefficient. In order to improve coding efficiency, multi-view video coding exploits inter-view redundancy. Various 3D coding tools have been developed or are being developed by extending existing video coding standards. For example, there are standard development activities to extend H.264/AVC (advanced video coding) and HEVC (high efficiency video coding) to multi-view video coding (MVC) and 3D coding.
Various 3D coding tools developed or being developed for 3D-HEVC and 3D-AVC are reviewed as follows.
To share the previously coded texture information of adjacent views, a technique known as Disparity-Compensated Prediction (DCP) has been included in 3D-HTM as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to an inter-picture prediction that uses previously coded pictures of the same view, while DCP refers to an inter-picture prediction that uses previously coded pictures of other views in the same access unit. FIG. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP. The vector (110) used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. FIG. 1 illustrates three MVs (120, 130 and 140) associated with MCP. Moreover, the DV of a DCP block can also be predicted by the disparity vector predictor (DVP) candidate derived from neighboring blocks or the temporal collocated blocks that also use inter-view reference pictures. In the current 3D-HTM, when deriving an inter-view Merge candidate for Merge/Skip modes, if the motion information of the corresponding block is not available or not valid, the inter-view Merge candidate is replaced by a DV.
Inter-view residual prediction is another coding tool used in 3D-HTM. To share the previously coded residual information of adjacent views, the residual signal of the current prediction block (i.e., PU) can be predicted by the residual signals of the corresponding blocks in the inter-view pictures as shown in FIG. 2. The corresponding blocks can be located by respective DVs. The video pictures and depth maps corresponding to a particular camera position are indicated by a view identifier (i.e., V0, V1 and V2 in FIG. 2). All video pictures and depth maps that belong to the same camera position are associated with the same viewIdx (i.e., view order index). The view order indices are used for specifying the coding order within the access units and detecting missing views in error-prone environments. An access unit includes all video pictures and depth maps corresponding to the same time instant. Inside an access unit, the video picture and, when present, the associated depth map having viewIdx equal to 0 are coded first, followed by the video picture and depth map having viewIdx equal to 1, etc. The view with viewIdx equal to 0 (i.e., V0 in FIG. 2) is also referred to as the base view or the independent view. The base view video pictures can be coded using a conventional HEVC video coder without dependence on other views.
As can be seen in FIG. 2, for the current block, a motion vector predictor (MVP) or disparity vector predictor (DVP) can be derived from the inter-view blocks in the inter-view pictures. In the following, blocks in an inter-view picture may be abbreviated as inter-view blocks. The derived candidates are termed inter-view candidates, which can be inter-view MVPs or DVPs. A coding tool that codes the motion information of a current block (e.g., a current prediction unit, PU) based on previously coded motion information in other views is termed inter-view motion parameter prediction. Furthermore, a corresponding block in a neighboring view is termed an inter-view block, and the inter-view block is located using the disparity vector derived from the depth information of the current block in the current picture.
The example shown in FIG. 2 corresponds to a view coding order from V0 (i.e., base view) to V1, and followed by V2. The current block in the current picture being coded is in V2. According to HTM3.1, all the MVs of reference blocks in the previously coded views can be considered as an inter-view candidate even if the inter-view pictures are not in the reference picture list of current picture. In FIG. 2, frames 210, 220 and 230 correspond to a video picture or a depth map from views V0, V1 and V2 at time t1 respectively. Block 232 is the current block in the current view, and blocks 212 and 222 are the collocated current blocks in V0 and V1 respectively. For the collocated current block 212 in V0, a disparity vector (216) is used to locate the inter-view collocated block (214). Similarly, for the collocated current block 222 in V1, a disparity vector (226) is used to locate the inter-view collocated block (224).
In a 3D system, the depth map in a reference view may be coded before texture pictures in dependent views. Therefore, the coded depth information becomes useful for subsequent texture and depth coding. For example, the processing order for texture and depth components may be T0, D0, T1, T2, D1 and D2 for a system having V0, V1 and V2, where “T” refers to texture and “D” refers to depth. The texture picture in the base view (i.e., V0) is coded first, followed by the depth map in V0. For dependent views, the texture pictures are coded first, followed by the depth maps. Therefore, the coded depth map in view 0 can be used to derive the DV for the texture frame in view 1 to be coded. FIG. 3 illustrates an example of a technique of converting depth to disparity as used by virtual depth. A predicted disparity vector (340) is determined for the current block (CB, 310). An inter-view reference texture block (350) in the reference view is located from the collocated location (310′) of the current block (CB, 310) by using the predicted disparity vector (340). The corresponding depth block (330) in the coded D0 collocated with the inter-view reference texture block (350) is retrieved for the current block (CB, 310). The retrieved block (330) is then used as the virtual depth block (330′) for the current block to derive the DV. The depth values associated with the virtual depth block (330′) are then converted to disparity. For example, the maximum value in the virtual depth block (330′) can be converted into a disparity vector for various inter-view coding tools. In the current 3D-HEVC, the disparity vectors (DVs) used for disparity compensated prediction (DCP) are explicitly transmitted or implicitly derived in a way similar to motion vectors (MVs) with respect to AMVP (advanced motion vector prediction) and merging operations.
Currently, except for the DV for DCP, the DVs used for the other coding tools are derived using either the neighboring block disparity vector (NBDV) process or the depth oriented neighboring block disparity (DoNBDV) process as described below.
In the current 3D-HEVC, a disparity vector can be used as a DVP candidate for Inter mode or as a Merge candidate for Merge/Skip mode. A derived disparity vector can also be used as an offset vector for inter-view motion prediction and inter-view residual prediction. When used as an offset vector, the DV is derived from spatial and temporal neighboring blocks as shown in FIGS. 4A and 4B. Multiple spatial and temporal neighboring blocks are determined, and DV availability of the spatial and temporal neighboring blocks is checked according to a pre-determined order. This coding tool for DV derivation based on neighboring (spatial and temporal) blocks is termed Neighboring Block DV (NBDV). As shown in FIG. 4A, the spatial neighboring block set includes the location diagonally across from the lower-left corner of the current block (i.e., A0), the location next to the left-bottom side of the current block (i.e., A1), the location diagonally across from the upper-left corner of the current block (i.e., B2), the location diagonally across from the upper-right corner of the current block (i.e., B0), and the location next to the top-right side of the current block (i.e., B1). As shown in FIG. 4B, the temporal neighboring block set includes the location at the center of the current block (i.e., BCTR) and the location diagonally across from the lower-right corner of the current block (i.e., RB) in a temporal reference picture. Instead of the center location, other locations (e.g., a lower-right block) within the current block in the temporal reference picture may also be used. In other words, any block collocated with the current block can be included in the temporal block set. Once a block is identified as having a DV, the checking process is terminated. An exemplary search order for the spatial neighboring blocks in FIG. 4A is (A1, B1, B0, A0, B2). An exemplary search order for the temporal neighboring blocks in FIG. 4B is (RB, BCTR).
In the current practice, two collocated pictures will be checked.
If a DCP coded block is not found in the neighboring block set (i.e., spatial and temporal neighboring blocks as shown in FIGS. 4A and 4B), the disparity information can be obtained from another coding tool, named DV-MCP. In this case, when a spatial neighboring block is an MCP coded block and its motion is predicted by inter-view motion prediction, as shown in FIG. 5, the disparity vector used for the inter-view motion prediction represents a motion correspondence between the current picture and the inter-view reference picture. This type of motion vector is referred to as an inter-view predicted motion vector, and the blocks are referred to as DV-MCP blocks. FIG. 5 illustrates an example of a DV-MCP block, where the motion information of the DV-MCP block (510) is predicted from a corresponding block (520) in the inter-view reference picture. The location of the corresponding block (520) is specified by a disparity vector (530). The disparity vector used in the DV-MCP block represents a motion correspondence between the current picture and the inter-view reference picture. The motion information (522) of the corresponding block (520) is used to predict the motion information (512) of the current block (510) in the current view.
To indicate whether an MCP block is DV-MCP coded and to store the disparity vector for the inter-view motion parameters prediction, two variables are used to represent the motion vector information for each block: dvMcpFlag and dvMcpDisparity.
When dvMcpFlag is equal to 1, the dvMcpDisparity is set to indicate that the disparity vector is used for the inter-view motion parameter prediction. In the construction process for the AMVP mode and Merge candidate list, the dvMcpFlag of the candidate is set to 1 if the candidate is generated by inter-view motion parameter prediction and is set to 0 otherwise. If neither DCP coded blocks nor DV-MCP coded blocks are found in the above mentioned spatial and temporal neighboring blocks, then a zero vector can be used as a default disparity vector.
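The NBDV search with its DV-MCP fallback, as described above, can be sketched as follows. This is a minimal illustration and not the HTM implementation: the Block record and its field names are hypothetical, the relative order of temporal and spatial checks is an assumption (the text only gives the order within each set), and only one collocated picture is modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DV = Tuple[int, int]

@dataclass
class Block:
    """Hypothetical per-block record; field names are illustrative only."""
    dcp_dv: Optional[DV] = None            # DV of a DCP coded block, else None
    dv_mcp_flag: int = 0                   # mirrors dvMcpFlag
    dv_mcp_disparity: Optional[DV] = None  # mirrors dvMcpDisparity

SPATIAL_ORDER = ("A1", "B1", "B0", "A0", "B2")  # exemplary order from the text
TEMPORAL_ORDER = ("RB", "BCTR")                 # exemplary order from the text

def nbdv(spatial: Dict[str, Block], temporal: Dict[str, Block]) -> DV:
    """Return the first available DV, or a zero vector if none is found."""
    # Pass 1: look for a DCP coded neighbor; the first hit ends the search.
    # Assumption: temporal candidates are visited before spatial ones.
    for order, blocks in ((TEMPORAL_ORDER, temporal), (SPATIAL_ORDER, spatial)):
        for name in order:
            blk = blocks.get(name)
            if blk is not None and blk.dcp_dv is not None:
                return blk.dcp_dv
    # Pass 2: fall back to DV-MCP coded spatial neighbors.
    for name in SPATIAL_ORDER:
        blk = spatial.get(name)
        if blk is not None and blk.dv_mcp_flag == 1 and blk.dv_mcp_disparity is not None:
            return blk.dv_mcp_disparity
    # Default: zero disparity vector.
    return (0, 0)
```

Unavailable neighbors are simply omitted from the dictionaries, so an empty neighborhood yields the default zero vector.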
A method to enhance the NBDV by extracting a more accurate disparity vector (referred to as a refined DV in this disclosure) from the depth map is utilized in the current 3D-HEVC. A depth block from the coded depth map in the same access unit is first retrieved and used as a virtual depth of the current block. To be specific, the refined DV is converted from the maximum disparity of the pixel subset in the virtual depth block, which is located by the DV derived using NBDV. This coding tool for DV derivation is termed Depth-oriented NBDV (DoNBDV). Again, a zero vector can be used as a default DV if no refined DV can be derived by DoNBDV. An estimated disparity vector can be extracted from the virtual depth shown in FIG. 5. The overall flow is as follows:
1. Use an estimated disparity vector, which is the NBDV in the current 3D-HTM, to locate the corresponding block in the coded texture view.
2. Use the corresponding depth in the coded view for the current block (coding unit) as the virtual depth.
3. Extract a disparity vector (i.e., a refined DV) for inter-view motion prediction from the maximum value in the virtual depth retrieved in the previous step.
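The three DoNBDV steps above can be sketched as below. The function name, the list-of-lists depth map, the clamping at picture borders, and the choice of the four corner samples as the "pixel subset" are assumptions made for illustration; the depth-to-disparity lookup table is taken as a given input.

```python
from typing import List, Sequence, Tuple

def donbdv(ref_depth: List[List[int]], x: int, y: int, w: int, h: int,
           nbdv: Tuple[int, int], depth_to_disp: Sequence[int]) -> Tuple[int, int]:
    """Refine an NBDV using the coded depth map of the reference view."""
    height, width = len(ref_depth), len(ref_depth[0])
    # Step 1: shift the collocated position by the NBDV to locate the
    # corresponding block (clamped to the picture, an assumption).
    x0 = min(max(x + nbdv[0], 0), width - w)
    y0 = min(max(y + nbdv[1], 0), height - h)
    # Step 2: the depth samples at that position serve as the virtual depth.
    virtual = [row[x0:x0 + w] for row in ref_depth[y0:y0 + h]]
    # Step 3: take the maximum over a pixel subset (assumed here to be the
    # four corners) and convert it to a horizontal disparity.
    corners = (virtual[0][0], virtual[0][-1], virtual[-1][0], virtual[-1][-1])
    return (depth_to_disp[max(corners)], 0)
```

Restricting the maximum to a few samples keeps the per-block cost constant regardless of block size, which is one plausible reason to use a subset rather than every depth sample.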
View synthesis prediction (VSP) is a technique to remove inter-view redundancies among video signals from different viewpoints, in which a synthetic signal is used as a reference to predict a current picture. In the 3D-HEVC test model, HTM-7.0, there exists a process to derive a disparity vector predictor, known as NBDV (Neighboring Block Disparity Vector). The derived disparity vector is then used to fetch a depth block in the depth image of the reference view. The procedure to derive the virtual depth as shown in FIG. 3 can be applied for VSP to locate the corresponding depth block in a coded view. The fetched depth block may have the same size as the current prediction unit (PU), and it is then used to perform backward warping for the current PU. In addition, the warping operation may be performed at a sub-PU level precision, such as 2×2 or 4×4 blocks as shown in FIG. 6.
In FIG. 6, a current texture block (610) in view 1 is to be processed. A predicted disparity vector (640) is used to locate an inter-view reference texture block (650) from the collocated location (610′) of the current block. The collocated depth block (630) in the coded view corresponding to the texture block (650) can be identified. The coded depth block (630) is then used as a virtual depth block (630′) for the current block to perform backward warping. The current block (610) is divided into four sub-blocks. The virtual depth block is also divided into four sub-blocks. A maximum depth value may be selected for each sub-PU block and converted into a disparity vector for the sub-block. Therefore, four converted disparity vectors are obtained, shown as four arrows in FIG. 6. The four disparity vectors are used for backward warping all the pixels in the sub-PU blocks. The synthesized sub-blocks are then used for prediction of the current block. Currently, a horizontal disparity vector is converted from the selected depth value. The backward VSP (BVSP) technique is applied to texture component coding.
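The sub-PU processing described for FIG. 6 can be sketched as follows. This is an illustrative simplification under stated assumptions: function names are hypothetical, pictures are plain lists of rows, the warping is integer-pel with horizontal clamping, and the depth-to-disparity lookup table is supplied by the caller.

```python
from typing import Dict, List, Sequence, Tuple

def bvsp_sub_block_dvs(virtual_depth: List[List[int]], sub_w: int, sub_h: int,
                       depth_to_disp: Sequence[int]) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Map each sub-block origin (x, y) to a horizontal DV from its maximum depth."""
    h, w = len(virtual_depth), len(virtual_depth[0])
    dvs = {}
    for sy in range(0, h, sub_h):
        for sx in range(0, w, sub_w):
            max_depth = max(virtual_depth[yy][xx]
                            for yy in range(sy, min(sy + sub_h, h))
                            for xx in range(sx, min(sx + sub_w, w)))
            dvs[(sx, sy)] = (depth_to_disp[max_depth], 0)
    return dvs

def backward_warp(ref_texture: List[List[int]], x: int, y: int, w: int, h: int,
                  dvs: Dict[Tuple[int, int], Tuple[int, int]],
                  sub_w: int, sub_h: int) -> List[List[int]]:
    """Predict the block at (x, y) by fetching reference samples shifted by each sub-block DV."""
    width = len(ref_texture[0])
    pred = [[0] * w for _ in range(h)]
    for j in range(h):
        for i in range(w):
            dv_x, _ = dvs[((i // sub_w) * sub_w, (j // sub_h) * sub_h)]
            rx = min(max(x + i + dv_x, 0), width - 1)  # clamp inside the picture
            pred[j][i] = ref_texture[y + j][rx]
    return pred
```

With a 4×4 block and 2×2 sub-blocks, the first function returns four disparity vectors, matching the four arrows described for FIG. 6.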
In the current implementation, BVSP is added as a new merging candidate to signal the use of BVSP prediction. In this way, a BVSP block may be a skipped block without any residual, or a Merge block with residual information coded.
As described above, coding tools such as DoNBDV and VSP convert the depth values to one or more disparity vectors (DVs) for prediction. Such depth-oriented coding tools need the camera parameters for depth to disparity conversion. For example, the disparity value, D can be converted from the depth using a linear function of the depth value, d:
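$$D = f \cdot l \cdot \left( \frac{d}{2^{\mathrm{BitDepth}} - 1} \left( \frac{1}{Z_{\mathrm{near}}} - \frac{1}{Z_{\mathrm{far}}} \right) + \frac{1}{Z_{\mathrm{far}}} \right). \qquad (1)$$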
The above conversion requires the camera parameters Zfar and Znear, the focal length f, the translation l, and the data precision BitDepth of the depth data. The above conversion can be simplified to:
D = (d*DisparityScale + (DisparityOffset << BitDepth) + (1 << (log2Div − 1))) >> log2Div,  (2)
where DisparityScale is a scaling factor, DisparityOffset is an offset value, BitDepth is equal to 8 for typical depth data and log2Div is a shift parameter that depends on the required accuracy of the disparity vectors. The simplified conversion according to equation (2) uses arithmetic shifts instead of a division operation.
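The two conversions can be compared with a short sketch. The function and parameter names below are hypothetical, and the grouping of the offset term in the integer version, (DisparityOffset << BitDepth), is assumed. How DisparityScale and DisparityOffset are obtained from the camera parameters is not covered here.

```python
from typing import List

def disparity_eq1(d: int, focal_length: float, translation: float,
                  z_near: float, z_far: float, bit_depth: int = 8) -> float:
    """Equation (1): linear depth-to-disparity conversion with a division."""
    d_max = (1 << bit_depth) - 1
    return focal_length * translation * (
        d / d_max * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)

def disparity_eq2(d: int, scale: int, offset: int, log2_div: int,
                  bit_depth: int = 8) -> int:
    """Equation (2): integer conversion using shifts; the last term rounds."""
    return (d * scale + (offset << bit_depth) + (1 << (log2_div - 1))) >> log2_div

def build_disparity_lut(scale: int, offset: int, log2_div: int,
                        bit_depth: int = 8) -> List[int]:
    """Precompute the disparity for every possible depth value."""
    return [disparity_eq2(d, scale, offset, log2_div, bit_depth)
            for d in range(1 << bit_depth)]
```

Because the depth value takes only 2^BitDepth distinct values, a lookup table built once per view pair lets coding tools convert depth to disparity with a single table access.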
Following the terminology of scalable video coding, which codes a base layer with enhancement layers to improve video scalability, 3D video coding systems assign the texture or depth sequence of each view to a separate “layer”. Each layer has a layer identifier “LayerId”. In HTM-7.0 (3D-HEVC based Test Model version 7.0), the camera parameters are coded only when a layer is a non-depth layer. If only depth layers are coded, or depth layers are coded before texture layers, the camera parameters will not be available for the depth oriented coding tools. Furthermore, in HTM-7.0, camera parameters are sent in the sequence parameter set (SPS), which only records the information of a single layer without knowing the relationship between different layers. The information to distinguish depth layers from non-depth layers is stored in the video parameter set (VPS), where the depth flag VpsDepthFlag is derived from dimension_id, which is only available in the VPS. VpsDepthFlag[nuh_layer_id] specifies the depth flag of the layer with layer id equal to nuh_layer_id. Table 1 illustrates the syntax for camera parameters signaled in the SPS according to HTM-7.0. As shown in Table 1, cp_in_slice_header_flag controls whether camera parameters will be in the SPS extension or in the slice segment header. If cp_in_slice_header_flag is 0, the camera parameters (i.e., cp_scale[i], cp_off[i], cp_inv_scale_plus_scale[i] and cp_inv_off_plus_off[i]) will be incorporated in the SPS extension. Otherwise, the camera parameters will be incorporated in the slice segment header. In HTM-7.0, redundancy in camera parameters exists between the VPS (video parameter set) and the slice segment header. Table 2 illustrates the syntax for camera parameters signaled in the slice header according to 3D-HEVC Test Model 3. Also, redundancy in camera parameters exists between the texture and depth layers in the same view. It is desirable to develop techniques to resolve the issues of unavailable camera parameters and redundancy in camera parameters in some situations.
TABLE 1
                                                              Descriptor
sps_extension2( ) {
  if( !VpsDepthFlag[ nuh_layer_id ] ) {
    cp_precision                                              ue(v)
    cp_in_slice_header_flag                                   u(1)
    if( !cp_in_slice_header_flag ) {
      for( i = 0; i < ViewId[ nuh_layer_id ]; i++ ) {
        cp_scale[ i ]                                         se(v)
        cp_off[ i ]                                           se(v)
        cp_inv_scale_plus_scale[ i ]                          se(v)
        cp_inv_off_plus_off[ i ]                              se(v)
      }
    }
  }
}
TABLE 2
                                                              Descriptor
slice_header_extension( ) {
  if( cp_in_slice_header_flag ) {
    for( i = 0; i < ViewIdx; i++ ) {
      cp_scale[ i ]                                           se(v)
      cp_off[ i ]                                             se(v)
      cp_inv_scale_plus_scale[ i ]                            se(v)
      cp_inv_off_plus_off[ i ]                                se(v)
    }
  }
}
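The conditional structure of Table 1 can be illustrated with a small parsing sketch. The BitReader class, which operates on a string of '0' and '1' characters, and the function name are hypothetical helpers written for this illustration, not part of HTM. The ue(v) and se(v) descriptors are read as unsigned and signed Exp-Golomb codes, as in HEVC. The sketch makes the availability issue visible: for a depth layer, no camera parameters are parsed at all.

```python
from typing import Dict, Optional

class BitReader:
    """Hypothetical reader over a string of '0'/'1' characters."""
    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, u(n)."""
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self) -> int:
        """Unsigned Exp-Golomb code, ue(v)."""
        zeros = 0
        while self.u(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + (self.u(zeros) if zeros else 0)

    def se(self) -> int:
        """Signed Exp-Golomb code, se(v): 0, 1, -1, 2, -2, ..."""
        k = self.ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)

def parse_sps_extension2(r: BitReader, vps_depth_flag: bool,
                         view_id: int) -> Optional[Dict]:
    """Follow Table 1; return None when the layer is a depth layer."""
    if vps_depth_flag:
        return None  # depth layer: camera parameters are not coded
    cp = {"cp_precision": r.ue(),
          "cp_in_slice_header_flag": r.u(1),
          "params": []}
    if not cp["cp_in_slice_header_flag"]:
        for _ in range(view_id):  # loop bound ViewId[nuh_layer_id]
            cp["params"].append({
                "cp_scale": r.se(),
                "cp_off": r.se(),
                "cp_inv_scale_plus_scale": r.se(),
                "cp_inv_off_plus_off": r.se(),
            })
    return cp
```

When cp_in_slice_header_flag is 1, the parameter list stays empty here because the values would instead be read from the slice header extension of Table 2.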