Three-dimensional (3D) television has been a technology trend in recent years, intended to bring viewers a more engaging viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology among them for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. In contrast, 3D video is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with a sensation of realism.
To reduce the inter-view redundancy, disparity-compensated prediction (DCP) has been used as an alternative to motion-compensated prediction (MCP). MCP refers to an inter-picture prediction that uses already coded pictures of the same view in a different access unit, while DCP refers to an inter-picture prediction that uses already coded pictures of other views in the same access unit, as illustrated in FIG. 1. The three-dimensional/multi-view data consists of texture pictures (110) and depth maps (120). The motion-compensated prediction is applied to texture pictures or depth maps in the temporal direction (i.e., the horizontal direction in FIG. 1). The disparity-compensated prediction is applied to texture pictures or depth maps in the view direction (i.e., the vertical direction in FIG. 1). The vector used for DCP is termed disparity vector (DV), which is analogous to the motion vector (MV) used in MCP.
3D-HEVC is an extension of HEVC (High Efficiency Video Coding) that is being developed for encoding/decoding 3D video. One of the views is referred to as the base view or the independent view. The base view is coded independently of the other views as well as the depth data. Furthermore, the base view is coded using a conventional HEVC video coder.
In 3D-HEVC, a hybrid block-based motion-compensated DCT-like transform coding architecture is still utilized. The basic unit for compression, termed coding unit (CU), is a 2N×2N square block, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs). The PU size can be 2N×2N, 2N×N, N×2N, or N×N. When asymmetric motion partition (AMP) is supported, the PU size can also be 2N×nU, 2N×nD, nL×2N and nR×2N.
The 3D video is typically created by capturing a scene using a video camera with an associated device to capture depth information, or using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. The texture data and the depth data corresponding to a scene usually exhibit substantial correlation. Therefore, the depth information can be used to improve coding efficiency or reduce processing complexity for texture data, and vice versa. For example, the corresponding depth block of a texture block reveals information similar to a pixel-level object segmentation. Therefore, the depth information can help to realize pixel-level segment-based motion compensation. Accordingly, a depth-based block partitioning (DBBP) has been adopted for texture video coding in the current 3D-HEVC (3D video coding based on the High Efficiency Video Coding (HEVC) standard).
In the depth-based block partitioning (DBBP) mode, arbitrarily shaped block partitioning for the collocated texture block is derived based on a binary segmentation mask computed from the corresponding depth map. Each of the two partitions (resembling foreground and background) is motion compensated and merged afterwards based on the depth-based segmentation mask.
A single flag is added to the coding syntax to signal to the decoder that the underlying block uses DBBP for prediction. When the current coding unit is coded with the DBBP mode, the corresponding partition size is set to SIZE_2N×2N and bi-prediction is inherited.
A disparity vector derived from the DoNBDV (Depth-oriented Neighboring Block Disparity Vector) process is applied to identify a corresponding depth block in a reference view, as shown in FIG. 2. In FIG. 2, corresponding depth block 220 in a reference view for current texture block 210 in a dependent view is located based on the location of the current texture block and derived DV 212, which is derived using DoNBDV according to the 3D-HEVC standard. The corresponding depth block has the same size as the current texture block. When the depth block is found, a threshold is calculated as the average of all depth pixels within the corresponding depth block. Afterwards, a binary segmentation mask m_D(x,y) is generated based on the depth values and the threshold. When the depth value located at the relative coordinates (x,y) is larger than the threshold, the binary mask m_D(x,y) is set to 1. Otherwise, m_D(x,y) is set to 0. An example is shown in FIG. 3. The mean value of the virtual depth block (310) is determined in step 320. The values of the virtual depth samples are compared to the mean depth value in step 330 to generate segmentation mask 340. The segmentation mask is represented in binary data to indicate whether an underlying pixel belongs to segment 1 or segment 2, as indicated by two different line patterns in FIG. 3.
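The threshold-and-compare procedure above can be sketched as follows (a minimal illustration in Python with NumPy; the function name and the toy depth values are assumptions for illustration, not part of the standard):

```python
import numpy as np

def derive_segmentation_mask(depth_block):
    """Derive a DBBP-style binary segmentation mask from a corresponding
    depth block: threshold at the average of all depth pixels, then mark
    samples above the threshold with 1 and the rest with 0."""
    threshold = depth_block.mean()                    # average of all depth pixels
    return (depth_block > threshold).astype(np.uint8)

# Toy 4x4 depth block: larger depth values in the right half.
depth = np.array([[10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200]])
m_D = derive_segmentation_mask(depth)   # 1 in the right half, 0 in the left
```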
The DoNBDV process enhances the NBDV by extracting a more accurate disparity vector from the depth map. The NBDV is derived based on disparity vector from neighboring blocks. The disparity vector derived from the NBDV process is used to access depth data in a reference view. A final disparity vector is then derived from the depth data.
The DBBP process partitions the 2N×2N block into two partitioned blocks. A motion vector is determined for each partitioned block. In the decoding process, each of the two decoded motion parameters is used for motion compensation performed on a whole 2N×2N block. The resulting prediction signals, i.e., p_T0(x,y) and p_T1(x,y), are combined using the DBBP mask m_D(x,y), as depicted in FIG. 4. The combination process is defined as follows:
    p_T(x,y) = { p_T0(x,y), if m_D(x,y) = 1
               { p_T1(x,y), otherwise.                    (4)
In FIG. 4, the two prediction blocks are merged into one on a pixel-by-pixel basis according to the segmentation mask, and this process is referred to as bi-segment compensation. In this example, the N×2N block partition type is selected and two corresponding motion vectors (MV1 and MV2) are derived for the two partitioned blocks respectively. Each of the motion vectors is used to compensate a whole texture block (410). Accordingly, motion vector MV1 is applied to texture block 420 to generate prediction block 430, and motion vector MV2 is applied to texture block 420 to generate prediction block 432. The two prediction blocks are merged by applying respective segmentation masks (440 and 442) to generate the final prediction block (450).
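The bi-segment compensation of equation (4) amounts to a per-pixel selection between the two full-block prediction signals. A minimal sketch (assuming NumPy arrays; the function name and sample values are illustrative):

```python
import numpy as np

def bi_segment_compensation(p_T0, p_T1, m_D):
    """Combine two prediction signals per equation (4): take p_T0 where
    the segmentation mask is 1, and p_T1 elsewhere."""
    return np.where(m_D == 1, p_T0, p_T1)

# Toy example: constant prediction blocks and a diagonal mask.
p0 = np.full((2, 2), 1)
p1 = np.full((2, 2), 9)
mask = np.array([[1, 0],
                 [0, 1]])
p_T = bi_segment_compensation(p0, p1, mask)
```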
Whether the DBBP mode is used is signaled for a coding unit as shown in Table 1A according to the current 3D-HEVC specification (Gerhard Tech et al, 3D-HEVC Draft Text 3, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting: San Jose, USA, 9 Jan.-17 Jan. 2014, Document: JCT3V-G1001-v1). In the current design, the DBBP flag is conditionally signaled depending on a transmitted partition mode. The flag is signaled only when the transmitted PartMode equals the 2N×N partition.
TABLE 1A

coding_unit( x0, y0, log2CbSize, ctDepth ) {              Descriptor  Note
  ...
  if( ( CuPredMode[ x0 ][ y0 ] != MODE_INTRA ||                       (1-1)
      log2CbSize = = MinCbLog2SizeY ) &&
      !predPartModeFlag )
    part_mode                                             ae(v)       (1-2)
  if( depth_based_blk_part_flag[ nuh_layer_id ]                       (1-3)
      && PartMode = = PART_2NxN )
    dbbp_flag[ x0 ][ y0 ]                                 u(1)        (1-4)
  ....
}
As shown in Table 1A, syntax element part_mode is included as indicated by Note (1-2) when the conditions indicated by Note (1-1) are satisfied. When the conditions indicated by Note (1-3) are satisfied, the DBBP flag (i.e., dbbp_flag[x0][y0]) is included as indicated by Note (1-4). The conditions indicated by Note (1-3) correspond to the DBBP flag being enabled (i.e., depth_based_blk_part_flag[nuh_layer_id] == 1) and the partition mode being 2N×N (i.e., PartMode == PART_2N×N). In Table 1A, depth_based_blk_part_flag[layerId] equal to 0 specifies that depth-based block partitioning is not used for the layer with nuh_layer_id equal to layerId. depth_based_blk_part_flag[layerId] equal to 1 specifies that depth-based block partitioning may be used for the layer with nuh_layer_id equal to layerId. When not present, the value of depth_based_blk_part_flag[layerId] is inferred to be equal to 0. At the decoder side, the DBBP flag (i.e., dbbp_flag[x0][y0]) is parsed. Then, depending on the value of the DBBP flag, the DBBP decoding process is applied to the current coding unit conditionally. If the DBBP flag indicates that the current coding unit is DBBP coded, the DBBP decoding process is applied to the current coding unit.
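The presence condition for the DBBP flag described above can be sketched as a simple predicate (illustrative only; the function name and the string-valued partition mode are assumptions, not the specification's variable types):

```python
def dbbp_flag_signaled(depth_based_blk_part_flag, part_mode):
    """Return True when dbbp_flag is present in the bitstream for the
    current coding unit, per the conditions of Note (1-3): DBBP is enabled
    for the current layer AND the transmitted partition mode is 2NxN."""
    return bool(depth_based_blk_part_flag) and part_mode == "PART_2NxN"
```

When the predicate is False, dbbp_flag is not parsed and is treated as 0, so the CU is decoded without DBBP.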
In Table 1A, part_mode specifies the partitioning mode of the current coding unit (CU) into one or more prediction units (PUs). The semantics of part_mode depend on CuPredMode[x0][y0] (i.e., the prediction mode of the current block). The variable PartMode is derived from the value of part_mode. In Table 1A, the variable predPartModeFlag specifies whether part_mode is predicted by inter-component prediction. Therefore, the condition "(log2CbSize == MinCbLog2SizeY) && !predPartModeFlag" corresponds to "the current CU is the smallest CU and part_mode is not predicted by inter-component prediction". At the decoder side, the syntax element part_mode is parsed. The prediction partition mode (i.e., PartMode) is determined accordingly. The coding unit is partitioned into one or more prediction units according to the prediction partition mode. The decoding process is then applied to the one or more prediction units.
In 3D-HEVC, the Segment-wise DC Coding (SDC) approach provides an alternative residual coding method. With SDC, the residual data (one or two constant residual values within one PU) is coded without transform and quantization processes. Whether SDC is used is signalled in the coding unit parameters structure at CU level. The partition size of SDC coded CU is always 2N×2N. SDC can be applied to all depth Intra prediction modes including HEVC Intra prediction modes and Depth Modelling Modes (DMM). For HEVC Intra prediction modes, the entire PU is considered as one segment, while for DMM modes, there are two segments. The syntax for the coding unit level related to DBBP and SDC according to the current specification of 3D-HEVC (Gerhard Tech et al, 3D-HEVC Draft Text 3, Joint Collaborative Team on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 7th Meeting: San Jose, USA, 9 Jan.-17 Jan. 2014, Document: JCT3V-G1001) is shown in Table 1B. The coding unit extension syntax is shown in Table 1C.
TABLE 1B

coding_unit( x0, y0, log2CbSize, ctDepth ) {                    Descriptor
  ...
  if( slice_type != I )
    pred_mode_flag
  if( ( CuPredMode[ x0 ][ y0 ] != MODE_INTRA ||
      log2CbSize = = MinCbLog2SizeY ) &&
      !predPartModeFlag )
    part_mode                                                   ae(v)
  if( depth_based_blk_part_flag[ nuh_layer_id ]
      && PartMode = = PART_2NxN )
    dbbp_flag[ x0 ][ y0 ]                                       u(1)
  if( sdcEnableFlag )
    sdc_flag[ x0 ][ y0 ]
  if( CuPredMode[ x0 ][ y0 ] = = MODE_INTRA ) {
    if( PartMode = = PART_2Nx2N && pcm_enabled_flag &&
        log2CbSize >= Log2MinIpcmCbSizeY &&
        log2CbSize <= Log2MaxIpcmCbSizeY )
      pcm_flag[ x0 ][ y0 ]
    ....
  }
  cu_extension( x0, y0 )
  ....
}
TABLE 1C

cu_extension( x0, y0, log2CbSize ) {                            Descriptor
  if( icEnableFlag )
    ic_flag                                                     ae(v)
  if( rpEnableFlag )
    iv_res_pred_weight_idx                                      ae(v)
  if( cuDepthDcPresentFlag ) {
    ....                                                        ae(v)
  }
}
In Table 1B, pcm_flag[x0][y0] equal to 1 specifies that the pcm_sample( ) syntax structure is present and the transform_tree( ) syntax structure is not present in the coding unit at the location (x0, y0). pcm_flag[x0][y0] equal to 0 specifies that the pcm_sample( ) syntax structure is not present. When pcm_flag[x0][y0] is not present, it is inferred to be equal to 0. PCM (Pulse Coded Modulation) representation is a video coding mode for 3D-HEVC, where the video data is transmitted without transform and prediction. In other words, when the PCM mode is selected (as indicated by pcm_flag[x0][y0]), the video samples (i.e., pcm_sample( )) are transmitted. The value of pcm_flag[x0+i][y0+j] with i=1 . . . nCbS−1, j=1 . . . nCbS−1 is inferred to be equal to pcm_flag[x0][y0], where nCbS corresponds to the CU width.
In the above table, the variable sdcEnableFlag indicates whether the SDC mode is used, and the value of sdcEnableFlag is derived as follows.
  - If CuPredMode[x0][y0] is equal to MODE_INTER, sdcEnableFlag is set equal to (vps_inter_sdc_flag[nuh_layer_id] && PartMode == PART_2N×2N).
  - Otherwise, if CuPredMode[x0][y0] is equal to MODE_INTRA, sdcEnableFlag is set equal to (vps_depth_modes_flag[nuh_layer_id] && PartMode[x0][y0] == PART_2N×2N).
  - Otherwise (if CuPredMode[x0][y0] is equal to MODE_SKIP), sdcEnableFlag is set equal to 0.
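The sdcEnableFlag derivation above can be mirrored directly as a function (a sketch; the string-valued mode names and the boolean per-layer flags passed as arguments are assumptions for illustration):

```python
def derive_sdc_enable_flag(cu_pred_mode, part_mode,
                           vps_inter_sdc_flag, vps_depth_modes_flag):
    """Mirror the sdcEnableFlag derivation: SDC is only enabled for
    2Nx2N partitions, gated by the corresponding VPS-level flag for the
    current layer, and never enabled for skip-mode CUs."""
    if cu_pred_mode == "MODE_INTER":
        return vps_inter_sdc_flag and part_mode == "PART_2Nx2N"
    if cu_pred_mode == "MODE_INTRA":
        return vps_depth_modes_flag and part_mode == "PART_2Nx2N"
    return False  # MODE_SKIP
```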
In the above table, sdc_flag[x0][y0] equal to 1 specifies that segment-wise DC (SDC) coding of residual blocks is used for the current coding unit. sdc_flag[x0][y0] equal to 0 specifies that segment-wise DC coding of residual blocks is not used for the current coding unit. When not present, the value of sdc_flag[x0][y0] is inferred to be equal to 0.
Before coding, the residual values are mapped to values, which are present in the original, uncompressed depth map by using a Depth Lookup Table (DLT). Consequently, residual values can be coded by signaling only the index into this lookup table, which reduces the bit depth of residual magnitudes. This mapping table is transmitted to the decoder for the inverse lookup from an index to a valid depth value. The advantage of using this lookup table is the reduced bit depth of the residual index due to sparse depth value occurrences in typical depth maps.
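The DLT idea can be sketched as follows: collect the depth values actually present in the uncompressed depth map, index them, and signal small indices instead of full-bit-depth values (illustrative only; the helper name and the toy value set are assumptions):

```python
def build_dlt(depth_samples):
    """Build a Depth Lookup Table from the depth values actually present,
    together with the inverse map used to convert a valid depth value
    back to its table index."""
    dlt = sorted(set(depth_samples))           # index -> depth value
    inverse = {d: i for i, d in enumerate(dlt)}  # depth value -> index
    return dlt, inverse

# Toy depth map with only three distinct values out of a 0..255 range:
dlt, inverse = build_dlt([50, 50, 80, 120, 80])
```

Because typical depth maps use only a sparse subset of the nominal value range, the index alphabet is much smaller than the depth bit depth, which is the coding gain the text describes.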
At the encoder side, the SDC process utilizes the mean of the original depth values (dorig) and of the predicted depth values (dpred). As illustrated in the example of FIG. 5A, for SDC with HEVC Intra prediction modes, dpred is calculated as the average of the left-top, right-top, left-bottom, and right-bottom samples (indicated by circles) in a predicted block. FIG. 5B illustrates an example of DMM Mode 1, where the upper-left portion belongs to one segment and the lower-right portion (indicated by slant lines) belongs to another segment. For SDC with DMM Mode 1, dpred of each segment is derived from the left-top, right-top, left-bottom, and right-bottom samples (indicated by circles) that belong to the same segment in a predicted block. For SDC with DMM Mode 4, dpred of each segment is set equal to any sample that belongs to the same segment in a predicted block, because all samples within one segment share the same prediction value.
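The corner-sample averaging used for dpred with HEVC Intra modes can be sketched as follows (a minimal illustration assuming a NumPy array; the function name is an assumption):

```python
import numpy as np

def sdc_intra_dpred(pred_block):
    """Average the four corner samples (left-top, right-top, left-bottom,
    right-bottom) of a predicted block, as used to derive dpred for SDC
    with HEVC Intra prediction modes."""
    h, w = pred_block.shape
    corners = (pred_block[0, 0] + pred_block[0, w - 1] +
               pred_block[h - 1, 0] + pred_block[h - 1, w - 1])
    return corners / 4.0

# Toy 4x4 predicted block whose corners are 10, 20, 30, 40.
block = np.zeros((4, 4))
block[0, 0], block[0, 3], block[3, 0], block[3, 3] = 10, 20, 30, 40
dpred = sdc_intra_dpred(block)
```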