Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. 3D video, by contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism.
To reduce the inter-view redundancy, disparity-compensated prediction (DCP) has been used as an alternative to motion-compensated prediction (MCP). MCP refers to inter-picture prediction that uses already coded pictures of the same view in a different access unit, while DCP refers to inter-picture prediction that uses already coded pictures of other views in the same access unit, as illustrated in FIG. 1. The three-dimensional/multi-view data consists of texture pictures (110) and depth maps (120). The motion-compensated prediction is applied to texture pictures or depth maps in the temporal direction (i.e., the horizontal direction in FIG. 1). The disparity-compensated prediction is applied to texture pictures or depth maps in the view direction (i.e., the vertical direction in FIG. 1). The vector used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP.
3D-HEVC is an extension of HEVC (High Efficiency Video Coding) that is being developed for encoding/decoding 3D video. One of the views is referred to as the base view or the independent view. The base view is coded independently of the other views as well as the depth data. Furthermore, the base view is coded using a conventional HEVC video coder.
In 3D-HEVC, a hybrid block-based motion-compensated DCT-like transform coding architecture is still utilized. The basic unit for compression, termed coding unit (CU), is a 2N×2N square block, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs). The PU size can be 2N×2N, 2N×N, N×2N, or N×N. When asymmetric motion partition (AMP) is supported, the PU size can also be 2N×nU, 2N×nD, nL×2N and nR×2N. The coding unit (CU) syntax related to the PCM (pulse code modulation) mode and SDC (segment-wise depth coding) mode is shown in Table 1.
TABLE 1
coding_unit( x0, y0, log2CbSize, ctDepth ) {               Descriptor
  ....
  if( depth_based_blk_part_flag[ nuh_layer_id ] &&
      PartMode == PART_2N×N )
    dbbp_flag[ x0 ][ y0 ]                                   u(1)
  if( sdcEnableFlag )
    sdc_flag[ x0 ][ y0 ]                                    ae(v)
  if( CuPredMode[ x0 ][ y0 ] == MODE_INTRA ) {
    if( PartMode == PART_2N×2N &&
        pcm_enabled_flag &&
        log2CbSize >= Log2MinIpcmCbSizeY &&
        log2CbSize <= Log2MaxIpcmCbSizeY )
      pcm_flag[ x0 ][ y0 ]                                  ae(v)
    ....
As shown in Table 1, the SDC flag, sdc_flag[x0][y0], is incorporated if the SDC enable flag (i.e., sdcEnableFlag) is asserted. Furthermore, the PCM flag, pcm_flag[x0][y0], is incorporated if the CU is coded in an Intra mode (i.e., CuPredMode[x0][y0]==MODE_INTRA), the partition mode is 2N×2N (i.e., PartMode==PART_2N×2N), the PCM enable flag (i.e., pcm_enabled_flag) is asserted, and the block size conditions on log2CbSize are satisfied. A mode enable flag indicates whether the corresponding mode is allowed. If the mode enable flag is asserted, the corresponding mode is allowable, and a further mode flag is signaled to indicate whether the mode is applied to an underlying image processing unit such as a coding unit (CU). For example, the SDC flag (i.e., sdc_flag[x0][y0]) is signaled when the SDC enable flag (i.e., sdcEnableFlag) is asserted; otherwise, the SDC flag is not signaled. An underlying image processing unit is coded using the SDC mode when the SDC flag is asserted. If the SDC flag is not asserted, SDC coding is not applied to the underlying image processing unit.
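The signaling conditions of Table 1 can be sketched in Python as follows. This is only an illustration of which mode flags would be present for a given CU, not a bitstream parser. The function name `cu_flags_signaled` and the dictionary interface are hypothetical, and `depth_based_blk_part_flag[ nuh_layer_id ]` is simplified to a single boolean; the key names themselves follow Table 1.

```python
def cu_flags_signaled(cu):
    """Return the mode flags that the Table 1 conditions would signal for `cu`.

    `cu` is a hypothetical dict keyed by the syntax element names of Table 1.
    """
    flags = []
    # dbbp_flag: DBBP enabled for the layer and the transmitted PartMode is 2NxN.
    if cu["depth_based_blk_part_flag"] and cu["PartMode"] == "PART_2NxN":
        flags.append("dbbp_flag")
    # sdc_flag: present whenever the SDC enable flag is asserted.
    if cu["sdcEnableFlag"]:
        flags.append("sdc_flag")
    # pcm_flag: Intra CU, 2Nx2N partition, PCM enabled, block size in range.
    if (cu["CuPredMode"] == "MODE_INTRA"
            and cu["PartMode"] == "PART_2Nx2N"
            and cu["pcm_enabled_flag"]
            and cu["Log2MinIpcmCbSizeY"] <= cu["log2CbSize"]
            <= cu["Log2MaxIpcmCbSizeY"]):
        flags.append("pcm_flag")
    return flags


# Illustrative CU descriptions (values chosen only for the example).
intra_cu = {
    "depth_based_blk_part_flag": True, "PartMode": "PART_2Nx2N",
    "sdcEnableFlag": True, "CuPredMode": "MODE_INTRA",
    "pcm_enabled_flag": True, "log2CbSize": 3,
    "Log2MinIpcmCbSizeY": 3, "Log2MaxIpcmCbSizeY": 5,
}
inter_cu = dict(intra_cu, PartMode="PART_2NxN",
                CuPredMode="MODE_INTER", sdcEnableFlag=False)
```

For `intra_cu` the sketch reports the SDC and PCM flags, while for `inter_cu` only the DBBP flag is reported, since the enable flag or partition mode rules out the others.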
The coding unit extension syntax provides a place for features newly added to the coding standard. The coding unit extension syntax design according to the conventional approach is shown in Table 2.
TABLE 2
cu_extension( x0, y0, log2CbSize ) {                       Descriptor
  if( icEnableFlag )
    ic_flag                                                 ae(v)
  if( rpEnableFlag )
    iv_res_pred_weight_idx                                  ae(v)
  ....
}
As shown in Table 2, the existing coding unit extension syntax incorporates the illumination compensation flag (i.e., ic_flag) when the illumination compensation enable flag (i.e., icEnableFlag) is asserted. In other words, icEnableFlag equal to 1 specifies that ic_flag is present in the coding unit, and icEnableFlag equal to 0 specifies that ic_flag is not present in the coding unit. The weight index for inter-view residual prediction (i.e., iv_res_pred_weight_idx) is incorporated when the residual prediction enable flag (i.e., rpEnableFlag) is asserted. None of the SDC flag, the PCM flag, and the DBBP flag is incorporated in the coding unit extension syntax.
3D video is typically created by capturing a scene using a video camera with an associated device to capture depth information, or by using multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. The texture data and the depth data corresponding to a scene usually exhibit substantial correlation. Therefore, the depth information can be used to improve coding efficiency or reduce processing complexity for texture data, and vice versa. For example, the depth block corresponding to a texture block reveals information similar to a pixel-level object segmentation. Therefore, the depth information can help to realize pixel-level segment-based motion compensation. Accordingly, depth-based block partitioning (DBBP) has been adopted for texture video coding in the current 3D-HEVC.
In the depth-based block partitioning (DBBP) mode, arbitrarily shaped block partitioning for the collocated texture block is derived based on a binary segmentation mask computed from the corresponding depth map. Each of the two partitions (resembling foreground and background) is motion compensated and merged afterwards based on the depth-based segmentation mask.
A single flag is added to the coding syntax to signal to the decoder that the underlying block uses DBBP for prediction. When the current coding unit is coded with the DBBP mode, the corresponding partition size is set to SIZE_2N×2N and bi-prediction is inherited.
A disparity vector derived from the DoNBDV (Depth-oriented Neighboring Block Disparity Vector) process is applied to identify a corresponding depth block in a reference view, as shown in FIG. 2. In FIG. 2, corresponding depth block 220 in a reference view for current texture block 210 in a dependent view is located based on the location of the current texture block and derived DV 212, which is derived using DoNBDV according to the 3D-HEVC standard. The corresponding depth block has the same size as the current texture block. When the depth block is found, a threshold is calculated based on the average of all depth pixels within the corresponding depth block. Afterwards, a binary segmentation mask m_D (x,y) is generated based on the depth values and the threshold. When the depth value located at the relative coordinate (x, y) is larger than the threshold, the binary mask m_D (x,y) is set to 1. Otherwise, m_D (x,y) is set to 0. An example is shown in FIG. 3. The mean value of the virtual block (310) is determined in step 320. The values of the virtual depth samples are compared to the mean depth value in step 330 to generate segmentation mask 340. The segmentation mask is represented in binary data to indicate whether an underlying pixel belongs to segment 1 or segment 2, as indicated by two different line patterns in FIG. 3.
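The mask derivation described above can be sketched in Python as follows. The function name `dbbp_segmentation_mask` is hypothetical, the depth block is assumed to be a list of rows of integer samples, and a plain arithmetic mean is assumed for the threshold, since the rounding applied to the average is not described here.

```python
def dbbp_segmentation_mask(depth_block):
    """Return (mask, threshold) for a depth block given as a list of rows.

    mask[y][x] is 1 where the depth sample is larger than the block mean
    and 0 otherwise, mirroring the comparison of steps 320 and 330.
    """
    samples = [d for row in depth_block for d in row]
    threshold = sum(samples) / len(samples)  # assumed: unrounded mean
    mask = [[1 if d > threshold else 0 for d in row] for row in depth_block]
    return mask, threshold


# Illustrative 4x4 depth block with a near object on the right-hand side.
example_depth = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 200, 200, 200],
    [10, 10, 10, 200],
]
example_mask, example_threshold = dbbp_segmentation_mask(example_depth)
```

In this example the eight samples equal to 200 lie above the mean and form one segment, while the eight samples equal to 10 form the other.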
The DoNBDV process enhances the NBDV process by extracting a more accurate disparity vector from the depth map. The NBDV is derived based on disparity vectors from neighboring blocks. The disparity vector derived from the NBDV process is used to access depth data in a reference view. A final disparity vector is then derived from the depth data.
The DBBP process partitions the 2N×2N block into two partitioned blocks. A motion vector is determined for each partitioned block. In the decoding process, each of the two decoded motion parameters is used for motion compensation performed on a whole 2N×2N block. The resulting prediction signals, i.e., p_T0 (x,y) and p_T1 (x,y), are combined using the DBBP mask m_D (x,y), as depicted in FIG. 4. The combination process is defined as follows:
$$
p_T(x,y) =
\begin{cases}
p_{T0}(x,y), & \text{if } m_D(x,y) = 1 \\
p_{T1}(x,y), & \text{otherwise}
\end{cases}
\tag{1}
$$
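The per-pixel combination of the two prediction signals can be sketched in Python as follows. The function name `dbbp_merge` is hypothetical, and the two prediction blocks and the mask are assumed to be equally sized lists of rows.

```python
def dbbp_merge(pred_t0, pred_t1, mask):
    """Pick p_T0 where the mask is 1 and p_T1 elsewhere, pixel by pixel."""
    return [
        [p0 if m == 1 else p1 for p0, p1, m in zip(row0, row1, row_m)]
        for row0, row1, row_m in zip(pred_t0, pred_t1, mask)
    ]


# Illustrative 2x2 inputs: two whole-block predictions and a binary mask.
pred_t0 = [[1, 1], [1, 1]]
pred_t1 = [[9, 9], [9, 9]]
mask = [[1, 0], [0, 1]]
merged = dbbp_merge(pred_t0, pred_t1, mask)
```

Each motion-compensated block covers the whole 2N×2N area, so the mask alone decides which of the two predictions contributes each pixel.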
Whether the DBBP mode is used is signaled at the coding unit level, as shown in Table 1. In the current design, the DBBP flag (i.e., dbbp_flag[x0][y0]) is conditionally signaled depending on the transmitted partition mode (i.e., PartMode). The DBBP flag is signaled only when the transmitted PartMode is equal to the 2N×N partition (i.e., PartMode==PART_2N×N).
The SDC approach provides an alternative residual coding method. With SDC, the residual data (one or two constant residual values within one PU) is coded without transform and quantization processes. Whether SDC is used is signaled in the coding unit parameters structure at the PU level. The partition size of a CU containing an SDC-coded PU is always 2N×2N. SDC can be applied to depth data coded using all Intra prediction modes, including HEVC Intra prediction modes and depth modelling modes (DMMs). For HEVC Intra prediction modes, the entire PU is considered as one segment. For DMM modes, the PU is divided into two segments.
In FIG. 4, the two prediction blocks are merged into one on a pixel-by-pixel basis according to the segmentation mask, and this process is referred to as bi-segment compensation. In this example, the N×2N block partition type is selected and two corresponding motion vectors (MV1 and MV2) are derived for the two partitioned blocks, respectively. Each of the motion vectors is used to compensate a whole texture block (410). Accordingly, motion vector MV1 is applied to texture block 420 to generate prediction block 430, and motion vector MV2 is also applied to texture block 420 to generate prediction block 432. The two prediction blocks are merged by applying respective segmentation masks (440 and 442) to generate the final prediction block (450).
Before coding, the residual values are mapped to values that are present in the original, uncompressed depth map by using a Depth Lookup Table (DLT). Consequently, residual values can be coded by signaling only the index into this lookup table, which reduces the bit depth of residual magnitudes. This mapping table is transmitted to the decoder for the inverse lookup from an index to a valid depth value. The advantage of using this lookup table is the reduced bit depth of the residual index, which results from the sparse occurrence of depth values in typical depth maps.
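The effect of the lookup table can be sketched in Python as follows. All function names are hypothetical. Two assumptions go beyond the description above: the residual is formed as a difference of table indices, and a predicted value that is not in the table is mapped to its nearest entry.

```python
from bisect import bisect_left


def build_dlt(depth_samples):
    """Sorted list of the distinct depth values that occur in the depth map."""
    return sorted(set(depth_samples))


def value_to_index(dlt, value):
    """Index of the table entry nearest to `value` (ties go to the lower entry)."""
    pos = bisect_left(dlt, value)
    if pos == 0:
        return 0
    if pos == len(dlt):
        return len(dlt) - 1
    before, after = dlt[pos - 1], dlt[pos]
    return pos if (after - value) < (value - before) else pos - 1


def encode_residual_index(dlt, d_orig, d_pred):
    """Assumed encoder step: residual expressed as a difference of indices."""
    return value_to_index(dlt, d_orig) - value_to_index(dlt, d_pred)


def decode_depth(dlt, d_pred, residual_index):
    """Inverse lookup: recover a valid depth value from the residual index."""
    return dlt[value_to_index(dlt, d_pred) + residual_index]


# Illustrative depth map in which only four distinct values occur.
dlt = build_dlt([50, 50, 52, 120, 120, 200])
residual_index = encode_residual_index(dlt, d_orig=200, d_pred=52)
reconstructed = decode_depth(dlt, d_pred=52, residual_index=residual_index)
```

In this example the residual in the sample domain would be 148, whereas the index residual is 2, which illustrates the reduced magnitude when few distinct depth values occur.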
At the encoder side, the SDC process utilizes the mean of the original depth values (dorig) and the predicted depth value (dpred). As illustrated in the example of FIG. 5A, for SDC (HEVC Intra prediction modes), dpred is calculated as the average of the left-top, right-top, left-bottom, and right-bottom samples (indicated by circles) in a predicted block. FIG. 5B illustrates an example of DMM Mode 1, where the upper-left portion belongs to one segment and the lower-right portion (as indicated by slant lines) belongs to another segment. For SDC (DMM Mode 1), dpred of each segment is derived from those of the left-top, right-top, left-bottom, and right-bottom samples (indicated by circles) that belong to the same segment in a predicted block. For SDC (DMM Mode 4), dpred of each segment is set equal to any sample that belongs to the same segment in a predicted block, because all samples within one segment share the same prediction value.
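The corner-based derivation of dpred can be sketched in Python as follows. The function names are hypothetical and blocks are assumed to be lists of rows. A plain unrounded average is assumed, and the residual function assumes that the constant residual of a segment is the mean original depth of that segment minus dpred, which is not spelled out above.

```python
def sdc_dpred(pred_block, segment_mask=None, segment=0):
    """Average the corner samples of pred_block that lie in `segment`.

    With segment_mask=None the whole block is one segment, as for the
    HEVC Intra prediction modes; otherwise only corners whose mask value
    equals `segment` contribute, as for DMM Mode 1.
    """
    h, w = len(pred_block), len(pred_block[0])
    corners = [(0, 0), (0, w - 1), (h - 1, 0), (h - 1, w - 1)]
    values = [pred_block[y][x] for y, x in corners
              if segment_mask is None or segment_mask[y][x] == segment]
    if not values:
        # This case is not covered by the description, so the sketch stops.
        raise ValueError("no corner sample lies in the requested segment")
    return sum(values) / len(values)


def sdc_residual(orig_block, pred_block, segment_mask=None, segment=0):
    """Assumed residual: mean original depth of the segment minus dpred."""
    d_pred = sdc_dpred(pred_block, segment_mask, segment)
    samples = [d for y, row in enumerate(orig_block) for x, d in enumerate(row)
               if segment_mask is None or segment_mask[y][x] == segment]
    return sum(samples) / len(samples) - d_pred


# Illustrative 4x4 predicted block and a two-segment mask split diagonally.
pred = [[10, 12, 14, 16], [11, 13, 15, 17], [12, 14, 16, 18], [13, 15, 17, 19]]
seg_mask = [[0, 0, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1]]
orig = [[20] * 4 for _ in range(4)]
```

With this mask, only the left-top corner falls in segment 0, while the other three corners fall in segment 1, so the two segments obtain different dpred values.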