Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers with a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism.
The multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the multiple cameras are properly located so that each camera captures the scene from one viewpoint. Accordingly, the multiple cameras will capture multiple video sequences corresponding to multiple views. In order to provide more views, more cameras have been used to generate multi-view video with a large number of video sequences associated with the views. Accordingly, the multi-view video will require a large storage space to store and/or a high bandwidth to transmit. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space or the transmission bandwidth.
In the HEVC (High Efficiency Video Coding) based three-dimensional coding standard (3D-HEVC) or multi-view coding, the independent base-view texture is coded using the base coder, which corresponds to a regular video coder such as the standard HEVC coder for a video sequence. On the other hand, the depth map and the dependent-view texture are coded using the 3D enhancement coder, such as the 3D-HEVC coder, which utilizes the coded independent base-view texture.
In 3D video coding, Inter prediction (i.e., temporal prediction) and inter-view prediction may require signaling of related motion information, such as the motion vector (MV), reference picture index and reference picture list for Inter coding, and the disparity vector (DV) for inter-view prediction. In order to signal the motion information efficiently, a coding mode named Merge mode has been used. In the Merge mode, a Merge candidate list is generated. If a Merge candidate is selected from the list, the current block is encoded or decoded with the same motion information as the selected Merge candidate. The Merge index for the selected Merge candidate is signaled at the encoder side and parsed at the decoder side.
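As a rough illustration of the Merge mode described above, the following Python sketch shows a block inheriting the motion information of the candidate addressed by the Merge index. The MotionInfo type, its field names and the apply_merge function are hypothetical and chosen only for this example; they are not 3D-HEVC syntax elements.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MotionInfo:
    """Hypothetical container for the motion information of one candidate."""
    mv: Tuple[int, int]  # motion vector (or disparity vector) as (x, y)
    ref_idx: int         # reference picture index
    ref_list: int        # reference picture list (0 or 1)


def apply_merge(merge_list: List[MotionInfo], merge_idx: int) -> MotionInfo:
    """Return the motion information the current block inherits in Merge mode."""
    if not 0 <= merge_idx < len(merge_list):
        raise ValueError("Merge index is outside the candidate list")
    # The block reuses the selected candidate's motion information unchanged.
    return merge_list[merge_idx]


# Only the index is signaled; the decoder rebuilds the same list and looks it up.
candidates = [MotionInfo((4, -2), 0, 0), MotionInfo((0, 0), 1, 0)]
inherited = apply_merge(candidates, 1)
```

Because only the index is transmitted, the encoder and the decoder must construct identical candidate lists for this lookup to give the same result on both sides.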
In 3D-HEVC as described in JCT3V-I1003 (Chen, et al., “Test Model 9 of 3D-HEVC and MV-HEVC”, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11 9th Meeting: Sapporo, JP, 3-9 Jul. 2014, Document: JCT3V-I1003), the Merge candidates for texture and depth in the dependent view are shown in Table 1. The Merge candidates indicated in italic font are the extra Merge candidates (also named extra 3D candidates) in the 3D-HEVC candidate list, in addition to the Merge candidates used for base-view texture data.
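TABLE 1
| Texture | Depth |
| --- | --- |
| Inter-view motion prediction (IVMP) candidate | Texture Merge candidate (MPI) |
| A1 | DDD |
| B1 | IVMP candidate |
| B0 | A1 |
| DV | B1 |
| VSP | B0 |
| A0 | A0 |
| B2 | B2 |
| Shift IVMP | Temporal |
| Shift DV | |
| Temporal | |
Extra 3D candidates (italic in the original table): IVMP, DV, VSP, Shift IVMP and Shift DV for texture; MPI, DDD and IVMP for depth.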
In Table 1, the Merge candidates for base-view texture video data include five spatial Merge candidates, A0, A1, B0, B1 and B2 and one temporal Merge candidate. The five spatial Merge candidates are derived from spatial neighboring blocks, where the block locations are shown in FIG. 1A. The temporal Merge candidate T0 is shown in FIG. 1B. The spatial Merge candidates and the temporal Merge candidate are used in the conventional HEVC (also named as non-3D-HEVC) as well as in the 3D-HEVC.
In Table 1, MPI represents the Merge candidate for motion parameter inheritance (MPI). In this case, the depth block inherits the motion characteristics of its corresponding texture block. DDD represents the Merge candidate derived according to disparity derived depth (DDD) coding, which is applied to Inter-coded PUs. Thus, a depth value can be derived from its corresponding disparity vector. On the other hand, DV corresponds to the disparity vector (DV) based Merge candidate and IVMP represents the Merge candidate derived from inter-view motion prediction. Furthermore, a shifted IVMP candidate and a shifted DV candidate may also be included in the Merge list. The candidates are inserted into the Merge list according to a priority order. When the size of the Merge list reaches the maximum number, the Merge list is full and no more candidates are inserted. The candidates in Table 1 are listed in priority order from top to bottom. For example, the IVMP candidate will be added to the texture Merge list first if the IVMP candidate exists. Candidate A1 will be inserted following the IVMP candidate if candidate A1 exists. Similarly, candidate B1 will be inserted following candidate A1 if candidate B1 exists, and so on.
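The priority-ordered insertion described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: candidates are represented only by their names, the texture priority order is the one read from Table 1, and the function name build_merge_list is hypothetical. Any redundancy checks or other steps not described in the text above are omitted.

```python
from typing import Iterable, List, Set

# Texture priority order for the dependent view, read from Table 1 (top to bottom).
TEXTURE_PRIORITY = ["IVMP", "A1", "B1", "B0", "DV", "VSP", "A0", "B2",
                    "Shift IVMP", "Shift DV", "Temporal"]


def build_merge_list(priority: Iterable[str], available: Set[str],
                     max_cands: int) -> List[str]:
    """Insert existing candidates in priority order until the list is full."""
    merge_list: List[str] = []
    for name in priority:
        if len(merge_list) >= max_cands:
            break  # the Merge list is full; no more candidates are inserted
        if name in available:  # a candidate is inserted only if it exists
            merge_list.append(name)
    return merge_list


# Example: B1, B2 and the shifted candidates do not exist for this block.
available = {"IVMP", "A1", "B0", "DV", "VSP", "A0", "Temporal"}
merge_list = build_merge_list(TEXTURE_PRIORITY, available, max_cands=6)
print(merge_list)  # ['IVMP', 'A1', 'B0', 'DV', 'VSP', 'A0']
```

In this example the Temporal candidate exists but is not inserted, because the list already holds six candidates when its turn comes.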
In order to reduce the complexity, the 3D-HEVC Merge candidates (i.e., the extra candidates in Table 1), except for VSP (view synthesis prediction) inheritance, are removed for PUs (prediction units) with block sizes of 8×4 and 4×8. The Merge candidate sets for HEVC, for 3D-HEVC texture/depth with PU size larger than or equal to 8×8, and for 3D-HEVC texture with 8×4/4×8 PU in the draft 3D-HEVC standard version 11.0 are illustrated in Table 2. As shown in Table 2, the bi-predictive (combined-Bi) candidate and the zero-valued vector (Zero) candidate are used as well. The VSP inheritance candidate corresponds to a merged candidate that is VSP coded.
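TABLE 2
| | 3D-HEVC Texture (>=8×8 PU) | 3D-HEVC Depth (>=8×8 PU) | HEVC/3D-HEVC Texture (8×4/4×8 PU) | 3D-HEVC Depth (8×4/4×8 PU) |
| --- | --- | --- | --- | --- |
| Merge cand. | | MPI | | |
| | | DDD | | |
| | IVMP | IVMP | | |
| | A1 | A1 | A1 | A1 |
| | B1 | B1 | B1 | B1 |
| | B0 | B0 | B0 | B0 |
| | DV | | | |
| | VSP | | (VSP inheritance) | |
| | A0 | A0 | A0 | A0 |
| | B2 | B2 | B2 | B2 |
| | Shift IVMP | | | |
| | Shift DV | | | |
| | Temporal | Temporal | Temporal | Temporal |
| | combined-Bi | combined-Bi | combined-Bi | combined-Bi |
| | Zero | Zero | Zero | Zero |
| #merge cand. | 13 | 11 | 8 | 8 |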
Therefore, in current 3D-HEVC, there are four different kinds of Merge candidate sets:
        1. Independent-view texture and depth map with PU size equal to 8×4 or 4×8: HEVC candidate sets for both texture and depth data.
        2. Independent-view texture and depth map with PU size equal to or larger than 8×8: HEVC candidate sets for texture data and 3D-HEVC candidate sets for depth data.
        3. Dependent-view texture and depth map with PU size equal to or larger than 8×8: 3D-HEVC candidate sets for both texture and depth data.
        4. Dependent-view texture and depth map with PU size equal to 8×4 or 4×8: HEVC candidate sets plus VSP inheritance for texture data and HEVC candidate sets for depth data.
The classification of Merge candidate sets in the draft 3D-HEVC standard version 11.0 is illustrated in Table 3.
TABLE 3
| | Texture | Depth |
| --- | --- | --- |
| Base view, PU >= 8×8 | HEVC Merge candidates | 3D-HEVC Merge candidates |
| Base view, PU = 4×8 or 8×4 | HEVC Merge candidates | HEVC Merge candidates |
| Dependent view, PU >= 8×8 | 3D-HEVC Merge candidates | 3D-HEVC Merge candidates |
| Dependent view, PU = 4×8 or 8×4 | HEVC candidates plus VSP inheritance candidate | HEVC Merge candidates |
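The classification in Table 3 amounts to a small lookup on the view type and the PU size, which the following Python sketch illustrates. The function name merge_candidate_sets and the string labels are hypothetical. The sketch also assumes that every PU size other than 8×4 and 4×8 falls into the "8×8 or larger" rows.

```python
from typing import Tuple


def merge_candidate_sets(dependent_view: bool, pu_width: int,
                         pu_height: int) -> Tuple[str, str]:
    """Return (texture set, depth set) following the rows of Table 3."""
    small_pu = (pu_width, pu_height) in {(8, 4), (4, 8)}
    if small_pu:
        # Extra 3D candidates are removed for 8x4/4x8 PUs, except that
        # dependent-view texture keeps the VSP inheritance candidate.
        texture = "HEVC + VSP inheritance" if dependent_view else "HEVC"
        return texture, "HEVC"
    if dependent_view:
        return "3D-HEVC", "3D-HEVC"
    # Base view: texture uses the HEVC set, depth uses the 3D-HEVC set.
    return "HEVC", "3D-HEVC"


print(merge_candidate_sets(dependent_view=True, pu_width=8, pu_height=4))
# ('HEVC + VSP inheritance', 'HEVC')
```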
In the current 3D-HEVC specification, there are three control flags in Video Parameter Set extension 2 (VPS_extension2) to control the on/off of those extra candidates, summarized as follows:
        iv_mv_pred_flag: controls the on/off of the IVMP, DV, shift IVMP, and shift DV candidates.
        mpi_flag: controls the on/off of the MPI candidate and DDD candidate.
        view_synthesis_pred_flag: controls the on/off of the VSP candidate.
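The mapping from the three control flags to the extra candidates they switch on can be sketched in Python as follows. The function name enabled_extra_candidates and the use of plain strings for candidates are assumptions made for illustration; the order of the returned list has no meaning beyond grouping by flag.

```python
from typing import List


def enabled_extra_candidates(iv_mv_pred_flag: int, mpi_flag: int,
                             view_synthesis_pred_flag: int) -> List[str]:
    """List the extra 3D Merge candidates switched on by the three flags."""
    enabled: List[str] = []
    if iv_mv_pred_flag:
        enabled += ["IVMP", "DV", "Shift IVMP", "Shift DV"]
    if mpi_flag:
        enabled += ["MPI", "DDD"]
    if view_synthesis_pred_flag:
        enabled.append("VSP")
    return enabled


print(enabled_extra_candidates(1, 0, 1))
# ['IVMP', 'DV', 'Shift IVMP', 'Shift DV', 'VSP']
```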
In existing 3D-HEVC practice, the 3D Merge candidate enabling flags are signaled in the VPS (video parameter set). The syntax in VPS_extension2 according to the existing 3D-HEVC is shown in Table 4. The respective 3D Merge candidates are enabled for the texture and depth data associated with the VPS according to the 3D Merge candidate enabling flags. The enabled 3D Merge candidates can be inserted into the Merge list according to a priority order.
TABLE 4 (notes are shown at the end of the annotated lines)
vps_extension2( ) {
  while( !byte_aligned( ) )
    vps_extension_byte_alignment_reserved_one_bit
  for( i = 1; i <= vps_max_layers_minus1; i++ ) {
    layerId = layer_id_in_nuh[ i ]
    iv_mv_pred_flag[ layerId ]    (4-1)
    iv_mv_scaling_flag[ layerId ]
    if ( !VpsDepthFlag[ layerId ] ) {
      log2_sub_pb_size_minus3[ layerId ]
      iv_res_pred_flag[ layerId ]
      depth_refinement_flag[ layerId ]
      view_synthesis_pred_flag[ layerId ]    (4-2)
      depth_based_blk_part_flag[ layerId ]
    } else {
      mpi_flag[ layerId ]    (4-3)
      log2_mpi_sub_pb_size_minus3[ layerId ]
      dmm_cpredtex_flag[ layerId ]
      intra_sdc_dmm_wfull_flag[ layerId ]
      lim_qt_pred_flag[ layerId ]
      inter_sdc_flag[ layerId ]
    }
  }
  cp_precision
  ....
}
In Table 4, iv_mv_pred_flag[layerId] (as indicated by note (4-1)) indicates whether inter-view motion parameter prediction is used in the decoding process of the layer with nuh_layer_id equal to layerId. iv_mv_pred_flag[layerId] equal to 0 specifies that inter-view motion parameter prediction is not used for the layer with nuh_layer_id equal to layerId. iv_mv_pred_flag[layerId] equal to 1 specifies that inter-view motion parameter prediction may be used for the layer with nuh_layer_id equal to layerId. When not present, the value of iv_mv_pred_flag[layerId] is inferred to be equal to 0. When NumDirectRefLayers[layerId] is equal to 0, the value of iv_mv_pred_flag[layerId] shall be equal to 0.
In Table 4, view_synthesis_pred_flag[layerId] (as indicated by note (4-2)) equal to 0 specifies that view synthesis prediction Merge candidates are not used for the layer with nuh_layer_id equal to layerId. view_synthesis_pred_flag[layerId] equal to 1 specifies that view synthesis prediction Merge candidates might be used for the layer with nuh_layer_id equal to layerId. When not present, the value of view_synthesis_pred_flag[layerId] is inferred to be equal to 0. When NumDirectRefLayers[layerId] is equal to 0, the value of view_synthesis_pred_flag[layerId] shall be equal to 0.
In Table 4, mpi_flag[layerId] (as indicated by note (4-3)) equal to 0 specifies that motion parameter inheritance is not used for the layer with nuh_layer_id equal to layerId. mpi_flag[layerId] equal to 1 specifies that motion parameter inheritance may be used for the layer with nuh_layer_id equal to layerId. When not present, the value of mpi_flag[layerId] is inferred to be equal to 0.
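The inference and constraint rules stated for these flags can be sketched in Python as follows. This is only an illustrative sketch: the function names and the dictionary holding the signaled values are hypothetical, and no bitstream parsing is attempted.

```python
from typing import Dict


def resolve_flag(signaled: Dict[int, int], layer_id: int) -> int:
    """Return the flag value for a layer; 0 is inferred when it is not present."""
    return signaled.get(layer_id, 0)


def satisfies_ref_layer_constraint(flag_value: int,
                                   num_direct_ref_layers: int) -> bool:
    """Check the rule stated for iv_mv_pred_flag and view_synthesis_pred_flag:
    the flag shall be 0 when the layer has no direct reference layers."""
    return not (num_direct_ref_layers == 0 and flag_value != 0)


# Hypothetical signaled values of iv_mv_pred_flag, keyed by layerId.
iv_mv_pred_flag = {1: 1, 2: 0}
print(resolve_flag(iv_mv_pred_flag, 1))  # 1
print(resolve_flag(iv_mv_pred_flag, 3))  # 0 (not present, so inferred)
```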
The size of the Merge candidate list is signaled in the bitstream using the syntax element five_minus_max_num_merge_cand, which specifies the maximum number of merging MVP candidates supported in the slice subtracted from 5. The variable NumExtraMergeCand, representing the number of extra Merge candidates, is derived as follows:
NumExtraMergeCand = iv_mv_pred_flag[nuh_layer_id] || mpi_flag[nuh_layer_id].
As is well known in the field, the “||” symbol represents the logical “OR” operation. In other words, if either iv_mv_pred_flag[nuh_layer_id] or mpi_flag[nuh_layer_id] has a value of 1, NumExtraMergeCand is equal to 1. The maximum number of merging MVP candidates, MaxNumMergeCand, is derived as follows:
MaxNumMergeCand = 5 − five_minus_max_num_merge_cand + NumExtraMergeCand,
where the value of MaxNumMergeCand is in the range of 1 to (5 + NumExtraMergeCand), inclusive.
According to the existing 3D-HEVC specification, the size of the candidate list is increased by 1 in order to allow more Merge candidates to be included in the Merge list. For example, while the size of the candidate list for the base view is 5, the size of the candidate list in dependent texture views and depth maps is 6. Whether to increase the candidate list size by 1 depends only on the 3D Merge candidate enabling flags, iv_mv_pred_flag and mpi_flag.
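The derivation of the list size above can be worked through with a short Python sketch. The function name max_num_merge_cand is hypothetical, and the flag arguments stand for the values of iv_mv_pred_flag and mpi_flag for the current nuh_layer_id.

```python
def max_num_merge_cand(five_minus_max_num_merge_cand: int,
                       iv_mv_pred_flag: int, mpi_flag: int) -> int:
    """Derive MaxNumMergeCand from the slice syntax element and the two flags."""
    # Logical OR of the two flags: 1 if either flag is 1, otherwise 0.
    num_extra_merge_cand = 1 if (iv_mv_pred_flag or mpi_flag) else 0
    max_cand = 5 - five_minus_max_num_merge_cand + num_extra_merge_cand
    # The derived value must lie in the range 1 to 5 + NumExtraMergeCand, inclusive.
    if not 1 <= max_cand <= 5 + num_extra_merge_cand:
        raise ValueError("MaxNumMergeCand is out of range")
    return max_cand


print(max_num_merge_cand(0, 0, 0))  # 5: neither flag is set
print(max_num_merge_cand(0, 1, 0))  # 6: iv_mv_pred_flag adds one entry
print(max_num_merge_cand(0, 1, 1))  # 6: the increase is 1 even if both flags are set
```

Note that view_synthesis_pred_flag does not appear in the derivation, which matches the statement that the increase depends only on iv_mv_pred_flag and mpi_flag.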