Three-dimensional (3D) television has been a technology trend in recent years that intends to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers the sensation of realism. 3D video formats may also include depth maps associated with corresponding texture pictures. The depth maps also have to be coded in order to render a three-dimensional view or multiple views.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, a working group, ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization), is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). In 3D-HEVC, a technique named motion parameter inheritance (MPI) has been developed to allow depth maps to inherit motion information from texture pictures. The basic idea behind the MPI mode is that the motion characteristics of the video signal and its associated depth map should be similar, since both correspond to projections of the same scenery from the same viewpoint at the same time instant. In order to enable efficient encoding of the depth map data, the MPI mode is used to allow the depth map data to inherit the coding unit (CU) and prediction unit (PU) partitions and corresponding motion parameters from the corresponding video signal. The motion vectors of the video signal according to HEVC use quarter-sample accuracy. On the other hand, the motion vectors of the depth maps use full-sample accuracy. Therefore, in the inheritance process, the motion vectors of the video signal are quantized to the nearest full-sample positions. The decision regarding whether to inherit motion information from the video signal or to use its own motion information can be made adaptively for each block of the depth map.
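The quantization step in the inheritance process can be illustrated with a minimal sketch. It assumes that each motion-vector component is stored as an integer in quarter-sample units and that ties (exactly half a sample) round away from zero; the tie rule and the function names are assumptions made for illustration and are not taken from 3D-HEVC.

```python
def quantize_to_full_sample(mv_quarter):
    """Round one motion-vector component from quarter-sample units
    to the nearest full-sample position (returned in full-sample units).

    Ties round away from zero; this tie rule is an assumption.
    """
    sign = -1 if mv_quarter < 0 else 1
    return sign * ((abs(mv_quarter) + 2) // 4)


def inherit_motion_vector(texture_mv):
    """Quantize an (x, y) texture motion vector for use by a depth block."""
    return tuple(quantize_to_full_sample(c) for c in texture_mv)


# A texture vector of (5, -7) quarter samples, i.e. (1.25, -1.75) samples,
# becomes a depth vector of (1, -2) full samples.
depth_mv = inherit_motion_vector((5, -7))
```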
FIG. 1 illustrates an example of MPI for depth coding in 3D-HEVC. Texture picture 112 corresponds to a current picture and texture picture 110 corresponds to a picture at a reference time instance. Both texture pictures 110 and 112 are in the same view. Block 150 (e.g., a CU) in current picture 112 is partitioned into four sub-blocks. Motion vectors 132 and 134 are associated with sub-blocks 152 and 154. Depth block 160 is co-located with texture block 150 and may inherit motion information from texture block 150. Accordingly, sub-blocks 162 and 164 may inherit motion information (e.g., motion vectors 132′ and 134′) from respective sub-blocks 152 and 154. Block 170 in current picture 112 is partitioned into four sub-blocks. Motion vector 136 is associated with sub-block 172. Depth block 180 is co-located with texture block 170. Depth sub-block 182 does not inherit motion information from the co-located texture sub-block. In this case, its own motion vector 146 is transmitted for the corresponding depth sub-block 182. For signaling the MPI coding mode, the Merge/Skip mode syntax is used. The list of possible Merge candidates has been extended for depth map coding so that the first Merge candidate refers to the MPI coding mode, i.e., inheriting the motion information and CU/PU structure of the corresponding block of the associated video signal in this case.
The MPI mode can be used at any level of the hierarchical coding-tree block of the depth map. If the MPI mode is indicated at a higher level of the depth map coding tree, the depth map data in this higher level unit can inherit the CU/PU subdivision as well as the corresponding motion data from the video signal. This higher level unit may be larger than the CU size for the video signal. Accordingly, it is possible to specify the MPI mode for a whole tree-block, typically corresponding to 64×64 image samples, in which case the whole tree-block of the depth map is partitioned into CUs and PUs by inheriting the CU and PU partitioning of the corresponding region of the video signal. If the MPI mode is indicated at a level of the coding tree that is smaller than or the same size as the corresponding CU size of the video signal, only the motion data are inherited from the video signal. When the MPI mode is used, not only the partitioning and the motion vectors, but also the reference picture indices are inherited from the video signal. Therefore, it has to be ensured that the depth maps corresponding to the video reference pictures are also available in the reference picture buffer for the depth map signal. The MPI mode is only possible if the whole region of the video signal is coded using Inter prediction.
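The dependence of the inherited data on the level at which the MPI mode is indicated can be summarized in a short sketch. The function name, the string labels and the use of a single edge length to represent a unit size are assumptions made for illustration only.

```python
def mpi_inherited_items(depth_unit_size, texture_cu_size):
    """Return what a depth unit inherits from the video signal under MPI.

    Sizes are edge lengths in samples (e.g. 64 for a 64x64 tree-block).
    Motion vectors and reference picture indices are always inherited;
    the CU/PU partitioning is inherited only when the depth unit is
    larger than the corresponding texture CU.
    """
    inherited = {"motion_vectors", "reference_indices"}
    if depth_unit_size > texture_cu_size:
        inherited.add("cu_pu_partitioning")
    return inherited


# MPI indicated for a whole 64x64 tree-block whose texture region uses 16x16 CUs.
whole_tree_block = mpi_inherited_items(64, 16)
# MPI indicated at a level equal to the texture CU size.
same_level = mpi_inherited_items(16, 16)
```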
The syntax design for the existing 3D-HEVC still allows the encoder to signal the MPI mode as ON (i.e., to enable the MPI mode) even though part of the corresponding region of the video signal is coded using Intra prediction or the region has no valid reference data, i.e., no reference picture inherited from the corresponding region is available in the reference picture list of the current slice. In this case, inconsistent MPI behavior may arise from differences between encoder and decoder implementations. As a result, mismatch may occur in decoded pictures. There is also a risk that a decoder exhibits unexpected behavior by using undefined motion information.
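The two problem cases described above can be expressed as a simple consistency check. The sketch below is illustrative only and is not a mechanism defined by 3D-HEVC: the `TexturePU` record, the use of picture order counts (`ref_poc`) to identify reference pictures, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TexturePU:
    """Hypothetical summary of one prediction unit in the texture region."""
    is_inter: bool
    ref_poc: Optional[int] = None  # reference picture identifier; None if Intra


def mpi_is_well_defined(region: Sequence[TexturePU],
                        depth_ref_pocs: Sequence[int]) -> bool:
    """True only if every PU in the region is Inter coded and its
    reference picture is present in the depth slice's reference list."""
    available = set(depth_ref_pocs)
    return bool(region) and all(
        pu.is_inter and pu.ref_poc in available for pu in region
    )


all_inter = [TexturePU(True, 0), TexturePU(True, 4)]
partly_intra = [TexturePU(True, 0), TexturePU(False)]
```

With a depth reference list of `[0, 4]`, the first region passes the check and the second does not; a region referring to a picture absent from the list also fails.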
There is also a similar development effort for 3D video coding based on the Advanced Video Coding (AVC) standard, which is often referred to as H.264. The AVC-based 3D video coding is referred to as 3D-AVC. In 3D-AVC, the system also faces the redundancy issue between the texture view component and the depth view component. Since a texture picture and its associated depth map correspond to similar object silhouettes, both will experience similar object movement. Accordingly, there is significant redundancy in the motion fields between the texture view component and the depth view component. A new coding mode is used in the existing 3D-AVC to allow the associated depth view component to use the motion information from a texture view component. The new mode is called Inside View Motion Prediction (IVMP) mode, which is enabled only for Inter coded Macroblocks (MBs) of the depth view component. The size of an MB is 16×16 for the AVC standard. In the IVMP mode, the motion information, including mb_type, sub_mb_type, reference indices and motion vectors of the co-located MB in the texture view component, is reused by the depth view component of the same view. A flag is signaled in each MB to indicate whether the MB uses the IVMP mode.
FIG. 2 illustrates an example of the IVMP for texture pictures and depth maps in view i of 3D-AVC. If the IVMP flag associated with MB 212 of depth map 210 indicates that MB 212 uses the IVMP mode, MB 212 will reuse motion information of a co-located MB (222) of a corresponding texture picture (220). If texture MB 222 has a motion vector (224) pointing to texture MB 242 in texture picture 240, depth MB 212 may use motion vector 214 inherited from motion vector 224 to refer to a reference MB (232) in a reference depth map (230). In the existing implementation, the IVMP mode applies only to non-anchor pictures as specified in the H.264 standard.
3D video formats also support mixed resolution, i.e., the width and height of the depth map may be different from the width and height of the texture picture. Mixed resolution is also called asymmetric coding. In asymmetric coding, the existing IVMP is enabled when all of the corresponding four texture macroblocks are coded as Skip, Inter 16×16, Inter 16×8 or Inter 8×16. The four texture macroblocks (i.e., macroblocks A, B, C and D) correspond to one depth macroblock (310) as shown in FIG. 3. In IVMP, the motion information of each texture macroblock (i.e., 16×16) is mapped to an 8×8 sub-block of one depth macroblock. For example, texture macroblock A is mapped to depth sub-block A′ as shown in FIG. 3.
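The enabling condition and the macroblock-to-sub-block mapping for asymmetric coding can be sketched as follows. The mode labels are illustrative strings, and the assumption that macroblocks A, B, C and D are arranged in raster order (top-left, top-right, bottom-left, bottom-right) is made for this example only, since the arrangement is defined by FIG. 3.

```python
# Texture macroblock modes for which IVMP is enabled (illustrative labels).
ALLOWED_MODES = {"Skip", "Inter16x16", "Inter16x8", "Inter8x16"}


def ivmp_enabled(texture_mb_modes):
    """IVMP is enabled only if all four texture macroblocks use an allowed mode."""
    if len(texture_mb_modes) != 4:
        raise ValueError("expected the modes of four texture macroblocks")
    return all(mode in ALLOWED_MODES for mode in texture_mb_modes)


def depth_sub_block_origin(index):
    """Top-left (x, y) of the 8x8 depth sub-block, inside the 16x16 depth
    macroblock, for texture macroblock 0..3 (A..D, raster order assumed)."""
    if index not in range(4):
        raise ValueError("index must be 0, 1, 2 or 3")
    return ((index % 2) * 8, (index // 2) * 8)


enabled = ivmp_enabled(["Skip", "Inter16x16", "Inter16x8", "Inter8x16"])
origin_of_a = depth_sub_block_origin(0)  # sub-block A' under the raster assumption
```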
For depth maps, the motion prediction is performed in units of 8×8 blocks. In IVMP, the sub-block type and motion information associated with each 8×8 block unit depend on the corresponding texture macroblock. For example, the sub-block type of sub-block A′ will be 8×4 if the mb_type of macroblock A is P16×8 (i.e., predictive-coded 16×8 block). The H.264/AVC standard also provides syntax to signal the reference index for each 8×8 block and the motion vector for each 4×4 block. When IVMP is used, the representative motion information has to be determined when the corresponding texture macroblock has multiple reference indexes.
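The 16×8 to 8×4 example suggests that the texture partition is halved in each dimension when mapped into the 8×8 depth sub-block. The sketch below applies that halving to the 16×16 and 8×16 partitions as well, which is an extrapolation from the single example in the text; the function name and the rectangle representation are hypothetical.

```python
def depth_partitions(texture_partition):
    """List the (x, y, width, height) rectangles that tile an 8x8 depth
    sub-block, given the (width, height) of the texture partition.

    Each dimension is halved, e.g. a 16x8 texture partition gives
    8x4 depth partitions.
    """
    if texture_partition not in {(16, 16), (16, 8), (8, 16)}:
        raise ValueError("unsupported texture partition")
    width, height = texture_partition[0] // 2, texture_partition[1] // 2
    return [
        (x, y, width, height)
        for y in range(0, 8, height)
        for x in range(0, 8, width)
    ]


# A P16x8 texture macroblock yields two 8x4 partitions in the depth sub-block.
parts = depth_partitions((16, 8))
```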
In the conventional 3D-AVC, inside-view motion prediction (IVMP) is enabled for depth view coding. When IVMP is enabled, the corresponding motion information from the texture view may be inherited. For symmetric coding between texture and depth views, i.e., texture pictures and depth maps having the same resolution, the motion information of the co-located macroblock in the texture view is inherited. For mixed resolution cases, i.e., texture pictures and depth maps having different resolutions (typically the depth signal having a lower resolution), one macroblock in the depth view may correspond to multiple texture macroblocks. For example, in the case that both the width and height of depth views are reduced by half compared to those of texture views, one macroblock in the depth view corresponds to four macroblocks in the associated texture view. To signal the IVMP mode, one flag may be signaled at the macroblock level. However, if one of the following conditions is true, the IVMP flag is not transmitted and is inferred to be 0:
    For mixed resolution cases, the co-located macroblock in the texture view is Intra coded, or View Synthesis Prediction (VSP) is used for any partition in the current macroblock.
    For mixed resolution cases in which the width and height of depth maps are half of those of texture pictures, any of the four co-located macroblocks in the texture view is Intra coded, or has mb_type equal to P_8×8, P_8×8ref0 or B_8×8, or includes VSP.
View Synthesis Prediction (VSP) is a technique used in 3D video coding to provide prediction using reference pictures from previously coded views (e.g., a base view). In VSP, the reference texture picture and depth map in a reference view are used to generate the prediction for texture picture or depth map in a target view. When a texture macroblock is coded with macroblock type (i.e., mb_type) equal to P_8×8, P_8×8ref0 or B_8×8, the texture macroblock is divided into four 8×8 sub-macroblocks and each sub-macroblock is coded using P_8×8, P_8×8ref0 or B_8×8 mode respectively.
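The conditions under which the IVMP flag is inferred rather than transmitted can be sketched as a parsing routine. This is an illustrative reading, not the 3D-AVC syntax: the `TextureMB` record, the function names and the bit iterator are hypothetical, and the first condition is interpreted here as the case with a single co-located texture macroblock.

```python
from dataclasses import dataclass

EIGHT_BY_EIGHT_TYPES = {"P_8x8", "P_8x8ref0", "B_8x8"}


@dataclass(frozen=True)
class TextureMB:
    """Hypothetical summary of a co-located texture macroblock."""
    mb_type: str
    is_intra: bool = False
    uses_vsp: bool = False


def flag_is_inferred(colocated, current_uses_vsp=False):
    """True if the IVMP flag is not transmitted and is inferred to be 0."""
    if len(colocated) == 1:
        # Single co-located macroblock (assumed reading of the first condition).
        return colocated[0].is_intra or current_uses_vsp
    if len(colocated) == 4:
        # Depth width and height are half of those of the texture picture.
        return any(
            mb.is_intra or mb.mb_type in EIGHT_BY_EIGHT_TYPES or mb.uses_vsp
            for mb in colocated
        )
    raise ValueError("expected one or four co-located macroblocks")


def parse_ivmp_flag(bits, colocated, current_uses_vsp=False):
    """Read the flag from the bit iterator only when it is transmitted."""
    if flag_is_inferred(colocated, current_uses_vsp):
        return 0
    return next(bits)
```

Note that `parse_ivmp_flag` cannot decide whether to consume a bit until the coding modes of the co-located texture macroblocks are known, so parsing of the depth bitstream depends on decoded texture information.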
With the conditional signalling of the IVMP mode, coding bits related to unnecessary IVMP signalling can be saved. However, the conditional signalling increases complexity and may also introduce a parsing dependency issue, where the parsing process of the IVMP flag relies on the coding mode and reference picture type of the co-located macroblock(s). It is desirable to overcome the complexity and dependency issues associated with the IVMP mode of 3D-AVC.