Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a more immersive viewing experience. Various technologies have been developed to enable 3D viewing, and multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers with a single view of a scene from the perspective of the camera. Multi-view video, in contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism. 3D video formats may also include depth maps associated with corresponding texture pictures. The depth maps also have to be coded in order to render a three-dimensional view or multiple views.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, a working group, ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization), is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). To reduce the inter-view redundancy, a technique called disparity-compensated prediction (DCP) has been added as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to Inter picture prediction that uses previously coded pictures of the same view in a different access unit (AU), while DCP refers to Inter picture prediction that uses already coded pictures of other views in the same access unit, as shown in FIG. 1. The vector used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. The video pictures (110A) and depth maps (110B) corresponding to a particular camera position are indicated by a view identifier (viewID). For example, video pictures and depth maps associated with three views (i.e., V0, V1 and V2) are shown in FIG. 1. All video pictures and depth maps that belong to the same camera position are associated with the same viewID. The video pictures and, when present, the depth maps are coded access unit by access unit, as shown in FIG. 1. An AU (120) includes all video pictures and depth maps corresponding to the same time instant. The motion data compression is performed for each picture after all the pictures (both texture and depth) within the same AU are coded. In this case, for each AU, the reconstruction process for pictures within the AU can rely on full-resolution motion data associated with the current AU. The motion data compression will only affect the reconstruction process of other AUs that refer to the compressed motion data associated with the current AU.
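The distinction between MCP and DCP described above can be illustrated with a minimal sketch. The Picture type and the prediction_type function are hypothetical names introduced only for this illustration; they are not part of 3D-HEVC or of any reference software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    view_id: int  # camera position (viewID)
    au: int       # access unit index, i.e., time instant


def prediction_type(current: Picture, reference: Picture) -> str:
    """Classify Inter picture prediction by where the reference picture lies."""
    if reference.view_id == current.view_id and reference.au != current.au:
        return "MCP"  # same view, different access unit -> motion vector (MV)
    if reference.view_id != current.view_id and reference.au == current.au:
        return "DCP"  # other view, same access unit -> disparity vector (DV)
    return "other"    # not covered by the two cases described in the text
```

For example, a picture of view V1 that references view V0 within the same AU would be classified as DCP.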
3D-HEVC is an extension of HEVC that is being developed for encoding/decoding 3D video. One of the views, which is also referred to as the base view or the independent view, is coded independently of the other views and the depth data; the texture picture in the base view is coded using a conventional HEVC video coder.
In 3D-HEVC version 4.0, the Inter, Merge and Skip modes are used for depth coding. For depth coding in 3D-HEVC, a hybrid block-based motion-compensated DCT-like transform coding architecture similar to that for the texture component is utilized. The basic unit for compression, termed a coding unit (CU), is a 2N×2N square block. Each CU can be recursively split into four smaller CUs until a pre-defined minimum size is reached. Each CU contains one or multiple prediction units (PUs). The PU size can be 2N×2N, 2N×N, N×2N, or N×N. When asymmetric motion partition (AMP) is supported, the PU size can also be 2N×nU, 2N×nD, nL×2N or nR×2N.
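The PU partition shapes listed above can be sketched as a lookup from partition mode to PU dimensions. The function name pu_sizes is hypothetical. The quarter/three-quarter split used for the AMP modes follows the HEVC convention and is not stated in the text above, so it should be read as an assumption of this sketch.

```python
from typing import List, Tuple


def pu_sizes(mode: str, n: int) -> List[Tuple[int, int]]:
    """Return the (width, height) of each PU inside a 2Nx2N CU."""
    s = 2 * n    # CU side length
    q = s // 4   # short side of an asymmetric partition (assumed, as in HEVC)
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # small PU on top
        "2NxnD": [(s, s - q), (s, q)],   # small PU at the bottom
        "nLx2N": [(q, s), (s - q, s)],   # small PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # small PU on the right
    }
    return table[mode]
```

With N = 16 (a 32×32 CU), pu_sizes("2NxnU", 16) gives a 32×8 PU above a 32×24 PU; in every mode the PU areas add up to the CU area.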
For depth coding in 3D-HEVC, a motion vector competition (MVC) based scheme is also applied to select one motion vector predictor/disparity vector predictor (MVP/DVP) among a given candidate set of MVPs/DVPs. The candidate set of MVPs/DVPs includes spatial and temporal MVPs/DVPs. There are three inter-prediction modes including Inter, Skip, and Merge in HTM-4.0. The Inter mode performs motion-compensated prediction/disparity-compensated prediction with transmitted motion vectors/disparity vectors (MVs/DVs), while the Skip and Merge modes utilize motion inference methods to obtain the motion information from spatial blocks located in the current picture or a temporal block located in a temporal collocated picture which is signaled in the slice header. When a PU is coded in either Skip or Merge mode, no motion information is transmitted except an index of the selected candidate. In the case of a Skip PU, the residual signal is also omitted. For the Inter mode in HTM-4.0, the advanced motion vector prediction (AMVP) scheme is used to select a motion vector predictor among an AMVP candidate set including two spatial MVPs/DVPs and one temporal MVP/DVP. As for the Merge and Skip modes, the Merge scheme is used to select a motion vector predictor among a Merge candidate set containing four spatial merging candidates and one temporal merging candidate. Based on the rate-distortion optimization (RDO) decision, the encoder selects one final MVP/DVP within a given candidate set of MVPs/DVPs for Inter, Skip, or Merge modes and transmits the index of the selected MVP/DVP to the decoder. The selected MVP/DVP may be linearly scaled according to temporal distances or view distances.
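The linear scaling of a selected MVP/DVP mentioned above can be sketched as follows. This is a simplified illustration using exact rational arithmetic; it is not the fixed-point scaling procedure of the HEVC specification. The function name scale_mv and the parameters cur_dist and cand_dist (the temporal or view distances of the current block and of the candidate to their respective reference pictures) are assumptions made for this example.

```python
from fractions import Fraction
from typing import Tuple


def scale_mv(mv: Tuple[int, int], cur_dist: int, cand_dist: int) -> Tuple[int, int]:
    """Scale a candidate vector by the ratio of two distances.

    cur_dist:  distance between the current picture and its reference
    cand_dist: distance between the candidate's picture and its reference
    """
    if cand_dist == 0:
        raise ValueError("candidate distance must be non-zero")
    factor = Fraction(cur_dist, cand_dist)
    # round() on a Fraction returns the nearest integer
    return tuple(round(c * factor) for c in mv)
```

A candidate that spans twice the distance of the current prediction is halved, and a candidate pointing in the opposite temporal direction changes sign.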
For Inter mode of depth coding, the reference picture index is explicitly transmitted to the decoder. The MVP/DVP is then selected among the candidate set for a given reference picture index. As shown in FIG. 2, the MVP/DVP candidate set for the Inter mode in HTM-4.0 includes two spatial MVPs/DVPs and one temporal MVP/DVP. The size of MVP/DVP candidate set is fixed to 2:
1. Left spatial predictor (the first available MV/DV from A0, A1),
2. Top spatial predictor (the first available MV/DV from B0, B1, B2),
3. Temporal predictor (the first available MV/DV from TBR and TCT), and
4. Zero predictor (zero MV/DV).
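The construction of the Inter mode candidate set listed above can be sketched as follows. The neighbor positions A0, A1, B0, B1, B2, TBR and TCT are taken from the text; the function names and the dictionary representation are hypothetical. How the four predictors are reduced to the fixed set size of 2 is an assumption of this sketch (candidates are taken in the listed order and the zero predictor fills any remaining slot), and any pruning of duplicate candidates is omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec = Tuple[int, int]


def first_available(neigh: Dict[str, Optional[Vec]],
                    names: Sequence[str]) -> Optional[Vec]:
    """Return the first available MV/DV among the named positions."""
    for name in names:
        v = neigh.get(name)
        if v is not None:
            return v
    return None


def inter_candidates(neigh: Dict[str, Optional[Vec]], size: int = 2) -> List[Vec]:
    """Build the MVP/DVP candidate set for Inter mode (simplified)."""
    cands: List[Vec] = []
    groups = (("A0", "A1"),        # left spatial predictor
              ("B0", "B1", "B2"),  # top spatial predictor
              ("TBR", "TCT"))      # temporal predictor
    for group in groups:
        v = first_available(neigh, group)
        if v is not None:
            cands.append(v)
    while len(cands) < size:       # zero predictor fills empty slots
        cands.append((0, 0))
    return cands[:size]            # assumed truncation to the fixed size
```

The encoder would then signal the index of the chosen entry of this list to the decoder.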
For depth coding in 3D-HEVC, if a particular block is encoded as Merge mode, a Merge index is signaled to indicate which MVP/DVP among the Merge candidate set is used for this block to be merged. To follow the essence of motion information sharing or re-use, each merged PU re-uses the MV, prediction direction, and reference picture index of the selected candidate. As shown in FIG. 2, the Merge candidate set includes four spatial merging candidates and one temporal merging candidate. The size of Merge candidate set (also called candidate list) is fixed to 5:
1. Left spatial predictor (A1)
2. Top spatial predictor (B1)
3. Above right spatial predictor (B0)
4. Below left spatial predictor (A0)
5. Top left predictor (B2)
6. Temporal predictor (the first available MV/DV from TBR and TCT)
7. Additional predictor (Bi-predictive candidate, zero candidate)
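The Merge candidate list above can be sketched in the same way. The MotionInfo type and the function names are hypothetical. Limiting the spatial candidates to four out of the five listed positions is one reading of "four spatial merging candidates" and is an assumption of this sketch; bi-predictive combined candidates are omitted, so only zero candidates are used as additional predictors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MotionInfo:
    mv: Tuple[int, int]
    pred_dir: int = 0  # prediction direction
    ref_idx: int = 0   # reference picture index


ZERO = MotionInfo(mv=(0, 0))


def merge_candidates(neigh: Dict[str, Optional[MotionInfo]],
                     size: int = 5, max_spatial: int = 4) -> List[MotionInfo]:
    """Build the Merge candidate list in the order given in the text."""
    cands: List[MotionInfo] = []
    for name in ("A1", "B1", "B0", "A0", "B2"):   # spatial candidates
        info = neigh.get(name)
        if info is not None and len(cands) < max_spatial:
            cands.append(info)
    for name in ("TBR", "TCT"):                   # first available temporal
        info = neigh.get(name)
        if info is not None:
            cands.append(info)
            break
    while len(cands) < size:                      # additional (zero) candidates
        cands.append(ZERO)
    return cands[:size]


def merged_motion(neigh: Dict[str, Optional[MotionInfo]],
                  merge_idx: int) -> MotionInfo:
    """A merged PU re-uses MV, prediction direction and reference index."""
    return merge_candidates(neigh)[merge_idx]
```

Only the Merge index needs to be signaled; the decoder rebuilds the same list and copies the motion information of the indexed entry.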
In order to enable efficient encoding of the depth map data, a new coding mode named motion parameter inheritance (MPI) has been introduced in HTM-4.0. MPI allows the depth map to inherit the treeblock subdivision into CUs and PUs, together with the corresponding motion parameters, from the texture data. Since the motion vectors of the video signal have quarter-sample accuracy, whereas only full-sample accuracy is used for the depth map signal, the motion vectors are quantized to their nearest full-sample position in the inheritance process. It can be adaptively decided for each block of the depth map whether the motion data are inherited from the collocated block of the video signal or whether new motion data are transmitted. For signaling the MPI coding mode, the Merge/Skip mode syntax is used. The list of possible Merge candidates has been extended in a way that, for depth map coding, the first Merge candidate refers to merging with the corresponding block from the associated video signal. Since the motion data along with the CU splitting structure of the texture are re-used by the depth, the MPI scheme introduces an additional buffer to store the inter-dir (used to indicate the prediction direction) and split flag (used to indicate the CU splitting) information.
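The quantization of inherited motion vectors and the placement of the inherited candidate at the head of the Merge list can be sketched as follows. The function names are hypothetical, and the handling of exact half-sample ties (rounded toward positive infinity here) is an assumption, since the text only specifies rounding to the nearest full-sample position.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, int]


def quantize_to_full_sample(mv_quarter: Vec) -> Vec:
    """Convert a quarter-sample MV to the nearest full-sample MV.

    Input is in quarter-sample units, output in full-sample units.
    Ties (exact half-sample positions) round toward positive infinity,
    which is an assumption of this sketch.
    """
    return tuple((c + 2) >> 2 for c in mv_quarter)


def depth_merge_list(texture_mv_quarter: Vec,
                     other_candidates: Sequence[Vec]) -> List[Vec]:
    """Place the candidate inherited from texture first in the Merge list."""
    inherited = quantize_to_full_sample(texture_mv_quarter)
    return [inherited] + list(other_candidates)
```

For instance, a texture motion vector of (5, -3) in quarter-sample units, i.e., (1.25, -0.75) samples, is inherited by the depth block as (1, -1) in full-sample units.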
FIG. 3 illustrates an example of MPI for depth coding in 3D-HEVC. Texture picture 312 corresponds to a current picture and texture picture 310 corresponds to a picture at a reference time instance. Both texture pictures 310 and 312 are in the same view. Block 350 (e.g., a PU) in current picture 312 is partitioned into four sub-blocks. Motion vectors 332 and 334 are associated with sub-blocks 352 and 354. Depth block 360 is collocated with texture block 350 and may inherit motion parameters from texture block 350. Accordingly, sub-blocks 362 and 364 may inherit motion parameters (e.g., motion vectors 332′ and 334′) from respective sub-blocks 352 and 354. Block 370 in current picture 312 is partitioned into four sub-blocks. Motion vector 336 is associated with sub-block 372. Depth block 380 is collocated with texture block 370. Depth sub-block 382 does not inherit motion information from the collocated texture sub-block. In this case, a separate motion vector 346 is transmitted for the corresponding depth sub-block 382. For signaling the MPI coding mode, the Merge/Skip mode syntax is used. The list of possible Merge candidates has been extended for depth map coding so that the first Merge candidate refers to merging with the corresponding block of the associated video signal, i.e., inheriting motion parameters of the corresponding block of the associated video signal in this case.
The MPI mode can be used at any level of the hierarchical coding-tree block of the depth map. If the MPI mode is indicated at a higher level of the depth map coding tree, the depth data in this higher level unit can inherit the CU/PU subdivision as well as the corresponding motion data from the video signal. This higher level unit may be larger than the CU size for the video signal. Accordingly, it is possible to specify MPI mode for a whole tree-block, typically corresponding to 64×64 image samples, and the whole tree-block of the depth map is partitioned into CUs and PUs by inheriting the CU and PU partitioning of the corresponding block of the video signal. If the MPI mode is indicated at a level of the coding tree that is smaller than or the same size as the corresponding CU size of the video signal, only the motion data are inherited from the video signal. When the MPI mode is used, not only the partitioning and the motion vectors, but also the reference picture indices are inherited from the video signal. Therefore, it has to be ensured that the depth maps corresponding to the video reference pictures are also available in the reference picture buffer for the depth signal. The MPI mode is only possible if the whole block of the video signal is coded using Inter prediction. Since MPI allows depth blocks to inherit motion vectors, reference indices and CU/PU structures, it causes a parsing dependency between the depth component and the texture component. Furthermore, there is also a need to store CU/PU structure information for the texture component.
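The rules above for what a depth block inherits under MPI can be summarized in a minimal sketch. The function name mpi_inheritance, its parameters and the string labels are hypothetical and serve only to restate the conditions in the text.

```python
from typing import FrozenSet


def mpi_inheritance(depth_block_size: int,
                    texture_cu_size: int,
                    texture_all_inter: bool) -> FrozenSet[str]:
    """Return what a depth block inherits from texture under MPI.

    depth_block_size:  size of the depth coding-tree level where MPI is indicated
    texture_cu_size:   size of the corresponding CU of the video signal
    texture_all_inter: True if the whole texture block is Inter coded
    """
    if not texture_all_inter:
        return frozenset()  # MPI is not possible for this block
    inherited = {"motion_vectors", "reference_indices"}
    if depth_block_size > texture_cu_size:
        # MPI indicated at a higher level: the CU/PU subdivision is inherited too
        inherited.add("cu_pu_partitioning")
    return frozenset(inherited)
```

For a 64×64 depth tree-block whose collocated texture is split into 16×16 CUs, the partitioning, motion vectors and reference indices are all inherited; at the 16×16 level, only the motion data are inherited.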