Three-dimensional (3D) television has been a technology trend in recent years that intends to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that only provides viewers a single view of a scene from the perspective of the camera. Multi-view video, however, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers the sensation of realism. 3D video formats may also include depth maps associated with corresponding texture pictures. The depth maps also have to be coded in order to render three-dimensional views or multi-view video.
Various techniques to improve the coding efficiency of 3D video coding have been disclosed in the field. There are also development activities to standardize the coding techniques. For example, a working group, ISO/IEC JTC1/SC29/WG11 within ISO (International Organization for Standardization), is developing an HEVC (High Efficiency Video Coding) based 3D video coding standard (named 3D-HEVC). To reduce the inter-view redundancy, a technique called disparity-compensated prediction (DCP) has been added as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to Inter picture prediction that uses previously coded pictures of the same view in a different access unit (AU), while DCP refers to Inter picture prediction that uses already coded pictures of other views in the same access unit, as shown in FIG. 1. The vector used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. The video pictures (110A) and depth maps (110B) corresponding to a particular camera position are indicated by a view identifier (viewID). For example, video pictures and depth maps associated with three views (i.e., V0, V1 and V2) are shown in FIG. 1. All video pictures and depth maps that belong to the same camera position are associated with the same viewID. The video pictures and, when present, the depth maps are coded access unit by access unit, as shown in FIG. 1. An AU (120) includes all video pictures and depth maps corresponding to the same time instant. The motion data compression is performed for each picture after all the pictures (both texture and depth) within the same AU are coded. In this case, for each AU, the reconstruction process for pictures within the AU can rely on full-resolution motion data associated with the current AU. The motion data compression will only affect the reconstruction process of other AUs that refer to the compressed motion data associated with the current AU.
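The distinction between MCP and DCP described above can be illustrated with a minimal sketch. The `Picture` class and the `prediction_type` function are hypothetical names introduced only for illustration, and the sketch assumes that pictures sharing the same picture order count (POC) belong to the same access unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """Hypothetical picture descriptor (not part of any standard API)."""
    view_id: int  # camera position (viewID)
    poc: int      # assumption: equal POC means same access unit (AU)


def prediction_type(current: Picture, reference: Picture) -> str:
    """Classify Inter picture prediction as MCP or DCP."""
    same_view = reference.view_id == current.view_id
    same_au = reference.poc == current.poc
    if same_view and not same_au:
        return "MCP"  # same view, different access unit
    if not same_view and same_au:
        return "DCP"  # other view, same access unit
    return "unsupported"  # neither case described in the text


cur = Picture(view_id=1, poc=8)
temporal_ref = Picture(view_id=1, poc=4)
inter_view_ref = Picture(view_id=0, poc=8)
```

Here `prediction_type(cur, temporal_ref)` yields MCP, while `prediction_type(cur, inter_view_ref)` yields DCP.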
In 3D-HEVC version 4.0, Inter mode, Merge mode and Skip mode are used for texture and depth coding. In 3D-HEVC, a hybrid block-based motion-compensated DCT-like transform coding architecture is utilized. The basic unit for compression, termed a coding unit (CU), is a 2N×2N square block. Each CU can be recursively split into four smaller CUs until a pre-defined minimum size is reached. Each CU contains one or multiple prediction units (PUs). The PU size can be 2N×2N, 2N×N, N×2N, or N×N. When asymmetric motion partition (AMP) is supported, the PU size can also be 2N×nU, 2N×nD, nL×2N or nR×2N. To share the previously encoded motion information of reference views, inter-view motion prediction is employed. For deriving candidate motion parameters for a current block in a dependent view, a DV for the current block is first derived, and then the prediction block in the already coded picture in the reference view is located by adding the DV to the location of the current block. If the prediction block is coded using MCP, the associated motion parameters can be used as candidate motion parameters for the current block in the current view. The derived DV can also be directly used as a candidate DV for DCP.
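The inter-view motion prediction step described above may be sketched as follows. This is a simplified illustration under stated assumptions: the motion field of the reference view is modeled as a dictionary keyed by block position, a missing entry stands for a block that is not coded using MCP, and the names `MotionParams` and `inter_view_motion_candidate` are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Position = Tuple[int, int]


class MotionParams(NamedTuple):
    """Hypothetical container for the motion parameters of one block."""
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference picture index


def inter_view_motion_candidate(
    cur_pos: Position,
    dv: Tuple[int, int],
    ref_view_motion: Dict[Position, MotionParams],
) -> Optional[MotionParams]:
    """Locate the prediction block by adding the DV to the current block
    location and return its motion parameters, if it was coded with MCP."""
    ref_pos = (cur_pos[0] + dv[0], cur_pos[1] + dv[1])
    # Assumption: only MCP-coded blocks have an entry in the motion field.
    return ref_view_motion.get(ref_pos)


ref_view_motion = {(56, 32): MotionParams(mv=(3, -1), ref_idx=0)}
candidate = inter_view_motion_candidate((64, 32), (-8, 0), ref_view_motion)
```

In this example the current block at (64, 32) with DV (-8, 0) maps to the reference-view block at (56, 32), whose motion parameters become the candidate.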
In order to enable efficient encoding of the depth map data, a new coding mode, named motion parameter inheritance (MPI), has been introduced in HTM-4.0. MPI allows inheritance of the treeblock subdivision into CUs and PUs and their corresponding motion parameters from the texture data. Since the motion vectors of the video signal have quarter-sample accuracy, whereas only full-sample accuracy is used for the depth map signal, the motion vectors are quantized to their nearest full-sample position in the inheritance process. It can be adaptively decided for each block of the depth map whether the motion data are inherited from the collocated block of the video signal or new motion data are transmitted. For signaling the MPI coding mode, the Merge/Skip mode syntax is used. The list of possible Merge candidates has been extended in a way that, for depth map coding, the first Merge candidate refers to enabling the MPI coding mode to inherit the motion and CU/PU splitting structure from the associated video signal. Since the motion data along with the CU/PU splitting structure of the texture are re-used by the depth, the MPI scheme requires an additional buffer to store the inter-dir (used to indicate the prediction direction) and split flag (used to indicate the data splitting) information.
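The quantization of an inherited quarter-sample motion vector to the nearest full-sample position may be sketched as below. The function name is hypothetical, and the handling of exact half-sample ties (rounded toward positive infinity here) is an assumption, since the text only states that the nearest full-sample position is used.

```python
from typing import Tuple


def quantize_mv_to_full_sample(mv_quarter: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a motion vector given in quarter-sample units to
    full-sample units, rounding each component to the nearest integer.

    Assumption: ties (x.5) round toward positive infinity.
    """
    # Adding 2 (half of 4) before the arithmetic right shift by 2
    # implements round-to-nearest for a division by 4.
    return ((mv_quarter[0] + 2) >> 2, (mv_quarter[1] + 2) >> 2)


full_mv = quantize_mv_to_full_sample((5, -3))  # 1.25 and -0.75 samples
```

For the texture motion vector (5, -3) in quarter-sample units, i.e., (1.25, -0.75) samples, the inherited depth motion vector is (1, -1) in full-sample units.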
FIG. 2 illustrates an example of MPI for depth coding in 3D-HEVC. Texture picture 212 corresponds to a current picture and texture picture 210 corresponds to a picture at a reference time instance. Both texture pictures 210 and 212 are in the same view. Block 250 (e.g., a CU as indicated by a thick-lined box) in current picture 212 is partitioned into four sub-blocks. Motion vectors 232 and 234 are associated with sub-blocks 252 and 254. Depth block 260 is collocated with texture block 250 and may inherit motion parameters from texture block 250. Accordingly, sub-blocks 262 and 264 may inherit motion parameters (e.g., motion vectors 232′ and 234′) from respective sub-blocks 252 and 254. Block 270 in current picture 212 is partitioned into four sub-blocks. Motion vector 236 is associated with sub-block 272. Depth block 280 is collocated with texture block 270. Depth sub-block 282 does not inherit motion information from the collocated texture sub-block. In this case, a separate motion vector 246 is transmitted for the corresponding depth sub-block 282. For signaling the MPI coding mode, the Merge/Skip mode syntax is used. The list of possible Merge candidates has been extended for depth map coding so that the first Merge candidate refers to use of the MPI coding mode, i.e., inheriting the motion parameters and CU/PU structure of the corresponding block of the associated video signal in this case.
The MPI mode can be used in any level of the hierarchical coding-tree block of the depth map. If the MPI mode is indicated at a higher level of the depth map coding tree, the depth data in this higher level unit can inherit the CU/PU subdivision as well as the corresponding motion data from the video signal. This higher level unit may be larger than the CU size for the video signal. Accordingly, it is possible to specify MPI mode for a whole tree-block, typically corresponding to 64×64 image samples, and the whole tree-block of the depth map is partitioned into CUs and PUs by inheriting the CU and PU partitioning of the corresponding block of the video signal. If the MPI mode is indicated at a level of the coding tree that is smaller than or the same size as the corresponding CU size of the video signal, only the motion data are inherited from the video signal. When the MPI mode is used, not only the partitioning and the motion vectors, but also the reference picture indices are inherited from the video signal. Therefore, it has to be ensured that the depth maps corresponding to the video reference pictures are also available in the reference picture buffer for the depth signal.
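The dependence of the inherited data on the coding-tree level at which MPI is indicated may be summarized by the following sketch. The function name and the string labels are hypothetical, and block sizes are expressed as the side length in samples of a square unit.

```python
from typing import Tuple


def mpi_inherited_data(depth_unit_size: int, texture_cu_size: int) -> Tuple[str, ...]:
    """Return which data a depth unit inherits from the video signal
    when the MPI mode is indicated for it.

    depth_unit_size: side length of the depth unit signaling MPI.
    texture_cu_size: side length of the corresponding texture CU.
    """
    # Motion vectors and reference picture indices are always inherited.
    inherited = ("motion_vectors", "reference_indices")
    if depth_unit_size > texture_cu_size:
        # Higher-level unit: the CU/PU subdivision is inherited as well.
        inherited = ("cu_pu_partitioning",) + inherited
    return inherited


whole_treeblock = mpi_inherited_data(depth_unit_size=64, texture_cu_size=16)
same_size_unit = mpi_inherited_data(depth_unit_size=16, texture_cu_size=16)
```

With a 64×64 depth tree-block and 16×16 texture CUs, the partitioning is inherited together with the motion data; when the depth unit is the same size as or smaller than the texture CU, only the motion data are inherited.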
When the texture merging candidate is selected to code a depth block, the motion information of the depth block (312) in a depth map (310) re-uses or derives the motion information of a texture collocated block (322) of a texture picture (320) as shown in FIG. 3. The texture collocated block (322) may contain multiple sets of motion information. The motion information of the texture collocated block (322) may be selected from any location (or can be called sub-block) within or neighboring to the texture collocated block (322). For example, the location may correspond to a sub-block located at a lower right location of a center point of the texture collocated block (322). Nevertheless, other sub-blocks such as upper-left, upper-right or lower-left sub-block may also be used. The motion information may include motion vector, reference picture index and reference list. The texture merging candidate can be placed in any position of the merging candidate list. For example, if the texture merging candidate is available, it can be placed in the first position before the candidate corresponding to the left spatial neighboring block.
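The placement of the texture merging candidate ahead of the spatial candidates, as in the example above, may be sketched as follows. The function name, the representation of candidates as opaque values, and the `max_candidates` limit of 5 are assumptions made only for illustration.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def build_merge_list(
    spatial_candidates: List[T],
    texture_candidate: Optional[T] = None,
    max_candidates: int = 5,  # assumption: list size limit
) -> List[T]:
    """Build a merging candidate list for a depth block, placing the
    texture merging candidate first when it is available."""
    merge_list: List[T] = []
    if texture_candidate is not None:
        merge_list.append(texture_candidate)  # first position
    merge_list.extend(spatial_candidates)     # left neighbor follows, etc.
    return merge_list[:max_candidates]


with_texture = build_merge_list(["left", "above"], texture_candidate="texture")
without_texture = build_merge_list(["left", "above"])
```

When the texture merging candidate is unavailable, the list simply starts with the candidate of the left spatial neighboring block.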
In 3D-HEVC version 4.0, inter-view motion vector prediction (MVP) is applied for the Merge/Skip mode. In the Merge/Skip mode, the same motion parameters as those of a neighboring block are used. If a block is coded in Merge/Skip mode, a candidate list of motion parameters is derived, which includes the motion parameters of spatially neighboring blocks as well as motion parameters that are calculated based on the motion parameters of the co-located block in a temporal reference picture. The chosen motion parameters are signaled by transmitting an index in the candidate list. For the derivation of the inter-view motion compensated and the inter-view motion disparity compensated candidate, a corresponding block in a view component at the same time instant as the current view component is utilized. The corresponding block is determined by shifting the position of the current block using the derived disparity vector.
FIG. 4 illustrates an example of inter-view candidate derivation based on 3D-HEVC version 4.0. Block 420 represents a current block in a view (i.e., V1 in FIG. 4) and the corresponding block (410) in the base view (i.e., V0) can be located according to the disparity associated with the current block. The corresponding block (410) has an MV (412) pointing to an inter-view block in list0 refidx0 of V0 in this example. If the collocated reference ColRef (i.e., L0 Ref0 of V0) is also in list0 of the current block, the MV (422) for block 420 can re-use the motion information from V0 as the inter-view candidate of L0 for V1. The same derivation is applied to L1 refidx1 of V0. Similarly, the MV (414) associated with list1 refidx1 of V0 can be re-used for V1 as the inter-view candidate MV (424).
Advanced residual prediction (ARP) based on Inter-view residual prediction is another tool offered by 3D-HEVC version 4.0. The relationship among the current block, the reference block and the motion compensated block according to ARP is illustrated in FIG. 5. A disparity vector (516) is determined first for the current block (520) in view Vm to point to a corresponding block in a target reference view (i.e., V0). The corresponding block (510) in the picture of the reference view within the same access unit is then located by the disparity vector. The motion information of the current block is re-used to derive the motion information for the reference block. For example, current block 520 has motion vector 522 pointing to list0 refidx0 in V1. The motion information can be reused by the corresponding block (510). Motion compensation can be applied to the corresponding block (510) based on the same motion vector (i.e., 522) of the current block and the derived reference picture in the reference view for the reference block, to derive residue block 512. The reference picture in the reference view (V0) which has the same POC (Picture Order Count) value as the reference picture in the current view (Vm) is selected as the reference picture of the corresponding block to form a residue block. A weighting factor is then applied to the residue block to obtain a weighted residue block, and the weighted residue block is added to the predicted samples. A similar process can be applied for motion vector 524 pointing to a reference picture in list1 of V1 to obtain residue block 514.
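The final step of ARP, in which the weighted residue block is added to the predicted samples, may be sketched as below. This is an illustration under stated assumptions: blocks are modeled as nested lists of sample values, the residue is taken as the corresponding block minus its motion-compensated reference in the reference view, and the function and parameter names are hypothetical.

```python
from typing import List

Block = List[List[float]]


def arp_predict(pred: Block, corr: Block, corr_mc: Block, weight: float) -> Block:
    """Add the weighted residue of the reference view to the prediction.

    pred:    predicted samples of the current block (current view).
    corr:    corresponding block in the reference view.
    corr_mc: motion-compensated block for `corr`, obtained with the
             motion vector re-used from the current block.
    weight:  weighting factor applied to the residue block.
    """
    return [
        # Assumption: residue = corresponding block - its MC reference.
        [p + weight * (c - m) for p, c, m in zip(p_row, c_row, m_row)]
        for p_row, c_row, m_row in zip(pred, corr, corr_mc)
    ]


predictor = arp_predict(
    pred=[[100, 100]], corr=[[50, 60]], corr_mc=[[48, 64]], weight=0.5
)
```

With a weighting factor of 0.5, residues of 2 and -4 change the predicted samples from 100 to 101 and 98, respectively.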
FIG. 6 illustrates an exemplary prediction structure of residual prediction. Block 610 represents the current block in the current view (i.e., view 1). Block 620 denotes the representation of current block 610 in the reference view (view 0) at time Tj, and block 630 denotes the temporal prediction of current block 610 from the same view (view 1) at time Ti. Motion vector 650 denotes the motion from current block 610 to block 630 at time Ti in the same view. Since current block 610 in view 1 and corresponding block 620 in view 0 are actually projections of the same object in two different views, these two blocks should share the same motion information. Therefore, temporal prediction block 640 in view 0 at time Ti of corresponding block 620 in view 0 at time Tj can be located from corresponding block 620 in view 0 by applying the motion information of motion vector 650. The residue (i.e., 640) of corresponding block 620 is then multiplied by a weighting factor and is used along with the corresponding block (i.e., 620) to form the predictor for the current block (i.e., 610).
Besides 3D video coding and multi-view video coding, motion information prediction and inheritance are also used in scalable video coding systems. The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG has standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC provides temporal, spatial, and quality scalabilities in a single bitstream. The SVC scalable bitstream contains the video information from low frame-rate, low resolution, and low quality videos to high frame rate, high definition, and high quality videos. This single bitstream can be adapted to various transmission environments and applications to deliver video at a selected spatial/temporal resolution and video quality.
In SVC, three types of scalabilities, i.e., temporal scalability, spatial scalability, and quality scalability are provided. SVC uses the multi-layer coding structure to realize three dimensions of scalability. The concept of SVC is to generate one scalable bitstream that can be easily and rapidly adapted without transcoding or re-encoding to fit the bit-rate of various transmission channels, diverse display capabilities, and different computational resources. An important feature of SVC design is that the scalability is provided at a bitstream level. Bitstreams for a reduced spatial and/or temporal resolution can be simply obtained by discarding NAL units (or network packets) from a scalable bitstream that are not required for decoding the target resolution. NAL units for quality refinement can be additionally truncated in order to reduce the bit-rate and the associated video quality.
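Bitstream-level adaptation by discarding NAL units may be sketched as follows. The `NalUnit` class is a hypothetical, simplified stand-in for a parsed NAL unit header; it carries only the three layer identifiers (`dependency_id` for the spatial layer, `temporal_id` and `quality_id`), and the extraction rule shown is a simplification that ignores any other constraints of a real extractor.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified view of a scalable NAL unit header."""
    dependency_id: int  # spatial layer
    temporal_id: int    # temporal layer
    quality_id: int     # quality refinement layer


def extract_sub_bitstream(
    nal_units: List[NalUnit], max_dependency: int, max_temporal: int, max_quality: int
) -> List[NalUnit]:
    """Keep only the NAL units required for the target operating point;
    all other NAL units are discarded without transcoding."""
    return [
        n for n in nal_units
        if n.dependency_id <= max_dependency
        and n.temporal_id <= max_temporal
        and n.quality_id <= max_quality
    ]


stream = [NalUnit(0, 0, 0), NalUnit(0, 1, 0), NalUnit(1, 0, 0), NalUnit(1, 1, 1)]
base_layer = extract_sub_bitstream(stream, max_dependency=0, max_temporal=0, max_quality=0)
```

Requesting the lowest spatial, temporal and quality layers keeps only the first NAL unit of `stream`, while requesting the highest values of all three identifiers keeps the whole bitstream.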
To share the previously encoded motion information of reference layers, inter-layer motion prediction is employed. For example, motion parameter derivation for a current block in an enhancement layer may use the motion parameters of a collocated prediction block in a previously coded picture in the reference layer.
In the motion information prediction/inheritance processes mentioned above, it is always assumed that the reference picture index, or the picture order count (POC) of the reference picture of the reference block in the picture used to derive the motion parameters, is in the current reference picture list. However, the reference picture index or the POC of the reference picture of the reference block in the picture used to derive the motion parameters may not be in the current reference picture list. In this case, the predicted or inherited motion parameters will be invalid. It is desirable to overcome this issue.
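The condition under which predicted or inherited motion parameters become invalid may be illustrated by a simple lookup. The sketch below assumes that the current reference picture list is represented by the POC values of its pictures, and `find_ref_idx` is a hypothetical helper name; it only detects the problem and does not address it.

```python
from typing import List, Optional


def find_ref_idx(ref_poc: int, current_ref_list_pocs: List[int]) -> Optional[int]:
    """Return the index in the current reference picture list of the
    picture whose POC equals `ref_poc`, or None if no such picture
    exists (the inherited motion parameters are then invalid)."""
    for idx, poc in enumerate(current_ref_list_pocs):
        if poc == ref_poc:
            return idx
    return None


current_list = [8, 4]                       # POCs in the current list
valid_idx = find_ref_idx(4, current_list)   # reference picture is present
missing = find_ref_idx(2, current_list)     # reference picture is absent
```

In this example the reference picture with POC 4 is found at index 1, whereas a reference picture with POC 2 is not in the current list and the lookup returns `None`.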