Three-dimensional (3D) television has been a technology trend in recent years that aims to provide viewers with an immersive viewing experience. Various technologies have been developed to enable 3D viewing, and among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, in contrast, can offer arbitrary viewpoints of dynamic scenes and provide viewers with a sensation of realism.
Multi-view video is typically created by capturing a scene with multiple cameras simultaneously, where the cameras are positioned so that each captures the scene from a different viewpoint. The cameras thus produce multiple video sequences corresponding to the multiple views. To provide more views, more cameras are used, generating multi-view video with a large number of video sequences associated with the views. Consequently, multi-view video requires a large storage space and/or a high transmission bandwidth. Multi-view video coding techniques have therefore been developed in the field to reduce the required storage space or transmission bandwidth.
A straightforward approach is to simply apply conventional video coding techniques to each single-view video sequence independently, disregarding any correlation among the views. Such a coding system would be very inefficient. To improve the efficiency of multi-view video coding, typical multi-view video coding exploits inter-view redundancy; most 3D Video Coding (3DVC) systems therefore take into account the correlation of video data associated with multiple views and depth maps. The standard development body, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), extended H.264/MPEG-4 AVC to Multi-view Video Coding (MVC) for stereo and multi-view videos.
MVC adopts both temporal and spatial prediction to improve compression efficiency. During the development of MVC, several macroblock-level coding tools were proposed, including illumination compensation, adaptive reference filtering, motion skip mode, and view synthesis prediction. These coding tools exploit the redundancy between multiple views: illumination compensation compensates for illumination variations between different views; adaptive reference filtering reduces variations due to focus mismatch among the cameras; motion skip mode allows motion vectors in the current view to be inferred from other views; and view synthesis prediction predicts a picture of the current view from other views.
In the reference software for HEVC-based 3D video coding (3D-HTM), an inter-view candidate is added as a motion vector (MV) or disparity vector (DV) candidate for Inter, Merge and Skip modes in order to re-use previously coded motion information of adjacent views. In 3D-HTM, the basic unit for compression, termed a coding unit (CU), is a 2N×2N square block. Each CU can be recursively split into four smaller CUs until a predefined minimum size is reached. Each CU contains one or more prediction units (PUs).
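The recursive CU splitting described above can be sketched as a simple quadtree recursion. This is a minimal illustration, not 3D-HTM code; the block sizes, the minimum CU size of 8, and the `should_split` decision callback are assumptions for the example.

```python
def split_cu(x, y, size, min_size=8, should_split=None):
    """Return the list of leaf CUs (x, y, size) for one 2Nx2N block.

    A CU is recursively split into four equal quadrants until either the
    predefined minimum size is reached or the (hypothetical) encoder
    decision callback declines to split further.
    """
    if should_split is None:
        should_split = lambda x, y, size: False
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(split_cu(x + dx, y + dy, half, min_size, should_split))
    return leaves

# Example: split any CU larger than 16 samples down to 16x16.
leaves = split_cu(0, 0, 64, should_split=lambda x, y, s: s > 16)
```

A 64×64 block under this policy yields sixteen 16×16 leaf CUs, mirroring the recursive four-way split in the text.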
To share the previously coded texture information of adjacent views, a technique known as Disparity-Compensated Prediction (DCP) has been included in 3D-HTM as an alternative coding tool to motion-compensated prediction (MCP). MCP refers to an inter-picture prediction that uses previously coded pictures of the same view, while DCP refers to an inter-picture prediction that uses previously coded pictures of other views in the same access unit. FIG. 1 illustrates an example of a 3D video coding system incorporating MCP and DCP. The vector (110) used for DCP is termed a disparity vector (DV), which is analogous to the motion vector (MV) used in MCP. FIG. 1 illustrates three MVs (120, 130 and 140) associated with MCP. Moreover, the DV of a DCP block can also be predicted by a disparity vector predictor (DVP) candidate derived from neighboring blocks or temporally collocated blocks that also use inter-view reference pictures. In 3D-HTM, when deriving an inter-view Merge candidate for Merge/Skip modes, if the motion information of the corresponding block is not available or not valid, the inter-view Merge candidate is replaced by a DV.
In Inter mode, Direction-Separate Motion Vector Prediction is another coding tool used in 3D-AVC. The direction-separate motion vector prediction consists of temporal and inter-view motion vector prediction. If the target reference picture is a temporal prediction picture, the temporal motion vectors of the blocks adjacent to the current block Cb, such as A, B, and C in FIG. 2A, are employed in the derivation of the motion vector predictor. If a temporal motion vector is unavailable, an inter-view motion vector is used; the inter-view motion vector is derived from the corresponding block indicated by a DV converted from depth. The motion vector predictor is then derived as the median of the motion vectors of the adjacent blocks A, B, and C. Block D is used only when C is unavailable.
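The median rule above can be sketched as follows. This is a minimal sketch assuming 2D integer motion vectors and a component-wise median, as in H.264-style median prediction; the function names are illustrative, not from any reference software.

```python
def median3(a, b, c):
    # Median of three scalar values.
    return max(min(a, b), min(max(a, b), c))

def median_mvp(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median of the motion vectors of neighbors A, B, C.

    Per the text, block D substitutes for C only when C is unavailable
    (represented here as None).
    """
    if mv_c is None:
        mv_c = mv_d
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))

# Example: neighbors A=(1,2), B=(3,4), C=(2,0).
mvp = median_mvp((1, 2), (3, 4), (2, 0))  # -> (2, 2)
```

Taking the median per component rather than picking one whole neighbor vector keeps the predictor robust to a single outlier neighbor.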
Conversely, if the target reference picture is an inter-view prediction picture, the inter-view motion vectors of the neighboring blocks are employed for the inter-view prediction. If an inter-view motion vector is unavailable, a disparity vector derived from the maximum depth value of the four corner depth samples within the associated depth block is used instead. The motion vector predictor is then derived as the median of the inter-view motion vectors of the adjacent blocks A, B, and C.
FIG. 2B shows this inter-view case in more detail. Inter-view motion vectors of the spatially neighboring blocks are derived based on the texture data of the respective blocks in step 210, and the depth map associated with the current block Cb is provided in step 260. The availability of an inter-view motion vector for blocks A, B and C is checked in step 220. If an inter-view motion vector is unavailable, the disparity vector for the current block replaces the unavailable inter-view motion vector, as shown in step 230; this disparity vector is derived from the maximum depth value of the associated depth block (280), as shown in step 270. The median of the inter-view motion vectors of blocks A, B and C is then used as the inter-view motion vector predictor. The conventional MVP procedure is shown in step 240, where a final MVP is derived as the median of the inter-view MVPs or temporal MVPs. Motion vector coding based on the motion vector predictor is performed in step 250.
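The steps above (availability check, depth-based DV substitution, then median) can be sketched as below. This is a hedged illustration: the linear `scale`/`offset` mapping inside `depth_to_dv` is a hypothetical stand-in for the camera-parameter-based conversion, since the text only states that the maximum depth value of the associated depth block drives the DV.

```python
def median3(a, b, c):
    # Median of three scalar values.
    return max(min(a, b), min(max(a, b), c))

def depth_to_dv(depth_block, scale=1, offset=0):
    # Hypothetical linear depth-to-disparity conversion driven by the
    # maximum depth value of the block (step 270); horizontal-only DV.
    d_max = max(max(row) for row in depth_block)
    return (scale * d_max + offset, 0)

def interview_mvp(mv_a, mv_b, mv_c, depth_block):
    """Median inter-view MVP for blocks A, B, C (steps 220-240).

    Any unavailable neighbor inter-view MV (None) is replaced by the DV
    converted from the associated depth block (step 230).
    """
    dv = depth_to_dv(depth_block)
    mvs = [mv if mv is not None else dv for mv in (mv_a, mv_b, mv_c)]
    return (median3(mvs[0][0], mvs[1][0], mvs[2][0]),
            median3(mvs[0][1], mvs[1][1], mvs[2][1]))
```

For example, with B unavailable and a depth block whose maximum value is 40, B is replaced by the DV (40, 0) before the median is taken.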
Priority-based MVP candidate derivation for Skip/Direct mode is another coding tool in 3D-AVC. In Skip/Direct mode, an MVP candidate is derived according to a predefined derivation order: the inter-view candidate, then the median of three spatial candidates derived from the neighboring blocks A, B, and C (D is used only when C is unavailable), as shown in FIG. 3. On the decoder side, motion compensation is performed according to the motion information of the derived MVP candidate. The motion information includes the prediction direction (uni-directional or bi-directional prediction), the reference picture type (temporal, virtual, or inter-view prediction), and the reference picture index. As shown in FIG. 3, the central point (312) of the current block (310) in the dependent view and its disparity vector are used to find the corresponding point in the base view or reference view. The MV of the block covering the corresponding point in the base view is then used as the inter-view candidate of the current block. In ATM 7.0, the disparity vector can be derived from both the neighboring blocks (A, B and C/D) and the depth value of the central point. Specifically, if only one of the neighboring blocks has a disparity vector (DV), that DV is used as the disparity. Otherwise, the DV is derived as the median of the DVs (320) of the adjacent blocks A, B, and C; if a neighboring DV is unavailable, a DV converted from depth (350) is used in its place. The derived DV is used to locate a corresponding block (340) in the reference picture (330).
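The neighbor-based DV derivation just described can be sketched as follows. This is an assumed simplification of the ATM 7.0 behavior as paraphrased in the text, not reference code: exactly one available neighbor DV is used directly; otherwise the median of A, B, C is taken with the depth-converted DV substituted for any unavailable neighbor.

```python
def median3(a, b, c):
    # Median of three scalar values.
    return max(min(a, b), min(max(a, b), c))

def derive_dv_skip_direct(dv_a, dv_b, dv_c, depth_dv):
    """Priority-based DV derivation for Skip/Direct (sketch).

    dv_a/dv_b/dv_c are the neighbor DVs (None if unavailable);
    depth_dv is the DV converted from the depth value (element 350).
    """
    available = [dv for dv in (dv_a, dv_b, dv_c) if dv is not None]
    if len(available) == 1:
        # Only one neighbor has a DV: use it directly.
        return available[0]
    # Otherwise take the median, filling gaps with the depth-based DV.
    filled = [dv if dv is not None else depth_dv for dv in (dv_a, dv_b, dv_c)]
    return (median3(filled[0][0], filled[1][0], filled[2][0]),
            median3(filled[0][1], filled[1][1], filled[2][1]))
```

Note that when no neighbor has a DV, all three slots are filled with the depth-converted DV, so the median degenerates to the depth-based DV itself.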
In 3D-AVC, during the inter-view MVP derivation process for Skip/Direct mode, the disparity vector (DV) is derived from the depth information of the corresponding block when the DV for a candidate neighboring block is not available. The depth-to-DV conversion for Skip/Direct mode in 3D-AVC is shown in FIG. 4, where the DV is determined based on the maximum of the depth values of the samples at the four corners (shown as highlighted small squares) of the associated depth block (a macroblock, MB, in this example). The depth-to-DV conversion therefore only needs to be performed once for each MB.
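The four-corner rule of FIG. 4 can be sketched as below. The linear `scale`/`offset` mapping is a hypothetical stand-in for the camera-parameter-based conversion; only the max-of-four-corners rule and the once-per-MB property come from the text.

```python
def depth_to_dv_mb(depth_mb, scale=4, offset=0):
    """Depth-to-DV conversion for one macroblock (FIG. 4 sketch).

    Only the four corner depth samples are inspected, so the conversion
    is performed once per MB regardless of the MB's interior content.
    """
    h, w = len(depth_mb), len(depth_mb[0])
    d_max = max(depth_mb[0][0], depth_mb[0][w - 1],
                depth_mb[h - 1][0], depth_mb[h - 1][w - 1])
    return d_max * scale + offset

# Example: a 16x16 depth MB with values i + j has corners 0, 15, 15, 30.
dv = depth_to_dv_mb([[i + j for j in range(16)] for i in range(16)])  # -> 120
```

Sampling only the corners is what makes this a single cheap operation per MB, as opposed to scanning all 256 depth samples.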
The depth-to-DV conversion used to derive a DV for unavailable neighboring blocks in Inter mode is shown in FIG. 5, where the conversion is performed multiple times, once for each partition of the MB. For example, if the current MB is partitioned into sixteen 4×4 sub-blocks, the depth-to-DV conversion is performed 16 times. For each sub-block, the DV is determined based on the maximum of the depth values of the samples at the four corners of the associated depth sub-block.
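The per-partition version of FIG. 5 applies the same four-corner rule to each sub-block. The sketch below assumes a 16×16 MB split into 4×4 sub-blocks, with `scale` as a hypothetical stand-in for the actual depth-to-DV mapping.

```python
def subblock_dvs(depth_mb, sub=4, mb=16, scale=1):
    """Per-sub-block depth-to-DV conversion (FIG. 5 sketch).

    The four-corner max rule is applied once per sub-block, i.e.
    16 conversions for a 16x16 MB partitioned into 4x4 sub-blocks.
    """
    dvs = []
    for y in range(0, mb, sub):
        for x in range(0, mb, sub):
            blk = [row[x:x + sub] for row in depth_mb[y:y + sub]]
            d_max = max(blk[0][0], blk[0][-1], blk[-1][0], blk[-1][-1])
            dvs.append(d_max * scale)
    return dvs
```

This makes the Inter-mode cost explicit: the number of conversions grows with the number of partitions, which is the asymmetry versus the once-per-MB Skip/Direct conversion that the final paragraph suggests simplifying.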
In 3D-HEVC, inter-view residual prediction (IVRP) has been developed as a new coding tool in order to share the previously encoded residual information of reference views. Inter-view residual prediction is based on a disparity vector (DV) derived for the current block (i.e., prediction unit, PU). The DV can be derived from the spatial or temporal neighboring blocks of the current block according to 3D-HEVC. Alternatively, a disparity derivation technique based on motion-compensated prediction (MCP), named DV-MCP, can also be used to derive an estimated DV; in this case, blocks coded by MCP are also used in the disparity derivation process. When a neighboring block is an MCP-coded block and its motion is predicted by inter-view motion prediction, the disparity vector used for the inter-view motion prediction represents a motion correspondence between the current block and the inter-view reference picture. Such a block is referred to as a DV-MCP block.
As discussed above, the DV is widely used in three-dimensional coding for various applications. One method to derive the DV is based on the depth map. Different depth-to-disparity conversions are used in three-dimensional coding standards such as 3D-HEVC. It is desirable to simplify the depth-to-disparity conversion process while maintaining coding performance.