Three-dimensional (3D) television has been a technology trend in recent years that aims to bring viewers a sensational viewing experience. Various technologies have been developed to enable 3D viewing. Among them, multi-view video is a key technology for 3DTV applications. Traditional video is a two-dimensional (2D) medium that provides viewers only a single view of a scene from the perspective of the camera. Multi-view video, in contrast, is capable of offering arbitrary viewpoints of dynamic scenes and provides viewers with the sensation of realism.
Multi-view video is typically created by capturing a scene using multiple cameras simultaneously, where the cameras are positioned so that each camera captures the scene from one viewpoint. Accordingly, the multiple cameras capture multiple video sequences corresponding to multiple views. In order to provide more views, more cameras have been used, generating multi-view video with a large number of video sequences associated with the views. Such multi-view video requires a large storage space and/or a high transmission bandwidth. Therefore, multi-view video coding techniques have been developed in the field to reduce the required storage space or transmission bandwidth.
A straightforward approach may be to simply apply conventional video coding techniques to each single-view video sequence independently and disregard any correlation among different views. Such a coding system would be very inefficient. In order to improve the efficiency of multi-view video coding (MVC), typical multi-view video coding exploits inter-view redundancy. Therefore, most 3D Video Coding (3DVC) systems take into account the correlation of video data associated with multiple views and depth maps. MVC adopts both temporal and spatial predictions to improve compression efficiency. During the development of MVC, several macroblock-level coding tools were proposed to exploit the redundancy between multiple views, including illumination compensation, adaptive reference filtering, motion skip mode, and view synthesis prediction. Illumination compensation is intended to compensate for the illumination variations between different views. Adaptive reference filtering is intended to reduce the variations due to focus mismatch among the cameras. Motion skip mode allows the motion vectors in the current view to be inferred from the other views. View synthesis prediction is applied to predict a picture of the current view from other views.
In MVC, however, the depth maps and camera parameters are not coded. In the recent standardization development of new-generation 3D Video Coding (3DVC), the texture data, depth data, and camera parameters are all coded. Due to the existence of the depth data and camera parameters in the new-generation 3DVC technology, the relationship between the texture images and depth maps needs to be studied to further improve compression capability. The depth maps and texture images are highly correlated since they all correspond to the same geometry. The redundancy between the texture data and the depth data can be exploited via this correlation. For example, the depth maps may help texture image compression achieve higher coding gain or shorter coding time. Furthermore, the depth maps can be converted to represent correspondence pairs in the texture images, which benefits the inter-view prediction process.
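As a rough illustration of how a depth map together with camera parameters can yield correspondence pairs between views, the following Python sketch converts a quantized depth sample into a horizontal disparity. It is only a sketch under stated assumptions that do not come from the text above: the cameras are rectified and parallel, the depth sample is linearly quantized in inverse depth between a near and a far clipping plane, and the function and parameter names (`depth_sample_to_disparity`, `focal_length`, `baseline`, `z_near`, `z_far`) are hypothetical.

```python
def depth_sample_to_disparity(v, focal_length, baseline, z_near, z_far, bit_depth=8):
    """Convert a quantized depth sample to a horizontal disparity in pixels.

    Assumptions (illustrative only, not taken from the source text):
      * rectified, parallel cameras separated by `baseline`;
      * `v` is linearly quantized in 1/Z, so the maximum sample value
        corresponds to z_near and 0 corresponds to z_far;
      * `focal_length` is expressed in pixels.
    """
    max_v = (1 << bit_depth) - 1
    # Recover inverse depth 1/Z from the quantized sample.
    inv_z = (v / max_v) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    # For parallel cameras, disparity = focal_length * baseline / Z.
    return focal_length * baseline * inv_z


if __name__ == "__main__":
    # The nearest sample gives the largest disparity, the farthest the smallest.
    for sample in (0, 128, 255):
        d = depth_sample_to_disparity(sample, 1000.0, 0.1, 10.0, 100.0)
        print(sample, round(d, 3))
```

Under these assumptions, a pixel at column x in one view corresponds to a pixel displaced horizontally by the computed disparity in the neighboring view; the sign of the displacement depends on the camera arrangement.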
In 3D video coding, the coding order of texture data and depth data is always an issue because of the redundancy between texture and depth. During the early standard development of 3D video coding, the coding order was chosen as coding the depth data before the texture data in the dependent views for AVC-based 3D video coding (3D-AVC). For HEVC-based 3D video coding (3D-HEVC), however, the coding order was chosen as coding the texture data before the depth data in the dependent views. A technique that allows a flexible coding order, changing the coding order for 3D-HEVC, has been disclosed in the literature. The coding efficiency for texture can be improved by referring to the depth information, which helps in many different ways. Depth-based motion vector prediction (DMVP) in 3D-AVC uses the coded depth to improve the accuracy of motion vector prediction. The depth map helps to identify an inter-view candidate for motion parameter prediction. View Synthesis Prediction (VSP) is a widely studied technique that identifies the inter-view reference from frames warped from other views. In view synthesis prediction, the texture data and depth data of a first view are coded/decoded first. A second view can then be predicted by warping the first view to the second view position. The depth map helps the texture picture to be warped to the correct position.
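The warping step in view synthesis prediction can be pictured with a minimal Python sketch that forward-warps one row of texture samples using per-pixel disparities. This is an illustration only, not the warping process of any standard: it assumes purely horizontal integer disparities already derived from the depth map, assumes the target position is the source column minus the disparity (the sign depends on the camera arrangement), and uses a hypothetical function name `warp_row`. Where several samples land on the same position, the one with the larger disparity (the nearer object) is kept, and positions that receive no sample remain holes.

```python
def warp_row(texture_row, disparity_row, hole=None):
    """Forward-warp one row of a reference view toward a second view.

    Assumptions (illustrative only): horizontal disparities in pixels,
    target column = source column - disparity, nearer samples win.
    """
    width = len(texture_row)
    warped = [hole] * width
    best_disp = [float("-inf")] * width  # largest disparity seen per target column
    for x, (sample, disp) in enumerate(zip(texture_row, disparity_row)):
        tx = x - int(round(disp))
        # Keep the sample only if it lands inside the row and is nearer
        # (larger disparity) than whatever was written there before.
        if 0 <= tx < width and disp > best_disp[tx]:
            warped[tx] = sample
            best_disp[tx] = disp
    return warped


if __name__ == "__main__":
    # Two background samples (disparity 0) and two foreground samples (disparity 2).
    print(warp_row([10, 20, 30, 40], [0, 0, 2, 2]))
```

In this toy row the foreground samples overwrite the background samples they occlude, and the two rightmost positions are left as holes, which a real system would have to fill or exclude from prediction.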
Coding tools such as motion vector inheritance and inter-view motion parameter prediction further utilize the depth information to improve the prediction of the current block. In the recent development of the 3DVC standard, a predicted depth map (PDM) algorithm has been disclosed to provide depth information for the current texture image. The predicted depth map is derived from a coded disparity vector or warped from the depth map of the coded view. The current block can be coded using the motion parameters derived from the predicted depth map. A neighboring block disparity vector (NBDV) method has also been disclosed in the literature. NBDV uses the disparity vector (DV) from inter-view prediction in a neighboring block to improve motion vector prediction. Although NBDV has replaced the earlier PDM-based method, depth information has still proved to be useful in 3D-AVC. Since the depth information can be useful for improving the coding efficiency, the method of retrieving depth information for the current block becomes important. It is desirable to develop a method to generate virtual depth information for three-dimensional video coding.
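The basic idea behind NBDV, reusing a disparity vector from a neighboring block that was coded with inter-view prediction, can be sketched in Python as follows. This is a simplified illustration and not the normative derivation: the `NeighborBlock` type, the `derive_nbdv` function, the order in which neighbors are scanned, and the zero-vector fallback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

DisparityVector = Tuple[int, int]


@dataclass
class NeighborBlock:
    """Hypothetical summary of a previously coded neighboring block."""
    uses_inter_view_prediction: bool
    disparity_vector: DisparityVector = (0, 0)


def derive_nbdv(
    neighbors: Sequence[Optional[NeighborBlock]],
    default: DisparityVector = (0, 0),
) -> DisparityVector:
    """Return the DV of the first neighbor coded with inter-view prediction.

    `neighbors` is scanned in the order given (the order is an assumption);
    unavailable neighbors are passed as None. If no neighbor carries a DV,
    the `default` vector is returned.
    """
    for block in neighbors:
        if block is not None and block.uses_inter_view_prediction:
            return block.disparity_vector
    return default


if __name__ == "__main__":
    candidates = [
        None,                             # neighbor outside the picture
        NeighborBlock(False),             # temporally predicted, no DV
        NeighborBlock(True, (-7, 0)),     # first inter-view predicted neighbor
        NeighborBlock(True, (3, 0)),
    ]
    print(derive_nbdv(candidates))
```

The derived vector could then point to a corresponding block in an already coded view, from which motion parameters or depth information for the current block might be retrieved.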