In 3D video processing, depth data is usually represented as a set of depth maps, one corresponding to each frame of the texture video. The intensity of each point of the depth map describes the distance between the camera and the part of the visual scene represented by that point. Alternatively, a disparity map may be used, in which the disparity values are inversely proportional to the depth values of the depth map.
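The inverse relation between depth and disparity can be illustrated with the following sketch for a rectified (parallel-camera) stereo setup. The 8-bit quantization convention, focal length, baseline and depth-range values are illustrative assumptions, not values prescribed by any codec:

```python
# Sketch of the inverse relation between a depth-map sample and disparity,
# assuming a rectified stereo setup. The 8-bit convention used here
# (255 = nearest depth plane, 0 = farthest) is common practice, but the
# exact mapping is codec- and content-specific.

def depth_value_to_disparity(v, f, baseline, z_near, z_far):
    """Map an 8-bit depth-map sample v to a disparity in pixels.

    f        -- focal length in pixels (assumed parameter)
    baseline -- camera baseline in the same world units as z_near/z_far
    """
    # Depth-map values are typically quantized linearly in 1/Z.
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    z = 1.0 / inv_z
    # Disparity is inversely proportional to depth: d = f * B / Z.
    return f * baseline / z
```

A nearer scene point (larger depth-map value, smaller Z) thus yields a larger disparity, which is the relation stated above.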
In 3D video coding, a depth map for each view needs to be encoded in addition to the conventional video data, which is also referred to as texture data or texture information. To preserve backward compatibility with non-3D codecs, existing 3D video codecs encode the texture data for the base view first. The coding order of the remaining components can be adjusted. Currently, two main coding orders are utilized: texture-first and depth-first, both of which provide an opportunity to exploit the inter-component dependencies, i.e. the dependency between the texture component and the corresponding depth component or disparity component, to increase the overall coding performance of 3D video codecs. The texture-first coding order enables advanced texture-dependent coding tools to be used for coding the depth data. Conversely, the depth-first coding order enables advanced depth-dependent coding tools for texture coding.
In a forthcoming standard for 3D video coding, the 3D extension of high efficiency video coding (3D-HEVC) (G. Tech, K. Wegner, Y. Chen, S. Yea, "3D-HEVC Test Model 2", document JCT3V-B1005 of the Joint Collaborative Team on 3D Video Coding Extension Development (JCT3V), October 2012), the texture-first coding order is currently used in the common test conditions (CTC). In another forthcoming standard for 3D video coding, 3D advanced video coding (3D-AVC) ("3D-AVC Draft Text 6", JCT3V-D1002, Incheon, Republic of Korea, April 2013), the depth-first coding order is currently used in the CTC.
The combined coding of 3D videos is an important research field whose goal is to exploit inter-component dependencies to increase overall coding performance. Both directions (texture-to-depth and depth-to-texture) are possible, and both may improve the overall coding efficiency by utilizing these dependencies.
In P. Merkle, C. Bartnik, K. Muller, D. Marpe, T. Wiegand, "Depth Coding Based on Inter-Component Prediction of Block Partitions", Picture Coding Symposium, Krakow, Poland, May 2012, the already coded texture information of the same view is used to generate a segmentation mask, which is used to predict the collocated depth block in intra-predicted blocks. For each of the two segments of the resulting binary segmentation mask, a constant (DC) prediction value is derived. This shape prediction from texture to depth is intended to improve the prediction quality and, in particular, the location accuracy of depth discontinuities.
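The texture-derived block partitioning can be sketched as follows. This is a simplification: the mean-value thresholding and the explicitly supplied per-segment values stand in for the codec's actual derivation (in which the constant values would be obtained from neighbouring reconstructed depth samples):

```python
# Minimal sketch of texture-derived partitioning for depth intra
# prediction, in the spirit of the inter-component prediction described
# above. Threshold choice and per-segment values are illustrative.

def predict_depth_block(texture_block, seg0_value, seg1_value):
    """Predict a depth block from the collocated (reconstructed) texture.

    texture_block -- 2D list of luma samples
    seg0_value/seg1_value -- constant prediction values for the two
    segments (supplied explicitly here for illustration)
    """
    samples = [s for row in texture_block for s in row]
    threshold = sum(samples) / len(samples)  # mean-value thresholding
    # Binary segmentation mask: 1 where texture is brighter than the mean.
    mask = [[1 if s > threshold else 0 for s in row] for row in texture_block]
    # Fill each segment with its constant prediction value.
    return [[seg1_value if m else seg0_value for m in row] for row in mask]
```

Because the mask follows the texture edge, the predicted depth block can align its discontinuity with the texture discontinuity, which is the stated benefit of this scheme.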
A similar concept was proposed in “Description of 3D Video Technology Proposal by Fraunhofer Heinrich Hertz Institute (Fraunhofer HHI) (HEVC compatible; configuration A)”, document, M22570, November 2011, Geneva, Switzerland, where wedgelet and contour partitioning for depth map coding was introduced.
Furthermore, methods have been proposed to utilize the high correlation between the texture and depth components in inter prediction. Reusing the already coded motion information (i.e. motion vectors and reference picture indices) of the texture view to reduce the required bitrate of the same view's depth component was proposed in M. Winken, H. Schwarz, T. Wiegand, "Motion Vector Inheritance for High Efficiency 3D Video Plus Depth Coding", Picture Coding Symposium, Krakow, Poland, May 2012. In that approach, the motion vector information, and also the partitioning of the prediction units, can be inherited from the collocated texture block when coding a depth block.
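The inheritance step above can be sketched as follows. The `MotionInfo` structure and the fallback behaviour for intra-coded texture blocks are illustrative assumptions, not the exact signalling of the cited scheme:

```python
# Sketch of motion parameter inheritance from texture to depth: when the
# collocated texture block is inter-coded, its motion parameters can be
# reused for the depth block so that none need to be signalled.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MotionInfo:
    mv: Tuple[int, int]   # motion vector (x, y), e.g. in quarter-pel units
    ref_idx: int          # reference picture index

def inherit_motion(texture_motion: Optional[MotionInfo]) -> Optional[MotionInfo]:
    """Derive motion parameters for a depth block from the collocated
    texture block, if that block was inter-coded."""
    if texture_motion is None:   # collocated texture block was intra-coded
        return None              # fall back to regular motion coding
    # Reuse both the motion vector and the reference picture index.
    return MotionInfo(mv=texture_motion.mv, ref_idx=texture_motion.ref_idx)
```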
In Joint Collaborative Team on 3D Video Coding Extension Development (JCT3V) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Video Coding Experts Group (VCEG) and the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG), "3D-CE3.h: Depth Quadtree Prediction for 3DHTM 4.1," JCT3V-B0068, Technical Report, October 2012, the authors propose to limit the block partitioning (i.e. the depth of the coding quad-tree) of the depth component to that of the corresponding texture quad-tree. This limitation saves bitrate for the splitting flags in the depth component, but it also introduces a parsing dependency between the two components.
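The quad-tree limitation amounts to a simple constraint on the split decision, sketched below; the function name and the way the texture split depth is passed in are illustrative assumptions:

```python
# Sketch of the depth quad-tree limitation: the depth component's
# coding quad-tree may not be split deeper than the collocated texture
# quad-tree, so once the texture depth is reached, no split flag needs
# to be signalled (the decoder can infer "no split"). This also means
# the depth bitstream cannot be parsed without the texture quad-tree.

def depth_split_allowed(texture_split_depth, current_depth):
    """Return True if the depth coding unit at current_depth may still
    be split, given the split depth of the collocated texture block."""
    return current_depth < texture_split_depth
```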
Synthesizing an additional prediction signal for the dependent texture views based on the already coded depth information is proposed in "Description of 3D Video Coding Technology Proposal by Nokia", document M22552, November 2011, Geneva, Switzerland, and in C. Lee, Y.-S. Ho, "A Framework of 3D Video Coding Using View Synthesis Prediction", Picture Coding Symposium, Krakow, Poland, May 2012. Here, the contents of the encoded block (pixel values) are synthesized from the reference texture view using a depth image-based rendering (DIBR) technique, which requires depth information to properly map the pixel positions between the views.
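A toy one-dimensional version of the DIBR warping step is sketched below. Real renderers handle occlusion ordering, hole filling and sub-pel positions; this sketch ignores all of that, and the camera parameters are illustrative assumptions:

```python
# Toy 1-D sketch of depth image-based rendering (DIBR) as used for view
# synthesis prediction: each reference-view sample is shifted by a
# disparity derived from its depth value (d = f * B / Z).

def warp_row(ref_row, depth_row, f, baseline):
    """Warp one row of the reference texture view towards the target view.

    ref_row   -- texture samples of the reference view
    depth_row -- depth (Z) value per sample, in world units
    Unfilled target positions remain None (holes/disocclusions).
    """
    synth = [None] * len(ref_row)
    for x, (sample, z) in enumerate(zip(ref_row, depth_row)):
        d = round(f * baseline / z)   # disparity from depth
        tx = x - d                    # target position in the synthesized view
        if 0 <= tx < len(synth):
            synth[tx] = sample        # occlusion ordering ignored here
    return synth
```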
In "Description of 3D Video Technology Proposal by Fraunhofer HHI (HEVC compatible; configuration A)", document M22570, November 2011, Geneva, Switzerland, and "Technical Description of Poznan University of Technology Proposal for Call on 3D Video Coding Technology", document M22697, November 2011, Geneva, Switzerland, candidates for the prediction of motion information from the reference view, which are used to encode a currently coded block, are derived based on the depth values associated with the coded block.
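The underlying step in such schemes is the derivation of a disparity vector for the current block from its associated depth values; this vector then locates the corresponding block in the reference view. The sketch below uses the nearest depth (smallest Z) in the block and the conversion d = f·B/Z; both the selection rule and the parameters are illustrative assumptions:

```python
# Sketch of deriving a horizontal disparity vector for the current block
# from its associated depth values, used to locate the corresponding
# block in the reference view.

def disparity_vector_from_depth(depth_block, f, baseline):
    """Return an integer (dx, dy) disparity vector for the block.

    depth_block -- 2D list of depth (Z) values associated with the block
    Uses the nearest depth in the block (one common, illustrative choice).
    """
    z = min(s for row in depth_block for s in row)  # nearest = smallest Z
    # Rectified cameras: the disparity is purely horizontal.
    return (round(f * baseline / z), 0)
```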
A similar approach was proposed in "Description of 3D Video Technology Proposal by Fraunhofer HHI (HEVC compatible; configuration A)", document M22570, November 2011, Geneva, Switzerland, to predict the residual from the already encoded reference view. Based on the depth estimate, a disparity vector is determined for a current block, and the residual block in the reference view that is referenced by this disparity vector is used for predicting the residual of the current block.
In "Depth-based Weighted Bi-Prediction for Video Plus Depth Map Coding", International Conference on Image Processing (ICIP) 2012, September 2012, the merging of bi-directional inter prediction results for the coded block is performed using weights whose values are computed based on depth information. Different methods for calculating the weights are proposed, including binary assignment to one or the other area of the block.
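The binary variant of such depth-derived weighting can be sketched as follows; the thresholding rule used to split the block into two areas is an illustrative assumption:

```python
# Sketch of depth-based weighted bi-prediction with binary weights: each
# sample of the merged prediction is taken entirely from one of the two
# inter prediction signals, depending on which side of a depth threshold
# the collocated depth sample falls.

def weighted_biprediction(pred0, pred1, depth, threshold):
    """Merge two prediction blocks using depth-derived binary weights.

    pred0, pred1 -- 2D lists with the two inter prediction signals
    depth        -- 2D list of depth samples collocated with the block
    """
    out = []
    for r0, r1, rd in zip(pred0, pred1, depth):
        # Weight 1 for pred0 where depth < threshold, else weight 1 for pred1.
        out.append([p0 if d < threshold else p1
                    for p0, p1, d in zip(r0, r1, rd)])
    return out
```

Non-binary variants would replace the hard selection with fractional weights computed from the depth values.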