One way to convey information about the structure of a three-dimensional (3D) visual scene is to augment a two-dimensional (2D) color image with information about the scene's geometry. The geometry may be conveyed using a gray-scale image, referred to as a “depth map,” that encodes, for each color pixel, the distance to its corresponding scene point in 3D space. The combination of the 2D color image with the depth map can be referred to as a “view-plus-depth,” “RGB-Z,” or “2D-plus-depth” representation, and is described in greater detail in Dimenco B.V., “3D Content Creation Guidelines,” 3D Interface Specification, www.dimenco.eu (2011); Mueller, K., Merkle, P., and Wiegand, T., “3-D video representation using depth maps,” Proceedings of the IEEE, Special Issue on “3D Media and Displays,” invited paper, 99(4), pp. 643-656 (2011); and Ndjiki-Nya, P. et al., “Depth Image-based Rendering with Advanced Texture Synthesis for 3D Video,” IEEE Transactions on Multimedia, 13(3), pp. 453-465 (2011).
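To make the idea of a gray-scale depth map concrete, the sketch below converts metric depth values into 8-bit gray levels and back. It assumes one common convention (inverse-depth quantization between a near plane `z_near` and a far plane `z_far`, with near points bright and far points dark); this convention, the function names `depth_to_gray` and `gray_to_depth`, and the example values are illustrative assumptions, not the encoding prescribed by the works cited above.

```python
def depth_to_gray(z: float, z_near: float, z_far: float) -> int:
    """Map a metric depth z (z_near <= z <= z_far) to an 8-bit gray level.

    Assumed convention: inverse-depth quantization, so z_near -> 255 (bright),
    z_far -> 0 (dark), and more gray levels are spent on nearby geometry.
    """
    if not (z_near <= z <= z_far):
        raise ValueError("depth outside [z_near, z_far]")
    inv = (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return round(255 * inv)


def gray_to_depth(v: int, z_near: float, z_far: float) -> float:
    """Invert depth_to_gray, up to the 8-bit quantization error."""
    inv = v / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv


# Hypothetical 2x3 depth map in metres, with z_near = 1 m and z_far = 10 m.
depth_m = [[1.0, 2.0, 10.0],
           [1.0, 5.0, 10.0]]
gray = [[depth_to_gray(z, 1.0, 10.0) for z in row] for row in depth_m]
```

Here `gray` holds one gray level per pixel position of the color image, which is exactly the pixel-aligned pairing that the 2D-plus-depth representation relies on.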
In a 2D-plus-depth representation, the depth map is stored as a gray-scale image side-by-side with the color image, so that each pixel of the 2D color image has a corresponding depth value stored at the same pixel position in the depth map. This scene representation may be used to render perspective views on 3D displays (e.g., auto-stereoscopic 3D displays) or to generate 3D views with Depth-Image-Based Rendering (DIBR) methods, as detailed by X. Yang in “DIBR based view synthesis for free-viewpoint television,” 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011. Depth map data may be created by techniques such as “structure-from-stereo,” or sensed directly by dedicated range (depth) sensors employing, for example, Time-of-Flight sensing principles, as described in Kolb, A., Barth, E., Koch, R., and Larsen, R., “Time-of-flight cameras in computer graphics,” Computer Graphics Forum, 29(1), pp. 141-159 (2010).
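The following is a minimal sketch of how an aligned color/depth pair can drive view synthesis. It is a naive forward warp, not the method of Yang or of the other cited works, and it assumes a rectified setup in which a hypothetical virtual camera is displaced horizontally by `baseline_m`, so each pixel shifts by the disparity d = f·B/Z (with `focal_px` as f in pixels). A z-buffer keeps the nearest surface when several source pixels land on the same target, and target pixels that receive nothing are left as `None` holes (disocclusions); hole filling is deliberately omitted. All names and parameter values are illustrative.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def dibr_shift(color: List[List[Pixel]],
               depth_m: List[List[float]],
               focal_px: float,
               baseline_m: float) -> List[List[Optional[Pixel]]]:
    """Forward-warp an aligned 2D-plus-depth pair into a horizontally shifted view.

    color[y][x] and depth_m[y][x] describe the same scene point. Each pixel
    moves left by its disparity d = focal_px * baseline_m / Z (rounded to the
    nearest integer), assuming the virtual camera sits to the right.
    """
    h, w = len(color), len(color[0])
    out: List[List[Optional[Pixel]]] = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]  # nearest depth seen per target
    for y in range(h):
        for x in range(w):
            z = depth_m[y][x]
            d = round(focal_px * baseline_m / z)
            xt = x - d
            # Keep the pixel only if it lands inside the image and is nearer
            # than whatever was already written there (occlusion handling).
            if 0 <= xt < w and z < zbuf[y][xt]:
                zbuf[y][xt] = z
                out[y][xt] = color[y][x]
    return out


# One-row example: a near object (Z = 2) in front of a background (Z = 4).
color = [[(10, 0, 0), (20, 0, 0), (30, 0, 0), (40, 0, 0), (50, 0, 0)]]
depth = [[4.0, 4.0, 2.0, 4.0, 4.0]]
view = dibr_shift(color, depth, focal_px=4.0, baseline_m=1.0)
```

In the resulting row, the near pixel `(30, 0, 0)` occludes the background pixel that lands on the same position, and `None` marks the disoccluded area and the right border, which a practical DIBR method would then have to fill.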
The biggest drawback of the 2D-plus-depth data format is its inefficient use of the pixel bit budget: storing two aligned images doubles the required number of bits. For example, where $s_x$ is the image width in pixels, $s_y$ is the image height in pixels, and $c$ is the number of color channels, a high-definition (HD) image requires a pixel size $S_{\text{pixels}}$ (total number of samples) as given by Equation (1):

$$S_{\text{pixels}} = 2\,(s_x s_y c) = 2\,(1920 \times 1080 \times 3) = 12{,}441{,}600 \qquad (1)$$

Such a large amount of data per frame makes data compression necessary.
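The arithmetic of Equation (1) can be checked directly. This sketch follows the equation as written, counting the depth image as occupying the same budget as the color image; the function name is hypothetical, and the 8-bits-per-sample and 30 frames/s figures are assumptions added only to show the resulting raw data rate.

```python
def raw_2d_plus_depth_size(sx: int, sy: int, c: int) -> int:
    """Samples per 2D-plus-depth frame per Equation (1): 2 * (sx * sy * c)."""
    return 2 * (sx * sy * c)


hd = raw_2d_plus_depth_size(1920, 1080, 3)  # samples per HD frame

# Assuming 8 bits (1 byte) per sample and a hypothetical 30 frames/s:
bytes_per_frame = hd
bytes_per_second = bytes_per_frame * 30
```

At these assumptions an uncompressed HD 2D-plus-depth stream needs roughly 373 MB/s, which illustrates why compression is unavoidable.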
The depth maps in 2D-plus-depth representations are textureless and piecewise smooth, so predictive and arithmetic coding are attractive options for compressing them. For lossy compression, however, it is important to treat the depth data as a mapping function (a function that establishes the geometry of the scene) rather than as just a gray-scale intensity image. If the depth map is instead compressed directly as an ordinary intensity image, the resulting distortions could limit the number of applications that can use the compressed depth maps. A suitable compression method is therefore one that does not significantly distort this mapping.
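To illustrate why piecewise-smooth depth maps suit predictive coding, the sketch below applies the simplest lossless predictor (each sample predicted from its left neighbor, i.e., 1D DPCM) to a hypothetical piecewise-constant depth row. This is a generic textbook predictor chosen for illustration, not a specific codec; the function names and sample values are assumptions.

```python
from collections import Counter
from typing import List


def dpcm_encode(row: List[int]) -> List[int]:
    """Left-neighbour prediction: keep the first sample, then store differences."""
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


def dpcm_decode(res: List[int]) -> List[int]:
    """Invert dpcm_encode by accumulating the residuals (exact, lossless)."""
    out = [res[0]]
    for r in res[1:]:
        out.append(out[-1] + r)
    return out


# Hypothetical depth row: background (40), an object (200), background again.
row = [40] * 6 + [200] * 5 + [40] * 5
res = dpcm_encode(row)
counts = Counter(res)  # residual histogram, dominated by zeros
```

The residual stream is almost entirely zeros, which an arithmetic coder can represent very compactly, while the few non-zero residuals sit exactly at object boundaries. Those boundary values are the ones that define the scene geometry, which is why a lossy scheme that perturbs them distorts the mapping rather than merely adding visual noise.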