Depth maps are images (or videos, if taken at regular time intervals) that record the distances of observable scene points from the optical center of a camera. They supplement the associated color pixels in the color image or video taken at the same position by specifying the depths of those pixels in the scene. One application of depth maps is to synthesize new views of the scene from the color image or videos (also referred to as texture). Depth maps can also be taken at adjacent spatial locations to form multi-view depth images or videos. Together with the texture or color videos, these allow new virtual views around the imaging locations to be synthesized. See S. C. Chan et al., “Image-based rendering and synthesis,” IEEE Signal Processing Magazine, vol. 24, pp. 22-33, (2007) and Z. Y. Zhu et al., “Object-based rendering and 3D reconstruction using a moveable image-based system,” IEEE Trans. Circuits Syst. Video Technol., vol. 22 (10), pp. 1405-1419, (2012), both of which are incorporated herein by reference in their entirety. An example of a stereo view of a synthetic scene and its associated depth maps is shown in FIG. 1.
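The role of depth in view synthesis can be illustrated with a minimal sketch. It assumes a rectified pinhole camera and a virtual camera displaced purely along the horizontal axis, so that each pixel shifts by a disparity of focal length times baseline divided by depth. The function name `synthesize_row` and the parameters `focal_px` and `baseline` are hypothetical and are not taken from the cited references; practical renderers also fill holes and blend several views.

```python
def synthesize_row(texture_row, depth_row, focal_px, baseline):
    """Forward-warp one image row to a virtual camera shifted along x.

    Assumptions (illustrative only): rectified pinhole cameras, depth
    measured along the optical axis, focal length given in pixels.
    """
    width = len(texture_row)
    out = [None] * width                 # None marks holes (disocclusions)
    out_depth = [float("inf")] * width   # z-buffer for the virtual view

    for x, (color, z) in enumerate(zip(texture_row, depth_row)):
        disparity = focal_px * baseline / z   # nearer points move further
        xt = int(round(x - disparity))        # target column in the new view
        # Keep the nearest surface when several pixels land on one column.
        if 0 <= xt < width and z < out_depth[xt]:
            out[xt] = color
            out_depth[xt] = z
    return out


row = synthesize_row(["a", "b", "c", "d"], [10, 10, 5, 10],
                     focal_px=10, baseline=1)
print(row)
```

In this toy row the nearer pixel `c` occludes `b` in the virtual view, and the columns left as `None` are the disocclusion holes that a full renderer would have to fill.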
Depth maps are important in many applications, especially for generating multiple new views from color (or texture) videos for view synthesis and for displaying 3D content on stereo and autostereoscopic displays. State-of-the-art coding algorithms such as 3D-HEVC usually rely on block-based motion estimation and compensation techniques using both the depth and texture videos for inter-frame and inter-view prediction. See G. Tech et al., “3D-HEVC draft text 1,” in Proceedings of the 5th Meeting of Joint Collaborative Team on 3D Video Coding Extensions (JCT-3V), Document JCT3V-E1001, Vienna, Austria (August 2013).
In these techniques the camera position is not explicitly estimated for compensation, and realizing such global camera motion prediction/compensation usually requires a considerable number of multiplications, which complicates real-time implementation. It is important to be able to obtain a set of motion descriptors of the stationary and major moving objects in the scene from adjacent depth images over time (inter-frame) and/or over space (inter-view) under changes in camera position and focus. The improved prediction obtained from global camera motion compensation and the motion parameters of the major objects results in smaller prediction residuals to be encoded, and hence better coding efficiency. It also greatly reduces the bits required for coding motion vectors in macroblocks or prediction units, since only a short local motion correction is required once the global motion predictors, which are coded only once, are given.
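The effect of a global motion predictor on the prediction residual can be sketched as follows. This is a simplified illustration, not the method of any cited codec: a pure integer translation stands in for the global motion model, which this passage does not specify, and the names `shift_frame`, `residual_energy`, and `estimate_global_shift` are hypothetical.

```python
def shift_frame(frame, dx, dy, fill=0):
    """Translate a 2D frame by (dx, dy) pixels; uncovered pixels get `fill`."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def residual_energy(cur, pred):
    """Sum of squared differences between the current frame and a prediction."""
    return sum((c - p) ** 2
               for row_c, row_p in zip(cur, pred)
               for c, p in zip(row_c, row_p))


def estimate_global_shift(prev, cur, search=2):
    """Exhaustively search one translation for the whole frame.

    Returns (dx, dy, energy) for the shift with the lowest residual energy.
    """
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            e = residual_energy(cur, shift_frame(prev, dx, dy))
            if best is None or e < best[2]:
                best = (dx, dy, e)
    return best


# Toy depth frames: a 2x2 object at depth value 50 moves by one pixel in x and y.
prev = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        prev[y][x] = 50
cur = shift_frame(prev, 1, 1)

print(residual_energy(cur, prev))        # residual with no motion compensation
print(estimate_global_shift(prev, cur))  # best single shift and its residual
```

Because one shift is estimated for the whole frame, it would be transmitted once, and each block would then need only a small correction relative to it.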
Efficient compression of multi-view depth map images and videos is therefore important for their efficient storage and transmission.
State-of-the-art coding algorithms, as in the Tech article, usually rely on block-based motion estimation and compensation techniques using both the depth and texture videos. The camera position is not explicitly estimated for compensation, and realizing such global camera compensation usually requires a considerable number of multiplications, which complicates real-time implementation. It is important to be able to efficiently obtain a set of motion descriptors of the stationary and moving objects in the scene from adjacent depth images over time (inter-frame) and/or over space (inter-view) under changes in camera position and focus. By using these motion model parameters, the bits for coding the prediction residual and the additional motion vectors in each macroblock or coding unit can be greatly reduced, which improves the coding efficiency.