In three-dimensional video (3DV), videos include texture images acquired by cameras in different configurations, together with associated depth images. The per-pixel depths in the depth images enable synthesis of virtual images for selected viewpoints via depth-image-based rendering (DIBR); see MPEG Video and Requirement group, “Call for proposals on 3D video coding technology,” Tech. Rep., MPEG, 2011, MPEG N12036, and Tanimoto et al., “View synthesis algorithm in view synthesis reference software 2.0 (VSRS2.0),” Tech. Rep., MPEG, 2009, MPEG M16090.
Depths are typically acquired by a ranging device, such as a time-of-flight sensor. Alternatively, the depths can be estimated from the texture images using triangulation techniques.
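As an illustration of depth estimation by triangulation, the following minimal sketch assumes the simplest case: a rectified pair of parallel pinhole cameras, where depth is the focal length times the baseline divided by the disparity. The function name, parameters, and numeric values are hypothetical and are not taken from the cited references.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point observed with the given disparity.

    Assumes a rectified stereo pair of parallel pinhole cameras
    (an assumption made for this sketch only).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: 1000-pixel focal length, 0.25 m baseline, 20-pixel disparity.
print(depth_from_disparity(1000.0, 0.25, 20.0))  # 12.5
```

Practical estimators also have to find the corresponding pixels in the two texture images before this relation can be applied, and that matching step is a common source of depth errors.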
In many 3DV applications, it is imperative that the quality of the virtual images for synthesized views is comparable to that of the images in the acquired video. However, the rendering quality typically depends on several factors and on complicated interactions among those factors.
In particular, texture and depth images often contain errors. Herein, errors, which degrade the quality, are generally characterized as noise. Noise includes any data that do not conform with the acquired video of the scene. The errors can be texture and depth errors.
The errors can be due to imperfect sensing or lossy compression. It is not clear how these errors interact and affect the rendering quality. Unlike texture errors, which distort luminance and chrominance levels, depth errors cause position errors during synthesis, and their effect is more subtle.
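To make the notion of a position error concrete, the sketch below assumes rectified parallel pinhole cameras, so that a pixel is displaced horizontally by a disparity inversely proportional to its depth. The position error is then the difference between the disparity computed from the erroneous depth and the disparity computed from the true depth. All names and values are hypothetical.

```python
def disparity_px(focal_px, baseline_m, depth_m):
    """Horizontal shift in pixels of a point at depth_m (rectified pinhole setup assumed)."""
    return focal_px * baseline_m / depth_m


def position_error_px(focal_px, baseline_m, true_depth_m, noisy_depth_m):
    """Signed pixel displacement error caused by using noisy_depth_m instead of true_depth_m."""
    return (disparity_px(focal_px, baseline_m, noisy_depth_m)
            - disparity_px(focal_px, baseline_m, true_depth_m))


# Hypothetical values: a point at 2.0 m whose depth is wrongly recorded as 2.5 m.
print(position_error_px(1000.0, 0.25, 2.0, 2.5))   # -25.0
# Halving the baseline halves the position error for the same depth error.
print(position_error_px(1000.0, 0.125, 2.0, 2.5))  # -12.5
```

Under these assumptions the texture sample itself is unchanged; it is simply placed at the wrong location in the virtual image.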
For example, the impact of the depth errors can vary with the contents of the texture images. Simple texture images tend to be more resilient to depth errors, while complex texture images are more sensitive to them. The impact of depth errors also depends on the camera configuration, as this affects the magnitudes of the position errors. Along the rendering pipeline, depth errors are also transformed by different operations, which complicates an understanding of their effects.
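The dependence on texture content can be illustrated with a toy one-dimensional experiment. The sketch below applies the same one-pixel position error to a smooth intensity ramp and to a high-contrast striped row, then measures the mean absolute error against the unshifted row. The integer shift, the edge replication at the border, and the sample rows are assumptions chosen for illustration; this is not a model of an actual rendering pipeline.

```python
def mean_absolute_error(reference, rendered):
    """Average absolute per-pixel intensity difference between two equal-length rows."""
    return sum(abs(a - b) for a, b in zip(reference, rendered)) / len(reference)


def shift_row(row, shift):
    """Shift a row right by an integer number of pixels, replicating the edge pixels."""
    last = len(row) - 1
    return [row[min(max(i - shift, 0), last)] for i in range(len(row))]


smooth_row = [100 + 2 * i for i in range(16)]  # simple texture: gentle ramp
striped_row = [0, 255] * 8                     # complex texture: alternating stripes

# The same one-pixel position error is applied to both rows.
print(mean_absolute_error(smooth_row, shift_row(smooth_row, 1)))    # 1.875
print(mean_absolute_error(striped_row, shift_row(striped_row, 1)))  # 239.0625
```

The identical position error yields an error more than a hundred times larger on the striped row, which is the kind of content dependence described above.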
An accurate analytical model to estimate the rendering quality is very valuable for the design of 3DV systems and methods. As an example, the model can help understand under what conditions reducing the depth error would substantially improve the synthesis output. Then, 3DV encoders can use this information to determine when to allocate more bits to encode the depth images.
As another example, the model can be used to estimate how much improvement can be achieved by reconfiguring the cameras, e.g., by placing them closer to each other, given other factors such as the errors in the texture images.
One model is based on an analysis of the rendering quality of image-based rendering (IBR), and uses Taylor series expansion to derive an upper bound of the mean absolute error (MAE) of the view synthesis.
An autoregressive model estimates the synthesis distortion at the block level and is effective for rate-distortion optimized mode selection. A distortion model as a function of the position of the viewpoint is also known for bit allocation.