Depth Images
Depth images represent distances from a camera to scene elements in 3D space. Efficient encoding of depth images is important for 3D video and free view television (FTV). FTV allows a user to interactively control the view and to generate new virtual images of a dynamic scene from arbitrary 3D image points.
Most conventional image-based rendering (IBR) methods use depth images, in combination with stereo or multi-image videos, to enable 3D and FTV. The multi-image video coding (MVC) extension of the H.264/AVC standard supports inter-image prediction for improved coding efficiency for multi-image videos. However, MVC does not specify any particular encoding for depth images.
Efficient estimation and encoding of depth are crucial to enable high-quality virtual image synthesis at the decoder.
Depth Reconstruction Filter
Unlike conventional images, depth images are spatially monotonous except at depth discontinuities. Thus, decoding errors tend to be concentrated near depth discontinuities, and failure to preserve the depth discontinuities significantly compromises the quality of virtual images, see FIGS. 6A-6B.
Down/Up Sampler
Encoding a reduced-resolution depth image can reduce the bit rate substantially, but the loss of resolution also degrades the quality of the depth map, especially in high-frequency regions such as depth discontinuities. The resulting image rendering artifacts are visually annoying. Conventional down/up samplers use either a low-pass filter or an interpolation filter to reduce the quality degradation. That is, for each filtered pixel, the conventional filters combine the depths of several pixels covered by the filter in some way. Because each output depends on multiple depths, that filtering “smears” or blurs depth discontinuities.
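As an illustrative sketch only, not taken from any particular codec or standard, the following Python fragment applies an averaging down-sampler and a linear-interpolation up-sampler to a hypothetical one-dimensional row of depths. The function names, the sampling factor of 4, and the depth values 50 and 200 are assumptions chosen for illustration.

```python
def downsample_average(depths, factor):
    """Low-pass down-sampling: each output is the mean of `factor` inputs."""
    return [sum(depths[i:i + factor]) / factor
            for i in range(0, len(depths), factor)]


def upsample_linear(samples, factor):
    """Linear-interpolation up-sampling to len(samples) * factor values."""
    out = []
    last = len(samples) - 1
    for i in range(len(samples) * factor):
        # Position of output pixel i on the low-resolution grid,
        # with pixel centers aligned; clamp at the row borders.
        pos = (i + 0.5) / factor - 0.5
        pos = min(max(pos, 0.0), float(last))
        left = int(pos)
        right = min(left + 1, last)
        t = pos - left
        out.append((1 - t) * samples[left] + t * samples[right])
    return out


# Hypothetical depth row: background at depth 50, foreground at depth 200,
# with a single sharp discontinuity between pixel 5 and pixel 6.
row = [50] * 6 + [200] * 6

low = downsample_average(row, 4)
restored = upsample_linear(low, 4)

# Depths that belong to neither the background nor the foreground.
smeared = [d for d in restored if d not in (50, 200)]

print("low resolution:", low)
print("reconstructed: ", restored)
print("smeared pixels:", len(smeared), "of", len(restored))
```

In this sketch, eight of the twelve reconstructed pixels take depths strictly between the background and the foreground. Those intermediate depths correspond to no actual surface in the scene, which is the smearing effect described above.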
Because the depth video and image rendering results are sensitive to variations in space and time, especially at depth discontinuities, conventional depth reconstruction methods are insufficient, especially for virtual image synthesis.