Most representations of 3D video signals, called 3D scenes, rely on depth maps and disparity maps. Generally, one starts from a set of images of a given 3D scene corresponding to different points of view, each of which may come with associated characteristics such as a depth map and a texture. The depth map of a point of view is a grayscale image in which each pixel contains the distance from the corresponding scene point to the camera filming the scene. When one wants to generate a new point of view of the scene, also called more simply a view, some areas of it can be computed from another point of view, its depth map, the intrinsic camera parameters and the parameters of the changes undergone by the camera in going from this point of view to the new one (displacement, rotation, lens changes). This process is called “Reconstruction”, and the new point of view thus created is called a reconstructed view (or reconstructed image). If a point P of the scene is visible from both points of view, a translation vector gives its pixel coordinates in the new point of view from its pixel coordinates in the original one. These vectors are called disparity vectors. Results from projective geometry, as disclosed in the document “Three-Dimensional Computer Vision”, O. Faugeras, MIT Press, 1993, establish a simple relation between the disparity vectors of the disparity map and the depth values of the depth map.
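As an illustration of the relation between depth and disparity, the sketch below assumes the simplest configuration: a rectified pair of parallel cameras separated by a purely horizontal displacement, for which the disparity reduces to d = f * B / Z (focal length f in pixels, baseline B, depth Z). The function names, the sign convention (pixels shift toward lower x in the new view) and the rule that the nearest point wins when two pixels collide are assumptions made for this example; rotations and lens changes are not handled.

```python
def depth_to_disparity(depth, focal_px, baseline):
    """Disparity in pixels for a rectified, parallel camera pair: d = f * B / Z."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def reconstruct_row(texture_row, depth_row, focal_px, baseline):
    """Forward-warp one scanline into the new view (illustrative only).

    Positions that no source pixel maps to stay None (holes, i.e. areas
    that cannot be computed from this point of view). When two source
    pixels land on the same target, the one with the smaller depth wins.
    """
    width = len(texture_row)
    new_row = [None] * width
    new_depth = [float("inf")] * width
    for x, (value, z) in enumerate(zip(texture_row, depth_row)):
        # Assumed sign convention: the new camera sits to the right,
        # so scene points move toward lower x.
        target = x - round(depth_to_disparity(z, focal_px, baseline))
        if 0 <= target < width and z < new_depth[target]:
            new_row[target] = value
            new_depth[target] = z
    return new_row


# Background at depth 4 (disparity 1), foreground at depth 2 (disparity 2).
row = reconstruct_row(list("abcdef"), [4, 4, 4, 2, 2, 4], focal_px=4, baseline=1)
print(row)  # ['b', 'd', 'e', None, 'f', None]
```

The foreground pixel 'd' overwrites the background pixel 'c', and the None entries mark areas of the new view that are not visible from the original one.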
During transmission of a video signal, multi-view or stereo coding schemes, well known to those skilled in the art, generally encode by compression the textures and the depth maps needed to cover a certain range of points of view. Texture can be encoded using standard methods, which may lead to well-known artifacts in the case of lossy encoding. Depth (or disparity) encoding is trickier: an encoded depth map that looks visually similar to the original one does not necessarily have the same reconstruction properties. In the new view generation process, points or areas could be translated to the wrong place because of wrong disparity vectors. This would create texture discontinuities that may be more noticeable than the “visual” aspect of the encoded map suggests. Still, dense depth maps are quite large, and lossy compression is almost unavoidable if one wants to keep the depth map size within a reasonable range (namely less than 20% of the texture bit-rate). One therefore has to deal with artifacts and improper depth/disparity values, and one must design post-processing and enhancement algorithms to be applied after the decoding of the video signal.
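To make concrete why a depth map that looks similar may reconstruct differently, the following sketch compares the disparities derived from an original and a decoded depth row. It assumes the rectified parallel-camera relation d = f * B / Z; the decoded values are hypothetical and merely mimic a slightly smoothed edge, and the function name is introduced for this example only.

```python
def disparity_errors(original_depth, decoded_depth, focal_px, baseline):
    """Per-pixel disparity error (in pixels) caused by depth coding errors.

    Assumes d = f * B / Z, i.e. a rectified pair of parallel cameras.
    """
    return [
        focal_px * baseline / z_dec - focal_px * baseline / z_orig
        for z_orig, z_dec in zip(original_depth, decoded_depth)
    ]


original = [8, 8, 8, 2, 2, 2]
decoded = [8, 8, 7, 3, 2, 2]  # hypothetical lossy result: edge slightly smoothed
errors = disparity_errors(original, decoded, focal_px=8, baseline=1)
for z_orig, z_dec, err in zip(original, decoded, errors):
    print(f"depth {z_orig} -> {z_dec}: disparity error {err:+.2f} px")
```

Both altered pixels have the same depth error of one unit, yet under these assumptions the pixel on the near side of the edge is displaced by more than a full pixel while the far one moves by a fraction of a pixel, so the wrong placement concentrates along object boundaries.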
Within the MPEG-4 standard, depth maps can be encoded using the Multiple Auxiliary Component (MAC) tools (as described in “Amendment 1: Visual extensions”, ISO/IEC JTC 1/SC 29/WG 11 N 3056, December 1999), in which they are DCT encoded on a block basis, similarly to a classic luminance image encoding well known to those skilled in the art. No specific treatment of the underlying artifacts is proposed other than the traditional MPEG tools which, as previously stated, are good for texture but not necessarily for depth maps. Hence, for example, in the texture reconstructed image, this can lead to a fuzzy edge along with isolated texture pixels, two effects that, moreover, are time-inconsistent over successive points of view.
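The effect of block-based DCT coding on a depth edge can be sketched in one dimension. The code below is not the MAC tool itself: it assumes a single 8-sample block, an orthonormal DCT-II and a uniform quantizer with an arbitrary step, all chosen for illustration only.

```python
import math

N = 8  # block length, as in 8x8 block-based coding (1D here for brevity)


def _scale(k):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Orthonormal DCT-II of an N-sample block."""
    return [
        _scale(k) * sum(v * math.cos(math.pi * (n + 0.5) * k / N)
                        for n, v in enumerate(block))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    return [
        sum(_scale(k) * c * math.cos(math.pi * (n + 0.5) * k / N)
            for k, c in enumerate(coeffs))
        for n in range(N)
    ]


def lossy_roundtrip(block, step):
    """Quantize the DCT coefficients with a uniform step, then decode."""
    quantized = [round(c / step) for c in dct(block)]
    return idct([q * step for q in quantized])


edge = [50, 50, 50, 50, 200, 200, 200, 200]  # sharp depth discontinuity
decoded = lossy_roundtrip(edge, step=100)    # coarse step, assumed for the demo
print([round(v) for v in decoded])
```

With a coarse step the high-frequency coefficients that carry the discontinuity are rounded away, so the decoded block no longer contains a clean two-level edge. Depth values that belong to neither object are then converted into disparities, which is one way the fuzzy edges and isolated pixels described above can arise.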