Maps known as depth maps or disparity maps are conventionally used in three-dimensional video (3D video) applications such as display in relief, reconstruction of scenes in three dimensions, and virtual navigation in video images. These maps are obtained by a process of estimation from at least two images of the same scene coming from stereoscopic videos or multiview videos captured by a plurality of video cameras or corresponding to different times in the same video.
Two approaches are generally used in the particular situation of transmitting stereoscopic content. A first approach uses a pair of conventional video cameras positioned so as to reproduce the human visual system, each camera corresponding to one eye. The two monoscopic video sequences captured are transmitted to the user. Another approach transmits only one monoscopic color video sequence accompanied by depth information associated with each pixel. In this situation, one or more virtual views may be synthesized at the user end by means of rendering algorithms based on depth maps. This depth map approach has the particular advantage of reducing the total bandwidth used for transmission and is directly applicable to video coding with data compression.
To define what are conventionally referred to as depth maps and disparity maps, and in order to simplify the explanation, consider the particular situation of binocular stereo vision where two images of the same scene from two different points of view are produced, for example by a video camera. The two images are conventionally referred to as the right-hand image and the left-hand image. In this context, a depth map corresponding to a given image (a right-hand image or a left-hand image, for example) is a digital image each pixel of which is associated with a value representing a color (for example a shade of grey) that is characteristic of the distance of the pixel concerned from the camera.
FIG. 1 shows a depth map in which the distance of the objects appearing in the image relative to the camera that filmed a scene is represented by a grey level from white for the nearest objects to black for the farthest objects. Thus in this example the table and the vase containing flowers, which are of lighter shade, are the nearest objects of the scene (foreground) and the screen represented appears to be the farthest object (background).
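By way of illustration only, the grey-level convention of FIG. 1 may be sketched as follows. The linear mapping, the 8-bit range, and the names `depth_to_grey`, `near`, and `far` are assumptions made for this example and are not taken from the figure itself.

```python
def depth_to_grey(depth, near, far):
    """Map a distance to an 8-bit grey level: white (255) for the nearest
    objects, black (0) for the farthest. The linear scale is an assumption."""
    if far <= near:
        raise ValueError("far must be greater than near")
    # Clamp the distance to the [near, far] range covered by the map.
    clamped = min(max(depth, near), far)
    return round(255 * (far - clamped) / (far - near))


# Hypothetical scene: a vase at 1 m (foreground), a screen at 5 m (background).
vase_grey = depth_to_grey(1.0, near=1.0, far=5.0)
screen_grey = depth_to_grey(5.0, near=1.0, far=5.0)
```

With this convention a foreground object such as the table or the vase receives a lighter shade than the screen in the background.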
FIGS. 2a and 2b represent a pair of stereoscopic images from which a depth map may be estimated. FIG. 2a represents the left-hand image and FIG. 2b represents the right-hand image.
A disparity map embodies the result of stereoscopic matching of the above-mentioned two images. Stereoscopic matching consists in finding in the left-hand and right-hand images pixels that are homologous, i.e. pixels that are the projection of the same entity in the scene. The disparity map is one way of visually representing the results of this matching: each pixel of the disparity map represents the amplitude of the disparity, i.e. the distance between the position of a pixel in the left-hand image and that of its counterpart in the right-hand image. Thus each pixel of the disparity map is associated with a value representing a color characteristic of the amplitude of the disparity. The conventional process again uses shades of grey: for example, the darker the pixel, the smaller the disparity, with completely white pixels representing pixels with no counterparts in one of the two images.
It is easy to demonstrate that the higher the depth value associated with a given pixel of an image, the lower the corresponding disparity value. The depth and the disparity thus being two inversely proportional magnitudes, the present invention may be applied equally to a depth map or a disparity map. In the remainder of the description, depth maps and disparity maps are referred to interchangeably and the term map refers to either of these maps.
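The inverse relationship may be illustrated by the following minimal sketch, which assumes a rectified stereo pair with parallel optical axes, for which depth Z, disparity d, focal length f (in pixels), and baseline B are related by Z = f·B/d. This camera model and the function and parameter names are assumptions introduced for the example only.

```python
def disparity_to_depth(disparity_px, focal_px, baseline):
    """Depth Z = f * B / d for a rectified, parallel stereo pair (assumed model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def depth_to_disparity(depth, focal_px, baseline):
    """Disparity d = f * B / Z, the inverse of the relation above."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


# Hypothetical camera: focal length of 1000 pixels, baseline of 0.5 units.
near_depth = disparity_to_depth(100.0, focal_px=1000.0, baseline=0.5)
far_depth = disparity_to_depth(50.0, focal_px=1000.0, baseline=0.5)
```

Halving the disparity doubles the depth, which is why a processing operation defined on one type of map carries over to the other.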
The use of depth or disparity maps is of primary importance in the context of emerging techniques such as virtual navigation in a video, display in relief, 3D modeling, and video coding. Depth maps obtained by prediction from different views can also be used in compression applications that predict views from depth maps. In this type of approach, depth maps are used to predict corresponding views in other videos of the same scene in order to limit the transmission of redundant information, notably with MVD (Multiview Video and Depth) data consisting of a plurality of videos and associated depth maps. Whatever the final application, the accuracy of these maps is therefore critical to the quality of the reconstructed views and to efficient video compression in coding applications.
In particular, the quality of a depth/disparity map is linked to the presence of occluded areas in the map concerned. Occluded areas are areas of the map for which the pixels have no counterpart in one of the images, i.e. areas corresponding to a part of the scene that is visible in only one image. These occluded areas are essentially caused by objects in the scene referred to as occluding objects, i.e. objects which, in one of the two images (right-hand or left-hand), mask a particular region of the represented scene that is directly visible in the other image. Occluded areas are essentially found around boundaries caused by depth discontinuities in the maps.
FIG. 3 represents an example of a disparity map estimated using a known algorithm based on graph cuts. To be more precise, the algorithm used is that described for example in the document “Multi-camera scene reconstruction via graph cuts”, V. Kolmogorov and R. Zabih, Proceedings of the European Conference on Computer Vision, 2002.
In the image represented, the white areas are occluded areas detected by the above-mentioned graph cuts algorithm and for which it has not been possible to determine a pixel value because of the lack of counterparts in the left-hand and right-hand images represented by FIGS. 2a and 2b, respectively.
It is therefore necessary to take these occluded areas in the depth or disparity maps into account in order to enhance the quality of the images obtained by a reconstruction or synthesis process based on these maps. This involves in particular detecting and/or filling gaps in the occluded areas corresponding to the missing information.
Known techniques for processing the above-mentioned depth or disparity map defects include in particular a first category of techniques that operate on images reconstructed from depth or disparity maps.
Solutions of this first category are described for example in the document “Stereoscopic imaging: Filling disoccluded areas in image-based rendering”, C. Vázquez, W. J. Tam, and F. Speranza, Proceedings of the SPIE Three-Dimensional TV, Video, and Display, Vol. 6392, pp. 0D1-0D12, 2006. According to those solutions, gaps in the reconstructed images are filled by propagating a value obtained from their vicinities. However, techniques in this category that operate on images reconstructed from depth or disparity maps have the disadvantage of making little or no use of the specific features of depth maps. Those maps represent data whose features differ from those of textured two-dimensional (2D) images, such as the absence of texture details and the impact on depth of the relative positions of the objects.
A second category of known techniques operates directly on the depth or disparity maps. For example, the document “Improving depth maps by nonlinear diffusion”, J. Yin and J. R. Cooperstock, Proc. 12th International Conference Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, February 2004, describes post-processing applicable to a depth map to enhance the occluded areas or areas lacking texture. That post-processing is based on filling by non-linear diffusion, i.e. Gaussian smoothing combined with edge detection.
Another technique described in the document “Design Considerations for View Interpolation in a 3D Video Coding Framework”, Yannick Morvan, Dirk Farin, and Peter H. N. de With, 27th Symposium on Information Theory in The Benelux, Vol. 1 p., June 2006, Noordwijk, Netherlands, propagates the background by comparing the depth values of the two valid pixels nearest an occluded pixel.
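The principle of comparing the two valid pixels nearest an occluded pixel may be sketched as follows. This is an illustrative simplification and not the implementation of the cited document: it assumes that the search is carried out along the same line, that occluded pixels are marked `None`, and that the smaller of the two values denotes the background (as with a disparity map, or a depth map in which darker means farther).

```python
def fill_with_background(row):
    """Fill each occluded pixel (None) with the value, among its nearest valid
    left and right neighbors on the line, assumed to belong to the background."""
    n = len(row)
    out = list(row)
    for i, value in enumerate(row):
        if value is not None:
            continue
        # Nearest valid pixel on each side, searched in the original row only.
        left = next((row[j] for j in range(i - 1, -1, -1) if row[j] is not None), None)
        right = next((row[j] for j in range(i + 1, n) if row[j] is not None), None)
        candidates = [c for c in (left, right) if c is not None]
        if candidates:
            # Assumption: the smaller value is the farther (background) one.
            out[i] = min(candidates)
    return out


# Hypothetical line: a foreground object (200) next to a background (40).
filled = fill_with_background([200, None, None, 40, 40])
```

Because only the two nearest valid values are compared, the result depends entirely on those two pixels being correct and on one of them actually lying in the background.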
FIGS. 4a and 4b respectively represent a detail of the disparity map from FIG. 3 and the same detail after filling the occluded areas using a known technique based on graph cuts. The image represented in FIG. 4b thus shows the same detail of the disparity map from FIG. 3 but after filling the occluded areas using an algorithm that takes the image line by line and fills an invalid pixel of an occluded area with the first valid (non-occluded) pixel following the invalid pixel on the current line (the direction along the line is a function of the image with which the depth map is associated). Such a filling algorithm is described for example in the document “Occlusions in motion processing”, Abhijit Ogale, Cornelia Fermuller, and Yiannis Aloimonos, SpatioTemporal Image Processing, March 2004, London, UK.
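The line-by-line filling described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the cited document: invalid pixels are marked `None`, the map is a list of rows, and the hypothetical parameter `left_to_right` stands in for the scan direction, which in practice depends on the image with which the map is associated.

```python
def fill_row(row, left_to_right=True):
    """Fill each invalid pixel (None) with the first valid pixel that follows
    it on the line, in the chosen scan direction."""
    out = list(row)
    # Walk the line against the scan direction so that the most recently seen
    # valid value is always the first valid pixel "following" the current one.
    if left_to_right:
        order = range(len(out) - 1, -1, -1)
    else:
        order = range(len(out))
    next_valid = None
    for i in order:
        if out[i] is None:
            out[i] = next_valid  # stays None if no valid pixel follows
        else:
            next_valid = out[i]
    return out


def fill_map(disparity_map, left_to_right=True):
    """Apply the line-by-line filling to every row of the map."""
    return [fill_row(row, left_to_right) for row in disparity_map]


filled_map = fill_map([[5, None, None, 9, None],
                       [None, 3, None, None, 8]])
```

Any invalid pixel with no valid pixel after it on the line remains unfilled in this sketch, and a pixel wrongly marked as valid is propagated as is.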
In the FIG. 4b image obtained after filling, white areas visible to the left of the person, notably around their right hand, correspond to errors produced when filling the occluded areas. These errors are primarily caused either by pixels of an occluded area for which a value has been assigned in error and that therefore disrupt the filling process, or by the propagation of incorrect values during filling, which causes loss of detail, as around the right hand of the person represented in the image, for example.
Techniques that operate directly on depth or disparity maps thus have the disadvantages of not managing errors or artifacts in occluded areas and/or generating a blurred appearance, notably at the edges of the objects represented, because of the processing applied. Moreover, by comparing only the values of the first valid pixels adjacent to an invalid pixel, propagation of the background is not guaranteed, especially with concave objects.
Consequently, the above techniques deal only partially with the defects of depth or disparity maps and with the artifacts, propagation of incorrect values, and loss of detail that result from those defects, because the presence of occluded areas causes inconsistencies in the reconstructed images and during display in relief using images predicted from depth/disparity maps.
There is therefore a real need to improve the quality of disparity and depth maps, notably in respect of their accuracy around contours of objects.