The analysis of scenes in images (such as image segmentation, background subtraction, automatic object recognition and multiclass detection) is a field that has been widely covered in the literature, mainly for “single-sensor” (2D) images. Benefiting from the latest advances in 3D perception, scene analysis also attempts to make use of depth information, since an object is not only a coherent visual unit in terms of color and/or texture, but also a spatially compact unit.
Multiple types of 3D perception system are known:
Equipment such as 3D scanners or time-of-flight (TOF) cameras. This type of 3D sensor provides a depth image in which each pixel corresponds to the distance between a point of the scene and a specific point. The depth images obtained are generally quite precise, but they nonetheless include aberrations (for example “speckle” in the case of TOF cameras). They are expensive, from a thousand to several thousand euros, limiting their use to applications in which cost is not a main obstacle. Moreover, a number of these 3D sensors cannot be used in real-time applications due to the low frequency of the images.
Stereoscopic systems, generally consisting of an assembly of cameras and/or projectors, in combination with specific processing operations (for example disparity computation). These benefit from the lower cost of standard cameras, or even cameras that may already be present for other applications (for example the reversing camera function). However, these images are noisier (sensitivity to lighting conditions, problems with lightly textured surfaces, etc.) and the depth image deduced from the disparity map is not dense. The non-linear transformation {disparity map→depth map} exhibits a non-uniform information density in the depth map. Typically, data close to the camera will be denser, and data on the object boundary will potentially be imprecise.
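The non-uniform density of the depth map can be illustrated with the standard relation for a rectified stereo pair, Z = f·B/d, where d is the disparity in pixels, f the focal length in pixels and B the baseline. The following is a minimal sketch only; the focal length and baseline values are assumptions chosen for illustration and do not come from any particular system.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_step(disparity_px, focal_px, baseline_m):
    """Depth change caused by a one-pixel decrease in disparity."""
    return (disparity_to_depth(disparity_px - 1, focal_px, baseline_m)
            - disparity_to_depth(disparity_px, focal_px, baseline_m))


FOCAL_PX = 800.0    # assumed focal length, in pixels
BASELINE_M = 0.10   # assumed distance between the two cameras, in metres

# One disparity level near the camera (d = 40 px, Z = 2 m) versus far away
# (d = 5 px, Z = 16 m): the same one-pixel step spans very different depths.
near_step = depth_step(40, FOCAL_PX, BASELINE_M)
far_step = depth_step(5, FOCAL_PX, BASELINE_M)
print(f"step near: {near_step:.3f} m, step far: {far_step:.3f} m")
```

With these assumed values, a one-pixel disparity error displaces a point by a few centimetres at 2 m but by several metres at 16 m, which is why depth samples are denser close to the camera.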
The quality of the depth image or of the disparity image has a substantial impact on the performance of processing operations performed on this image. In the case of stereoscopic images, substantial errors in the depth image are even more detrimental to the processing operations performed.
Thus 3D scene analysis systems (for example scene segmentation) are either expensive or negatively affected by errors present in the depth map.
The depth-related data may be filtered on the disparity map. Aberrant values are conventionally treated by median filters. The only parameter of such a filter is the size (or the shape) of its support. 3×3 or 5×5 square supports are typically used.
While noise removal capability increases with the size of the support, this is nonetheless accompanied by the removal of details, along with the potential displacement of edges in the presence of noise. In the context of segmentation, this can lead to imprecise segmentation, and it should be noted that this effect is not uniform across the depth image or across the disparity image.
Conversely, using a small support decreases the filtering capability. If the level of noise is statistically significant, it will only be partially filtered.
Thus, the choice of filter size is a trade-off between the removal of aberrations and image deformation. This choice is left up to the user, and there is no method for automatically determining an “optimum” value.
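The trade-off described above can be reproduced on synthetic data. The sketch below is a plain median filter with a square support and replicated borders, written for illustration only; the helper names and the test maps (a 3×3 block of aberrant values and a two-pixel-wide structure) are assumptions, not data from any sensor.

```python
from statistics import median


def median_filter(image, size):
    """Median-filter a 2D list with a size x size square support.

    Borders are handled by replicating the nearest edge pixel.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out


def flat_map(h, w, value):
    """Hypothetical helper: a constant-valued map."""
    return [[value] * w for _ in range(h)]


# Map 1: a 3x3 block of aberrant values on a flat background.
noisy = flat_map(9, 9, 10.0)
for y in range(3, 6):
    for x in range(3, 6):
        noisy[y][x] = 100.0

# Map 2: a genuine detail, two pixels wide, on the same background.
stripe = flat_map(9, 9, 10.0)
for y in range(9):
    stripe[y][4] = stripe[y][5] = 20.0

for size in (3, 5):
    left = sum(v == 100.0 for row in median_filter(noisy, size) for v in row)
    kept = median_filter(stripe, size) == stripe
    print(f"{size}x{size}: aberrant pixels left = {left}, detail preserved = {kept}")
```

On these maps the 3×3 support preserves the detail but removes only the corners of the aberrant block, whereas the 5×5 support removes the block entirely and erases the detail with it.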
In the article entitled “Rapid 3D object detection and modeling using range data from range imaging camera for heavy equipment operation” by Son, Kim & Choi, published in “Automation in Construction” Vol. 19, pp. 898-906, Elsevier, 2010, the authors present a 3D scene segmentation system, consisting of a time-of-flight camera and processing software including successive steps for decreasing noise in depth images, subtracting ground elements, segmenting objects and creating volumes surrounding objects. The limits of such an approach are that the system requires a time-of-flight camera, which is an expensive device, and that the filtering operations are adapted to the type of noise linked to the sensor. The filtering uses fixed supports, without considering the local characteristics of the signal: a 3×3 mean difference filter combined with a fixed threshold of 0.6 for filtering aberrant values of “dropout” type (a wave that has not been received by the sensor) and a 3×3 median filter for correcting speckle noise. As mentioned above, a fixed support size and a fixed threshold do not allow the trade-off between filtering and preservation of the signal to be optimized according to the local and actual characteristics of the signal, in particular those linked to the geometry of a 3D approach. Lastly, the global approach to segmentation uses a dense 3D mesh allowing fine segmentation, but its computing time, of the order of one second, remains long.
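A fixed-support, fixed-threshold chain of this kind can be sketched as follows. This is one plausible reading of a “3×3 mean difference filter” followed by a 3×3 median filter, not the implementation of Son, Kim & Choi: the dropout test (difference between a pixel and the mean of its neighbours), the replacement rule and the function names are all assumptions, and the unit of the 0.6 threshold is not stated in the text above.

```python
from statistics import mean, median


def neighbours(image, y, x):
    """Values of the neighbours of (y, x) inside a 3x3 support."""
    h, w = len(image), len(image[0])
    return [image[j][i]
            for j in range(max(y - 1, 0), min(y + 2, h))
            for i in range(max(x - 1, 0), min(x + 2, w))
            if (j, i) != (y, x)]


def fixed_support_filter(image, threshold=0.6):
    """Two fixed 3x3 passes: dropout rejection, then median smoothing."""
    h, w = len(image), len(image[0])

    # Pass 1 (assumed rule): a pixel whose value differs from the mean of
    # its neighbours by more than the fixed threshold is treated as a
    # dropout and replaced by the median of those neighbours.
    cleaned = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            nb = neighbours(image, y, x)
            if abs(image[y][x] - mean(nb)) > threshold:
                cleaned[y][x] = median(nb)

    # Pass 2: plain 3x3 median (pixel plus neighbours) against speckle.
    out = [row[:] for row in cleaned]
    for y in range(h):
        for x in range(w):
            out[y][x] = median(neighbours(cleaned, y, x) + [cleaned[y][x]])
    return out


# Flat surface at 2.0 with a single dropout (value 0.0) in the centre.
depth = [[2.0] * 5 for _ in range(5)]
depth[2][2] = 0.0
restored = fixed_support_filter(depth)
print(restored[2][2])
```

Both the support size and the threshold are the same for every pixel, whatever its depth or its position in the image, which is the limitation discussed above.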
In patent application EP 2541496 (A2) “Method, medium, and apparatus for filtering depth noise using depth information” by Samsung Electronics, a method for filtering depth noise may carry out spatial or temporal filtering according to the depth information. In order to carry out spatial filtering, the method is able to determine a characteristic of the spatial filter on the basis of depth information. Likewise, in order to carry out temporal filtering, the method is able to determine a certain number of frames of reference on the basis of depth information. Although this solution adapts the size and the coefficient of the filter to be applied according to the depth of the region to be processed, it still has drawbacks including, inter alia, the characteristics of the filter not taking account of the distance of objects from the optical center of the camera.
In patent application WO 2013079602 (A1) “Spatio-temporal disparity-map smoothing by joint multilateral filtering” by Kauff P. et al., a filter structure intended to filter a disparity map D(p, t0) comprises a first filter, a second filter and a filter selector. The first filter is intended to filter a specific section of the disparity map according to a first measure of central tendency. The second filter is intended to filter the specific section of the disparity map according to a second measure of central tendency. The filter selector is provided in order to select the first filter or the second filter in order to filter the specific section of the disparity map, the selection being based on at least one local property of the specific section. This approach, which only works on the disparity map, is dependent on the selection of a fixed threshold for the choice of filter, which is not consistent with physical or geometrical reality.
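The dependence on a fixed threshold can be made concrete with a toy selector. The sketch below is illustrative only and is not the filter structure of WO 2013079602: the two measures of central tendency (mean and median), the local property (value range of the section) and the function name are assumptions.

```python
from statistics import mean, median


def select_and_filter(window, threshold):
    """Filter one section of a disparity map.

    Assumed rule: use the median when the local range of values exceeds
    the fixed threshold, and the mean otherwise.
    """
    spread = max(window) - min(window)
    if spread > threshold:
        return "median", median(window)
    return "mean", mean(window)


smooth = [10.0, 10.2, 9.9, 10.1, 10.0]   # slightly noisy flat surface
edge = [10.0, 10.0, 10.0, 30.0, 30.0]    # section straddling an object boundary

print(select_and_filter(smooth, threshold=1.0))
print(select_and_filter(edge, threshold=1.0))    # median keeps the boundary sharp
print(select_and_filter(edge, threshold=25.0))   # mean blurs the boundary
```

The same boundary is either preserved or blurred depending solely on the threshold, and a threshold expressed in disparity units has no constant meaning in terms of physical distance.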
Thus, there exists no solution in the prior art that allows the quality of a depth image, and consequently that of subsequent processing, to be enhanced while maintaining a low system cost.
Furthermore, there exists no known approach that takes account of the geometrical reality of the operations performed on the original light signal.
There is therefore a need for a solution that overcomes the drawbacks of the known approaches. The present invention addresses this need.