Depth measurement camera systems are relatively recent range-finding devices which have become more popular owing to their use for gesture recognition and human skeletal tracking in consumer electronics systems and in console games.
There are two main types of depth sensing or three-dimensional (3D) camera technology that are independent of environment lighting and suitable for such applications. One type of 3D camera technology is the structured light 3D camera, for example, as provided by PrimeSense and used for gesture recognition in Microsoft's Kinect for Xbox 360 (known as Kinect) video game console. (Microsoft, Kinect, and Kinect for Xbox 360 are trademarks of the Microsoft Corporation.) A second type of 3D sensing camera technology is the time-of-flight (ToF) camera, which is developed and manufactured by several independent companies and which is used, for example, in the automotive industry or for gesture recognition and human skeletal tracking in various environments comprising human to machine interactions, such as video games, robotics, home automation, etc.
However, regardless of the type of 3D sensing camera, an image of a scene is provided that comprises a plurality of pixels, each pixel of the image containing at least information relating to the distance of the imaged object to the camera, such information being the depth value measured. Such an image embedding at least depth measurement information is termed a “depth map”. Other types of images may also include embedded depth measurement information, for example, a “3D point cloud” data matrix where images include embedded information with respect to a camera coordinate system or with respect to a virtual environment coordinate system. In such images, x and y correspond respectively to the horizontal and vertical axes, and the z-axis corresponds to the depth. Transformation from a camera coordinate system to a virtual environment coordinate system is a matter of projections, and such transformations are generally referred to as “scene calibration”.
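The relationship between a depth map, a 3D point cloud and scene calibration can be illustrated with a minimal Python sketch. It is not taken from the cited systems and rests on several assumptions: a pinhole camera model, depth values measured along the optical (z) axis rather than radially, and a rigid rotation plus translation as the camera-to-scene transformation. The function names and the intrinsic parameters `fx`, `fy`, `cx` and `cy` are hypothetical.

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (list of rows) into camera-frame (x, y, z) points.

    Assumes a pinhole model: fx, fy are focal lengths in pixels and
    (cx, cy) is the principal point. Depth is taken along the z-axis.
    """
    points = []
    for v, row in enumerate(depth):        # v: vertical pixel index (y direction)
        for u, z in enumerate(row):        # u: horizontal pixel index (x direction)
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def camera_to_scene(point, rotation, translation):
    """Map one camera-frame point into a scene frame: p_scene = R * p_cam + t.

    rotation is a 3x3 matrix given as nested sequences; translation has 3 entries.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

Under these assumptions, each pixel of the depth map yields one point of the cloud, and applying `camera_to_scene` to every point expresses the cloud in the virtual environment coordinate system.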
An article entitled “Boundary Artifact Reduction in View Synthesis of 3D Video: From Perspective of Texture-Depth Alignment”, Yin Zhao et al., IEEE Transactions on Broadcasting, IEEE Service Center, Piscataway, N.J., US, Vol. 57, No. 2, 1 Jun. 2011, pages 510-522, discloses a method in which boundary artefacts present in a view synthesised depth map are corrected using a process termed suppression of misalignment and alignment enforcement (SMART). The process requires the use of both depth information and texture in a pre-processing step to provide hole filling in a virtual view created from at least two stereoscopic images. For a foreground-background boundary, derivatives are obtained using two pixels either side of the pixel to be corrected and the values are compared to a threshold value to determine if the pixel falls within the foreground or the background. Distances between edge points and depth edge points are averaged to provide a smooth curve parallel to the depth or texture edge.
In an article entitled “Spatial-Depth Super Resolution for Range Images”, Qingxiong Yang et al., CVPR '07, IEEE Conference on Computer Vision and Pattern Recognition, 18-23 Jun. 2007, Minneapolis, Minn., USA, IEEE Piscataway, N.J., US, Vol. 57, No. 2, 1 Jun. 2007, pages 1-8, a post-processing technique is described in which a 3D volume of depth probability (referred to as the cost volume) is processed by iteratively applying a bilateral filter to slices of the cost volume to generate a new cost volume which is then used to refine the depth resolution for general two-view stereo vision problems. The steps of the post-processing technique include up-sampling a low-resolution depth map to the same size as a high-resolution camera image, building a cost volume based on the up-sampled depth map, and applying a bilateral filter to slices of the cost volume to generate a new cost volume. A refined depth map is then based on the new cost volume.
An article entitled “Robust Feature-Preserving Mesh Denoising Based on Consistent Subneighborhoods”, Hanqi Fan et al., IEEE Transactions on Visualization and Computer Graphics, IEEE Service Center, Los Alamitos, Calif., US, Vol. 16, No. 2, 1 Mar. 2010, pages 312-324, discloses a method of identifying piecewise smooth sub-neighbourhoods using a density-based clustering algorithm. An initial estimate of vertex normals and curvature tensors is determined by fitting a quadric model which is then filtered to smooth the normal field and curvature tensor field. A second bilateral filtering is then used to preserve curvature details and alleviate volume shrinkage during denoising.
An article entitled “Temporal Consistency Enhancement on Depth Sequences”, Deliang Fu et al., Picture Coding Symposium 2010, Nagoya, 8 Dec. 2010, discloses a depth filtering algorithm to remove temporal inconsistencies in depth sequences.
U.S. Pat. No. 6,577,307 describes an anti-aliasing process in which a weighting value is used to blend foreground colour with the nearest background colour. The weighting value for each pixel indicates the percentage of coverage of that pixel.
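The coverage-weighted blend described above can be sketched as follows. This is an illustrative reading of the description rather than the patented process itself: the function name is hypothetical, and `coverage` is assumed to be the fraction of the pixel covered by the foreground, expressed between 0 and 1.

```python
def blend_edge_pixel(foreground, background, coverage):
    """Blend a foreground colour with the nearest background colour.

    foreground, background: colour tuples with the same number of channels.
    coverage: assumed fraction (0.0 to 1.0) of the pixel covered by the foreground.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0.0 and 1.0")
    return tuple(
        coverage * f + (1.0 - coverage) * b
        for f, b in zip(foreground, background)
    )
```

With a coverage of 0.25, for example, the blended colour is weighted three quarters towards the background, which softens the edge in colour space. Such blending operates on colour only and does not assign the pixel a corrected depth value.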
Any application or system that makes use of images providing depth measurements is therefore dependent on measurement quality in terms of resolution, noise, accuracy, robustness and repeatability. In particular, for 3D ToF camera technologies, depth measurements around the edges of scene objects are known to exhibit convolution and/or interpolation artefacts, also termed “flying pixels”, which may affect depth data within a radius of at least one pixel for a single naturally sharp edge. Such “flying pixels” are spatial artefacts that occur at the edges of an object, independently of any potential motion blur. They need to be removed and/or restored to a correct location in the scene which corresponds to a newly computed depth value, the newly computed depth value properly assigning the “flying pixel” to either the foreground object or the background object. The aim of such restoration is to improve significantly the confidence of subsequent object detection and to enhance the quality of the depth information of objects within the 3D scene.
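To make the notion of a “flying pixel” concrete, the following Python sketch applies a simple heuristic to a single scanline of a depth map. It is an illustration only and is not the correction method of any cited document: the function name and the `threshold` parameter are hypothetical, and the sketch assumes the artefact is one pixel wide, whereas real artefacts may span a larger radius.

```python
def reassign_flying_pixels(row, threshold):
    """Snap isolated intermediate depth values to the foreground or background.

    row: depth values along one scanline.
    threshold: assumed minimum depth gap that marks a pixel as detached
               from a neighbour (same units as the depth values).
    """
    fixed = list(row)
    for i in range(1, len(row) - 1):
        left, centre, right = row[i - 1], row[i], row[i + 1]
        lies_between = min(left, right) < centre < max(left, right)
        detached = (abs(centre - left) > threshold
                    and abs(centre - right) > threshold)
        if lies_between and detached:
            # Assign the pixel to whichever side is nearer in depth.
            if abs(centre - left) <= abs(centre - right):
                fixed[i] = left
            else:
                fixed[i] = right
    return fixed
```

For a scanline such as `[1.0, 1.0, 1.8, 3.0, 3.0]` with a threshold of 0.5, the value 1.8 lies between the foreground at 1.0 and the background at 3.0 without belonging to either, so the sketch reassigns it to the nearer foreground depth.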