Depth maps supply the information needed to present video data in ways that extend beyond mere two-dimensional display. Various types of sensors are used to obtain depth maps. Example sensors include Time-of-Flight (ToF) cameras, real-time infrared projectors and cameras (e.g., Microsoft Kinect), and stereo vision systems. Typically, the quality and resolution of the acquired depth information are not comparable to those of the analogous color images obtained from standard cameras. Hardware limitations and estimation errors are most often responsible for the lower quality of the acquired depth information.
Others have considered ways to improve depth maps, most often through depth map upsampling and/or refinement. Most previous techniques suffer from artifacts, such as texture copying and edge blurring. Texture copying occurs in smooth areas where the depth data is noisy and the color image is textured, while edge blurring occurs in transition areas when different objects (located in different depth layers) have similar color. The following methods suffer from some or all of these drawbacks.
The seminal work on the depth map upsampling problem is described by Diebel et al., “An Application of Markov Random Fields to Range Sensing,” NIPS, pp. 291-298, MIT Press (2005). This technique assumes that discontinuities in range and color tend to co-align. The posterior probability of the high-resolution reconstruction is modeled as a Markov Random Field (MRF), and it is optimized with the Conjugate Gradient (CG) algorithm.
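The MRF formulation can be illustrated with a minimal sketch, under stated assumptions rather than as a reproduction of the cited implementation. It assumes a quadratic energy with a data term on the measured pixels and a smoothness term whose weights decay with the difference in a grayscale guide image, and it minimizes that energy with a hand-written conjugate gradient loop. The function name `mrf_upsample` and the parameters `k`, `c`, `max_iters`, and `tol` are hypothetical.

```python
import numpy as np

def mrf_upsample(z, mask, image, k=1.0, c=10.0, max_iters=500, tol=1e-12):
    """Fill a sparse depth map on the high-resolution grid (illustrative sketch).

    z     : (H, W) depth values; any finite value where mask is False
    mask  : (H, W) bool, True where a depth measurement exists
    image : (H, W) grayscale guide image, assumed to lie in [0, 1]
    Assumes every connected region of the guide contains a measurement.
    """
    z = np.asarray(z, dtype=np.float64)
    image = np.asarray(image, dtype=np.float64)
    m = np.asarray(mask, dtype=np.float64)

    # Smoothness weights between horizontal and vertical neighbors.
    # They shrink where the guide image changes, so depth edges are
    # encouraged to co-align with color edges.
    wh = np.exp(-c * (image[:, 1:] - image[:, :-1]) ** 2)
    wv = np.exp(-c * (image[1:, :] - image[:-1, :]) ** 2)

    def apply_A(y):
        # Half the gradient of the quadratic energy, without the constant:
        # data term k*m*y plus the weighted graph Laplacian of y.
        out = k * m * y
        dh = wh * (y[:, 1:] - y[:, :-1])
        dv = wv * (y[1:, :] - y[:-1, :])
        out[:, 1:] += dh
        out[:, :-1] -= dh
        out[1:, :] += dv
        out[:-1, :] -= dv
        return out

    b = k * m * z
    # Start from the measurements, with the mean measurement elsewhere.
    y = np.where(m > 0, z, z[m > 0].mean())

    # Standard conjugate gradient iteration for A y = b.
    r = b - apply_A(y)
    p = r.copy()
    rs = float(np.sum(r * r))
    for _ in range(max_iters):
        if rs < tol:
            break
        Ap = apply_A(p)
        alpha = rs / float(np.sum(p * Ap))
        y = y + alpha * p
        r = r - alpha * Ap
        rs_new = float(np.sum(r * r))
        p = r + (rs_new / rs) * p
        rs = rs_new
    return y
```

In this sketch, a larger `c` makes the smoothness term weaker across color edges, which is the mechanism by which the co-alignment assumption enters the reconstruction.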
Kopf et al. described a technique known as Joint Bilateral Upsampling (JBU). See, “Joint Bilateral Upsampling,” ACM SIGGRAPH '07 papers, New York, N.Y., USA, (2007). The Joint Bilateral Upsampling approach leverages a modified bilateral filter. The technique upsamples a low-resolution depth map by applying a spatial filter to it, while jointly applying a similar range filter on the corresponding high-resolution color image.
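The joint filtering described above can be sketched as follows. This is a minimal, unoptimized illustration that assumes a grayscale guide image in [0, 1], an integer upsampling factor, and Gaussian kernels for both the spatial and the range terms. The function name `joint_bilateral_upsample` and its parameters are hypothetical, and the simple coordinate mapping (high-resolution index divided by the factor) is a simplifying assumption.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, factor,
                             radius=2, sigma_s=1.0, sigma_r=0.1):
    """Upsample depth_lo (h, w) to the size of guide_hi (illustrative sketch)."""
    depth_lo = np.asarray(depth_lo, dtype=np.float64)
    guide_hi = np.asarray(guide_hi, dtype=np.float64)
    H, W = guide_hi.shape
    h, w = depth_lo.shape
    out = np.zeros((H, W), dtype=np.float64)

    for y in range(H):
        for x in range(W):
            # Position of the high-resolution pixel in low-resolution coordinates.
            yl, xl = y / factor, x / factor
            cy = min(int(round(yl)), h - 1)
            cx = min(int(round(xl)), w - 1)
            num = 0.0
            den = 0.0
            for qy in range(max(cy - radius, 0), min(cy + radius, h - 1) + 1):
                for qx in range(max(cx - radius, 0), min(cx + radius, w - 1) + 1):
                    # Spatial kernel, evaluated on the low-resolution grid.
                    ws = np.exp(-((qy - yl) ** 2 + (qx - xl) ** 2)
                                / (2.0 * sigma_s ** 2))
                    # Range kernel, evaluated on the high-resolution guide at
                    # the output pixel and at the location of the low-res sample.
                    gy = min(qy * factor, H - 1)
                    gx = min(qx * factor, W - 1)
                    diff = guide_hi[y, x] - guide_hi[gy, gx]
                    wr = np.exp(-(diff ** 2) / (2.0 * sigma_r ** 2))
                    num += ws * wr * depth_lo[qy, qx]
                    den += ws * wr
            out[y, x] = num / den
    return out
```

Because the range kernel is driven only by the guide image, two objects at different depths that share a similar color receive nearly equal weights in this sketch, which is one way the edge blurring noted earlier can arise.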
Yang et al. describe an upsampling method based on bilateral filtering of a cost volume with sub-pixel estimation. See, Yang et al., “Spatial-Depth Super Resolution for Range Images,” in Computer Vision and Pattern Recognition, CVPR '07. IEEE Conference on, June 2007, pp. 1-8. This technique builds a cost volume of depth probability and then iteratively applies a standard bilateral filter to it. An output depth map is generated by applying a winner-takes-all selection to the weighted cost volume. Finally, a sub-pixel estimation algorithm is applied to reduce discontinuities.
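The cost-volume pipeline described above can be sketched in a few steps. The following is a simplified illustration, not the cited implementation: it assumes the depth map has already been brought to the resolution of a grayscale guide image in [0, 1], a truncated quadratic cost, uniformly spaced depth candidates, and parabola fitting around the winning candidate as the sub-pixel step. The names `bilateral_slice` and `refine_depth` and all parameters are hypothetical.

```python
import numpy as np

def bilateral_slice(cost, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Bilateral-filter one cost slice, with weights taken from the guide."""
    H, W = cost.shape
    pad_c = np.pad(cost, radius, mode="edge")
    pad_g = np.pad(guide, radius, mode="edge")
    num = np.zeros_like(cost)
    den = np.zeros_like(cost)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rows = slice(radius + dy, radius + dy + H)
            cols = slice(radius + dx, radius + dx + W)
            w = (np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                 * np.exp(-(guide - pad_g[rows, cols]) ** 2
                          / (2.0 * sigma_r ** 2)))
            num += w * pad_c[rows, cols]
            den += w
    return num / den

def refine_depth(depth, guide, candidates, trunc=4.0, iterations=2):
    """Iterative cost-volume refinement (illustrative sketch)."""
    d = np.asarray(depth, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    cand = np.asarray(candidates, dtype=np.float64)  # assumed uniformly spaced
    step = cand[1] - cand[0]
    rows, cols = np.indices(d.shape)
    for _ in range(iterations):
        # Truncated quadratic cost of each candidate at each pixel.
        cost = np.minimum((cand[:, None, None] - d[None]) ** 2, trunc)
        # Filter every slice of the cost volume.
        cost = np.stack([bilateral_slice(s, guide) for s in cost])
        # Winner-takes-all: the candidate with the lowest filtered cost.
        best = np.argmin(cost, axis=0)
        # Sub-pixel step: vertex of the parabola through the winner and
        # its two neighbors, applied only where both neighbors exist.
        interior = (best > 0) & (best < len(cand) - 1)
        i = np.clip(best, 1, len(cand) - 2)
        f0 = cost[i, rows, cols]
        fm = cost[i - 1, rows, cols]
        fp = cost[i + 1, rows, cols]
        denom = fp + fm - 2.0 * f0
        ok = interior & (denom > 1e-12)
        offset = np.where(ok, (fm - fp) / (2.0 * np.where(ok, denom, 1.0)), 0.0)
        d = cand[best] + offset * step
    return d
```

In this sketch the parabola fit lets the output take values between the discrete candidates, which is the sense in which the sub-pixel step reduces the discontinuities left by the winner-takes-all selection.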
The artifacts that can result from these approaches remain a concern. More recently, Garcia et al. have described a Pixel Weighted Average Strategy (PWAS) that seeks to resolve artifacts. See, Garcia et al., “Pixel Weighted Average Strategy for Depth Sensor Data Fusion,” Image Processing (ICIP), 2010 17th IEEE International Conference, pp. 2805-2808 (September 2010). This weighted average strategy builds multi-lateral upsampling filters. The multi-lateral filters are an extended joint bilateral filter with an added credibility factor. The credibility factor takes into account the low reliability of depth measurements along depth edges and the inherently noisy nature of real-time depth data. As a further improvement upon PWAS, Adaptive Multi-lateral Filtering (AMF) has been described as improving accuracy within smooth regions. See, Garcia et al., “A New Multi-Lateral Filter for Real-Time Depth Enhancement,” Advanced Video and Signal-Based Surveillance (AVSS), 2011 8th IEEE International Conference, Sep. 2, 2011, pp. 42-47. These weighted average methods solve the texture copying and edge blurring problems, but their performance is very sensitive to the window size of the filter used, making the window size a critical parameter in applying the techniques. A single filtering scheme is employed to enhance depth maps, and each depth value in the depth map is computed by averaging depth values from its neighborhood with adaptive weights. With these methods, a window size that is too large can cause boundary blurring and loss of detail in complex objects. On the other hand, a window size that is too small can fail to collect significant information from the neighborhood. As a result of this sensitivity, either the results vary depending upon the particular information obtained by the sensor when constructing a refined depth map, or additional complexity is required to determine and adapt the window size.
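A weighted average with a credibility factor can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited implementation: the credibility factor is assumed to be a Gaussian of the depth gradient magnitude, the guide is a grayscale image in [0, 1], and the depth map is already on the guide's grid. The names `credibility_map` and `pwas_filter` and all parameters are hypothetical. The `radius` parameter sets the window size discussed above.

```python
import numpy as np

def credibility_map(depth, sigma_q=0.5):
    """Low credibility where the depth gradient is large (assumed Gaussian form)."""
    gy, gx = np.gradient(np.asarray(depth, dtype=np.float64))
    return np.exp(-(gx ** 2 + gy ** 2) / (2.0 * sigma_q ** 2))

def pwas_filter(depth, guide, radius=2, sigma_s=2.0, sigma_i=0.1, sigma_q=0.5):
    """Joint bilateral average with an extra credibility weight (illustrative)."""
    depth = np.asarray(depth, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    H, W = depth.shape
    q = credibility_map(depth, sigma_q)

    # Replicate borders so every pixel has a full window.
    pad_d = np.pad(depth, radius, mode="edge")
    pad_g = np.pad(guide, radius, mode="edge")
    pad_q = np.pad(q, radius, mode="edge")

    num = np.zeros((H, W), dtype=np.float64)
    den = np.zeros((H, W), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rows = slice(radius + dy, radius + dy + H)
            cols = slice(radius + dx, radius + dx + W)
            # Spatial weight, guide-image weight, and neighbor credibility.
            w_s = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
            w_i = np.exp(-(guide - pad_g[rows, cols]) ** 2
                         / (2.0 * sigma_i ** 2))
            w = w_s * w_i * pad_q[rows, cols]
            num += w * pad_d[rows, cols]
            den += w
    # Guard against a window in which every weight is negligible.
    return num / np.maximum(den, 1e-12)
```

In this sketch, a larger `radius` averages over more neighbors and a smaller one over fewer, so the trade-off between boundary blurring and insufficient neighborhood information is controlled by that single parameter.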