1. Technical Field
Embodiments of the present description relate to the estimation of depth maps.
2. Description of the Related Art
In computer vision, a typical problem relates to the estimation of a depth map from at least two images of the same object obtained from different views. Usually, during the depth map estimation process, a depth value is associated with each pixel of the depth map. For example, the values of the depth map may be represented as a grayscale image.
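As an illustration of representing a depth map as a grayscale image, the following minimal sketch maps each depth value to an 8-bit gray level. The linear mapping and the convention that the nearest point is rendered white and the farthest black are assumptions made for this example. The function name `depth_to_grayscale` is hypothetical.

```python
def depth_to_grayscale(depth):
    """Map a 2-D depth map (list of rows of floats) to 8-bit gray levels.

    Assumption (hypothetical convention): linear mapping in which the nearest
    point becomes 255 (white) and the farthest point becomes 0 (black).
    """
    values = [v for row in depth for v in row]
    d_min, d_max = min(values), max(values)
    if d_max == d_min:
        # Flat scene: every pixel receives the same mid-gray level.
        return [[128 for _ in row] for row in depth]
    scale = 255.0 / (d_max - d_min)
    # Smaller depth (closer) -> larger gray level (brighter).
    return [[round((d_max - v) * scale) for v in row] for row in depth]
```

For example, `depth_to_grayscale([[1.0, 3.0]])` yields `[[255, 0]]`: the closer pixel is white and the farther one is black.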
For example, FIGS. 1a and 1b illustrate, respectively, an example of a left image and a right image, and FIG. 1c shows a possible depth map in grayscale.
Substantially, FIGS. 1a and 1b correspond to the well-known “cone” reference images of the “2003 Stereo datasets with ground truth”, which have been created by Daniel Scharstein, Alexander Vandenberg-Rodes, and Richard Szeliski. The original versions of the images are published, e.g., in Daniel Scharstein and Richard Szeliski, “High-accuracy stereo depth maps using structured light”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2003), volume 1, pages 195-202, Madison, Wis., June 2003.
Various methods exist in the prior art to obtain a depth map. For this reason, a research group at Middlebury College has defined a taxonomy that makes it possible to classify different solutions for generating depth map images, see, e.g., Daniel Scharstein and Richard Szeliski, “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms”, International Journal of Computer Vision, 2002, Vol. 47, pages 7-42. The above-mentioned article by Scharstein et al. also provides a general overview of prior-art methods, which may be classified as local methods, global methods, dynamic programming and cooperative algorithms.
FIG. 2 is a block diagram showing the major steps of typical depth map estimation methods.
Substantially, most methods include a set-up phase 100, a matching phase 102, a filtering phase 104 and a refinement phase 106. For example, the set-up phase 100 may include a conversion of the original color images to grayscale images, such as an RGB-to-grayscale conversion 1002, a rescaling of the images 1004, a noise filtering 1006, a rectification 1008, a feature extraction 1010 and/or a color segmentation 1012. The matching phase 102 may include a matching cost computation step 1022 and a cost (support) aggregation step 1024. The filtering phase 104 may include a disparity computation step 1042 and a disparity optimization step 1044. Finally, the refinement phase 106 may include a disparity refinement step 1062.
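The following minimal sketch strings together a few of the steps listed above for rectified images: an RGB-to-grayscale conversion (step 1002), a matching cost computation (step 1022), a cost aggregation (step 1024) and a disparity computation (step 1042). The concrete choices are assumptions, being only one simple combination among the many covered by the taxonomy: ITU-R BT.601 luma weights for the gray conversion, absolute intensity difference as the matching cost, a square box window for aggregation, and winner-takes-all selection. Rescaling, noise filtering, rectification, disparity optimization and refinement are omitted. All function names are hypothetical.

```python
def rgb_to_gray(img):
    """Step 1002 (sketch): RGB -> gray using ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]

def cost_volume(left, right, max_disp):
    """Step 1022 (sketch): absolute-difference cost for every pixel and disparity."""
    h, w = len(left), len(left[0])
    inf = float("inf")
    vol = [[[inf] * (max_disp + 1) for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for d in range(max_disp + 1):
                if x - d >= 0:  # candidate pixel lies inside the right image
                    vol[y][x][d] = abs(left[y][x] - right[y][x - d])
    return vol

def aggregate(vol, radius):
    """Step 1024 (sketch): sum costs over a (2r+1)x(2r+1) window clipped at borders."""
    h, w, n = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * n for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for d in range(n):
                s = 0.0
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        s += vol[yy][xx][d]  # invalid (inf) costs stay inf
                out[y][x][d] = s
    return out

def winner_takes_all(vol):
    """Step 1042 (sketch): choose the disparity with the lowest aggregated cost."""
    return [[min(range(len(c)), key=c.__getitem__) for c in row] for row in vol]

def estimate_disparity(left_rgb, right_rgb, max_disp=16, radius=2):
    """Run the sketched set-up, matching and filtering steps on rectified images."""
    left, right = rgb_to_gray(left_rgb), rgb_to_gray(right_rgb)
    return winner_takes_all(aggregate(cost_volume(left, right, max_disp), radius))
```

Separating the cost volume from the aggregation mirrors the split between steps 1022 and 1024, so either stage could be swapped (e.g., a different cost or window) without touching the others.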
One of the known methods for generating depth images is called “block matching”. Specifically, in block matching, a square block of pixels centered on each pixel (the reference pixel) of the reference image, such as the left image, is compared with candidate blocks in the second image, e.g., the right image, to find the best association between the two center pixels. Usually, not all blocks in the second image are used; instead, the search may be limited to a subset of blocks, such as the blocks with the same vertical coordinate, e.g., on the same row, as the reference pixel. In this case, the difference between the horizontal coordinates, e.g., the columns, of the two center pixels provides the disparity, and the depth may be calculated, e.g., as the inverse of the disparity.
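A minimal sketch of the block matching described above, assuming rectified grayscale images stored as lists of rows, the left image as reference, and the sum of absolute differences (SAD) as the block similarity measure (SAD is an assumption; other measures are possible). The search is restricted to the same row, and the reference pixel is assumed to be at least `half` pixels away from the image border. The depth conversion uses `scale / d`: with the hypothetical default `scale=1.0` this is the plain inverse of the disparity mentioned above, while for a rectified camera pair `scale` would be the focal length in pixels times the baseline. All names are hypothetical.

```python
def sad(left, right, y, xl, xr, half):
    """Sum of absolute differences between two square blocks on row y."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[y + dy][xl + dx] - right[y + dy][xr + dx])
    return total

def block_match(left, right, y, x, max_disp, half=1):
    """Disparity of reference pixel (y, x) of the left image.

    Only blocks on the same row of the right image are searched, each shifted
    left by a candidate disparity d. Assumes (y, x) lies at least `half`
    pixels away from the image border.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break  # the candidate block would leave the right image
        cost = sad(left, right, y, x, xr, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(d, scale=1.0):
    """Depth as scale / d; scale=1.0 gives the plain inverse of the disparity."""
    return float("inf") if d == 0 else scale / d
```

For example, if the right image is the left image shifted two columns to the left, `block_match` returns `2` for an interior pixel and `depth_from_disparity(2)` returns `0.5`.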