This invention relates generally to determining or providing depth in or for three-dimensional (3D) images and, more particularly, to depth estimation for computer vision applications such as 3D scene reconstruction and/or stereo-based object detection.
Depth is considered one of the most important cues in perceiving the three-dimensional characteristics of objects in a scene captured by cameras. In computer vision, the value that represents the distance from each object in the scene to the focal point of the camera is called depth, and an image storing these values for all pixels is referred to as a depth map. Depth maps are essential in a variety of applications such as view synthesis, robot vision, 3D scene reconstruction, human-computer interaction, and advanced driver assistance systems. The performance of these applications is highly dependent on the quality and accuracy of the depth map. Thus, generating an accurate depth map is of substantial importance. The main objective of depth estimation methods is to generate a per-pixel depth map of a scene based on two or more reference images. The reference images are captured by a stereo camera system in which the cameras are parallel to each other or are set at a slight angle to each other.
Depth maps can be estimated by using either stereo matching techniques or depth sensors. With the advent of depth sensors, fusion camera systems have been developed that directly measure depth in real time. Depth measurement in such sensors is usually performed using either time-of-flight (TOF) systems or infrared pattern deformation. Depth maps acquired by depth sensors are usually noisy and suffer from poorly generated depth boundaries.
Over the past several years, stereo-based methods, which estimate the depth map algorithmically, have attracted considerable attention in the research community. Computing the shift between the two reference images, also known as disparity, is the key to determining depth values in stereo matching techniques.
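By way of illustration only, for a rectified pair of parallel cameras the standard pinhole relation gives depth as Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. The following minimal Python sketch applies that relation. The function names, the parameter names, and the convention of returning infinity for zero disparity are assumptions made for this example and are not taken from any particular system.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert one disparity (pixels) to depth (metres).

    Assumes a rectified, parallel stereo pair: Z = f * B / d.
    """
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative")
    if disparity_px == 0:
        # Zero shift between the views corresponds to a point at infinity.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def disparity_map_to_depth_map(disparity_map, focal_length_px, baseline_m):
    """Apply the conversion to every pixel of a disparity map (list of rows)."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


# Hypothetical camera: 800-pixel focal length, 0.5 m baseline.
print(disparity_to_depth(40, 800.0, 0.5))  # 10.0
```

Because depth is inversely proportional to disparity, a fixed error of one pixel in disparity produces a larger depth error for distant objects than for near ones, which is one reason the accuracy of the disparity estimate matters.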
Stereo matching techniques can be classified into two groups, namely local and global techniques. Local methods generally consider a finite neighboring window to estimate the disparity. Thus, the window size plays an important role in such methods. Local methods are fast and computationally simple, but they are highly error-prone and the estimated depth maps are usually inaccurate. In global techniques, on the other hand, an energy function is globally optimized to find the disparity. Global depth estimation techniques can generate high-quality depth maps. The most popular techniques in this category include belief propagation, graph cuts, and dynamic programming. However, due to the computational complexity of such algorithms, it is not feasible to exploit them in real-time applications. The concepts of local and global stereo matching were first combined in semi-global matching (SGM). SGM performs pixel-wise matching based on mutual information and an approximation of a global smoothness constraint, and obtains a good trade-off between accuracy and runtime. However, it achieves limited performance under illumination changes. Despite the advantages of the different depth estimation techniques, there are several problems in the generated depth maps. The most significant problems are the existence of holes and sensitivity to noise and illumination changes.
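As a non-limiting illustration of the local, window-based approach described above, the following Python sketch matches each pixel of the left image against candidate positions in the right image using a sum of absolute differences (SAD) over a square window. SAD is only one possible matching cost and is chosen here as an assumption for simplicity; the function name, the parameters `max_disparity` and `half_window`, and the representation of images as lists of rows are likewise hypothetical.

```python
def sad_block_match(left, right, max_disparity, half_window=1):
    """Estimate a disparity map by local SAD window matching.

    left, right: rectified grayscale images as equal-sized lists of rows.
    Returns a list of rows of integer disparities; border pixels where the
    window does not fit are left at 0.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(half_window, height - half_window):
        for x in range(half_window, width - half_window):
            best_cost, best_d = None, 0
            # A pixel at column x in the left image is searched for at
            # column x - d in the right image.
            for d in range(max_disparity + 1):
                if x - d - half_window < 0:
                    break  # candidate window would leave the image
                cost = 0
                for dy in range(-half_window, half_window + 1):
                    for dx in range(-half_window, half_window + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x - d + dx])
                # Keep the candidate with the lowest window cost.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

The sketch makes the role of the window size visible: a larger `half_window` averages over more pixels and is less sensitive to noise, but it blurs depth boundaries and increases the cost per pixel. Each pixel is decided independently, with no smoothness term linking neighboring disparities, which is the main difference from the global and semi-global techniques.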
Thus, there is a continuing need for improved depth estimation techniques.