A depth map is a map of the distances from the objects in a three-dimensional spatial scene to the lens of a camera acquiring an image of that scene. Determining these distances is an important problem in many applications, including auto-focusing digital and video cameras, computer and robotic vision, and surveillance.
There are typically two types of methods for determining a depth map: active and passive. An active system controls the illumination of the target objects, whereas a passive system relies on ambient illumination. Passive systems typically use (i) shape analysis, (ii) multiple-view (e.g., stereo) analysis, or (iii) depth-of-field (optical) analysis. Cameras using depth-of-field analysis rely on the fact that depth information can be obtained from focal gradients. At each focal setting of a camera lens, some objects in the spatial scene are in focus and others are not. Changing the focal setting brings some objects into focus while taking others out of focus. This change in focus of the scene's objects across different focal settings is the focal gradient. The focal gradient arises from the limited depth of field inherent in most camera systems.
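The focal gradient can be made concrete with the standard thin-lens model of geometric optics, which is an assumption here and not part of the text above: an object at distance u focuses at v = fu/(u - f), and an image plane at distance D from the lens records a blur circle of radius (A/2)|D - v|/v, where A = f/f_number is the aperture diameter. The Python sketch below (all function names are hypothetical) computes that radius for a few object depths at two focal settings. The depth that matches the focal setting has zero blur, and refocusing moves that zero to a different depth.

```python
def image_distance(f: float, u: float) -> float:
    """Thin-lens image distance v for an object at distance u > f (1/f = 1/u + 1/v)."""
    return f * u / (u - f)


def blur_radius(f: float, D: float, f_number: float, u: float) -> float:
    """Geometric blur-circle radius on an image plane at distance D behind the lens.

    Assumes an ideal thin lens (no diffraction or aberrations). The aperture
    diameter is f / f_number, and similar triangles give
    radius = (aperture / 2) * |D - v| / v.
    """
    v = image_distance(f, u)
    aperture = f / f_number
    return 0.5 * aperture * abs(D - v) / v


def focus_on(f: float, u_focus: float) -> float:
    """Image-plane distance D that brings an object at u_focus into sharp focus."""
    return image_distance(f, u_focus)


if __name__ == "__main__":
    f, n = 0.05, 2.8            # illustrative: 50 mm lens at f/2.8, lengths in metres
    depths = [1.0, 2.0, 4.0]    # object distances in metres
    for u_focus in (1.0, 4.0):  # two focal settings
        D = focus_on(f, u_focus)
        radii_um = [round(blur_radius(f, D, n, u) * 1e6, 1) for u in depths]
        print(f"focused at {u_focus} m, blur radii (um):", radii_um)
```

Comparing the two printed rows shows the focal gradient: the sharp depth in one row is blurred in the other.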
In one embodiment, a depth map is computed from the measured focal gradient by determining the distance from a point in the scene to the camera lens as follows:
d_o = \frac{f D}{D - f - 2 k r f_{\mathrm{number}}} \qquad (1)
where f is the focal length of the camera lens, D is the distance between the lens and the image plane inside the camera, r is the blur radius of the point's image on the image plane, k is a scale factor, and f_number is the f-number of the camera lens. The f-number equals the focal length of the camera lens divided by the diameter of the lens aperture. Except for the blur radius, all the parameters on the right-hand side of Equation 1 are known when the image is captured. Thus, the distance from the point in the scene to the camera lens can be calculated by estimating the blur radius of the point's image.
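Equation 1 translates directly into code. The Python sketch below assumes all lengths share one unit (metres here) and that k and the aperture diameter are already known from calibration; the function names and sample values are illustrative, not taken from the source. It rejects inputs for which the denominator is not positive, since Equation 1 then gives no finite positive distance.

```python
def f_number_of(focal_length: float, aperture_diameter: float) -> float:
    """f-number = focal length / aperture diameter (same length unit for both)."""
    return focal_length / aperture_diameter


def depth_from_blur(f: float, D: float, r: float, k: float, f_number: float) -> float:
    """Object distance d_o from Equation 1: d_o = f*D / (D - f - 2*k*r*f_number).

    f: focal length, D: lens-to-image-plane distance, r: blur radius,
    k: scale factor (assumed known from calibration). All lengths in one unit.
    """
    denom = D - f - 2.0 * k * r * f_number
    if denom <= 0.0:
        raise ValueError("denominator of Equation 1 must be positive")
    return f * D / denom


if __name__ == "__main__":
    n = f_number_of(0.05, 0.025)                  # 50 mm lens, 25 mm aperture -> f/2
    print(depth_from_blur(0.05, 0.0525, 0.0, 1.0, n))   # in-focus point (r = 0)
    print(depth_from_blur(0.05, 0.0525, 1e-4, 1.0, n))  # r = 0.1 mm, k = 1
```

With f = 50 mm, an f/2 aperture and D = 52.5 mm, a point with r = 0 lies at approximately 1.05 m, and a blur radius of 0.1 mm (with k = 1) places it at approximately 1.25 m.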
One way to calculate the change in blur radius is to capture two images of the same scene, each with a different aperture. Changing the aperture between the two images produces the focal gradient. The blur radius for a point in the scene is then estimated by computing the Fourier transforms of the corresponding image portions and assuming that the blur radius is zero in one of the two captured images.
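The text does not specify how the two Fourier transforms are compared. A common choice, assumed here, is to model the defocus point spread function as a Gaussian with spread sigma, whose transform multiplies the image spectrum by exp(-sigma^2 w^2 / 2). If one image portion is treated as blur-free, the log ratio of the two magnitude spectra equals sigma^2 w^2 / 2, so sigma follows from a least-squares fit. The Python sketch below works on a single image row, uses a naive DFT to stay dependency-free, assumes periodic boundaries and no noise, and blurs a synthetic row itself so the estimate can be checked; all function names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); keeps the real part since the signals here are real."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def angular_freq(k, n):
    """Signed angular frequency (radians per sample) of DFT bin k."""
    return 2 * math.pi * (k if k <= n // 2 else k - n) / n


def gaussian_blur(x, sigma):
    """Blur x by multiplying its spectrum by exp(-sigma^2 w^2 / 2) (circular)."""
    n = len(x)
    X = dft(x)
    return idft([X[k] * math.exp(-0.5 * (sigma * angular_freq(k, n)) ** 2)
                 for k in range(n)])


def estimate_sigma(sharp, blurred, rel_floor=1e-6):
    """Estimate the Gaussian spread of `blurred`, assuming `sharp` has zero blur.

    Fits ln(|S(w)| / |B(w)|) = sigma^2 * (w^2 / 2) by least squares through the origin.
    """
    n = len(sharp)
    S, B = dft(sharp), dft(blurred)
    peak = max(abs(v) for v in S)
    num = den = 0.0
    for k in range(1, n):  # skip DC: its ratio carries no blur information
        if abs(S[k]) < rel_floor * peak or abs(B[k]) < rel_floor * peak:
            continue  # skip bins too small to compare reliably
        x = 0.5 * angular_freq(k, n) ** 2
        y = math.log(abs(S[k]) / abs(B[k]))
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("no usable frequency bins")
    return math.sqrt(max(num / den, 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    row = [rng.random() for _ in range(64)]  # synthetic "sharp" image row
    blurred = gaussian_blur(row, 1.5)        # simulated wider-aperture row
    print(estimate_sigma(row, blurred))
```

On the synthetic row the estimate recovers the spread used to blur it. Real image portions would need windowing and noise handling, and mapping sigma to the blur radius r of Equation 1 depends on the camera model, which is not shown.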