In many applications of image capture, it can be advantageous to determine the distance from the image capture device to objects within the field of view of the image capture device. A collection of such distances to objects in an imaged scene may be referred to as a depth map. A depth map of an imaged scene may be represented as an image, in which the distance to the object imaged at each pixel is represented by a greyscale or colour value.
A depth map can be useful in the fields of photography and video, as a depth map enables several desirable post-capture image processing capabilities. For example, a depth map can be used to segment foreground and background objects to allow manual post-processing, or the automated application of creative visual effects. A depth map can also be used to apply depth-related visual effects, such as changing the background scene of an image or a video.
Depth estimation may be performed by depth from defocus (DFD) using a single camera by capturing two or more images with different focus or aperture settings and analysing the relative blur between corresponding tiles of the images. Depth from defocus is a flexible method because it uses a single standard camera without special hardware modifications. The same camera can be used for image or video capture and also for depth capture.
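As an illustrative sketch only (not the method of any particular DFD implementation), the relative blur between corresponding tiles of two captures can be approximated by comparing a simple sharpness measure per tile, such as the variance of a discrete Laplacian response; the ratio of the two measures varies with the degree of defocus and hence with scene depth near the focal planes:

```python
import numpy as np

def tile_sharpness(tile):
    """Sharpness proxy: variance of a 5-point discrete Laplacian response.
    A more defocused (blurred) tile yields a lower value."""
    lap = (-4.0 * tile
           + np.roll(tile, 1, axis=0) + np.roll(tile, -1, axis=0)
           + np.roll(tile, 1, axis=1) + np.roll(tile, -1, axis=1))
    return lap.var()

def relative_blur(tile_a, tile_b):
    """Ratio of sharpness between corresponding tiles of two captures
    taken with different focus or aperture settings. Values greater
    than 1 indicate tile_a is sharper than tile_b."""
    return tile_sharpness(tile_a) / (tile_sharpness(tile_b) + 1e-12)
```

A practical DFD method would typically work with windowed spectral ratios and a calibrated blur-versus-depth model rather than this single scalar, but the tile-wise comparison structure is the same.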
Existing depth from defocus methods typically impose restrictions on the camera settings. The restrictions ensure that the captured images will have a large depth of field (DoF) so that different degrees of relative blur are produced over the entire depth range covered by a scene. For instance, in FIG. 3, an image pair 310, 320, each with a large DoF, is used to produce a DFD depth map 330 of relative depth in which higher intensity represents smaller depth. While the depth map 330 exhibits small patches of depth errors, such as errors 340 and 350, the entire range of the depth of the scene is covered in the depth map 330, where the lighter greyscale indicates foreground and the darker greyscale indicates background.
However, for many common photographic applications, a small depth of field is desirable, for example to perceptually separate the subject from the background in a portrait photo. If the images have a small depth of field, existing depth from defocus methods will only be able to estimate depths that are close to the plane of best focus. Objects that are further away from the plane of best focus will be assigned incorrect depth. For instance, in FIG. 4, an image pair 410, 420, both images having a small DoF, is used to produce a DFD depth map 430 in which higher intensity represents smaller or shallower depth. The scene in this example has a mannequin 440 in the foreground around 2 m from the camera and a teddy bear 450 slightly behind the mannequin at about 2.5 m, against a grassy backdrop 460 at about 3.8 m. The best focus is around 2.1 m from the camera. The depth map 430 shows that, while the mostly in-focus mannequin is assigned consistent depth values, the teddy bear and the grassy backdrop are assigned inconsistent and conflicting depth values. The grassy backdrop, in particular, is often assigned depth values which suggest that it is closer to the camera than the teddy bear. To cover the entire depth range of the scene, a set of more than two images with small DoF can be captured at different focus distances to cover different depth ranges. The DFD depth estimates from each pair of images in the set may then be combined to obtain a full depth map. However, this requires a longer capture time, and any camera or object motion, as well as changes in lighting conditions during capture, will lead to higher depth estimation errors.
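The per-pair combination step described above can be sketched as a per-pixel merge of the depth maps obtained from each focus bracket. The following is a minimal, hypothetical merging scheme assuming each pair also yields a per-pixel confidence map (for example, derived from how close the estimate lies to that pair's valid depth range); the function names and the confidence-based selection rule are illustrative assumptions, not the scheme of any specific DFD method:

```python
import numpy as np

def merge_depth_maps(depth_maps, confidences):
    """Merge DFD depth maps from several focus brackets.

    depth_maps  -- list of equally shaped 2-D depth arrays, one per pair
    confidences -- list of matching 2-D confidence arrays (assumed given)

    For each pixel, the depth estimate from the bracket with the highest
    confidence is selected (a hypothetical winner-takes-all rule)."""
    depths = np.stack(depth_maps)   # shape: (n_pairs, H, W)
    confs = np.stack(confidences)
    best = np.argmax(confs, axis=0)  # index of most confident bracket
    rows, cols = np.indices(best.shape)
    return depths[best, rows, cols]
```

A weighted average of the per-bracket estimates is an equally plausible alternative; either way, the merge cannot remove errors caused by motion or lighting changes between the brackets.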
DFD methods typically divide the input images into tiles for depth map processing. The size of the tiles used in the depth from defocus method also affects the accuracy of the depth estimates. The larger the tiles, the less noisy the depth estimates over regions of similar depth; however, larger tiles reduce the spatial resolution at depth boundaries. Along depth boundaries, the depth from defocus assumption of constant depth over a tile is often violated and the depth estimates are inaccurate. Depth from defocus methods also generate very noisy depth estimates, or none at all, in regions with little texture. As a result, DFD depth maps often need to be refined to reduce noise in the depth estimates and to align depth boundaries with object edges. Even after such refinement, the depths can remain inaccurate due to poor initial depth estimates.
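The tiling step and the low-texture failure mode can be illustrated with a small sketch. The tile size and the variance threshold below are hypothetical tuning parameters, and the variance test is only one possible texture measure:

```python
import numpy as np

def tile_grid(image, tile):
    """Split a 2-D image into non-overlapping tile x tile blocks.
    Edge rows/columns that do not fill a whole tile are dropped.
    Returns an array of shape (rows, cols, tile, tile)."""
    h, w = image.shape
    h, w = h - h % tile, w - w % tile
    return (image[:h, :w]
            .reshape(h // tile, tile, w // tile, tile)
            .swapaxes(1, 2))

def low_texture_mask(image, tile, var_threshold=1e-4):
    """Flag tiles whose intensity variance falls below a threshold.
    DFD estimates in such tiles are unreliable, since there is too
    little texture from which to measure relative blur."""
    tiles = tile_grid(image, tile)
    return tiles.var(axis=(2, 3)) < var_threshold
```

Larger `tile` values average blur measurements over more pixels, which reduces noise in the per-tile estimate but, as noted above, lowers the spatial resolution of the resulting depth map at object boundaries.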