In many applications of image capture, the distance from an image capture device (e.g., a digital camera, a video camera, a camera phone, a laptop computer, a tablet computer) to objects within the field of view of the image capture device can be advantageously determined. A collection of such distances to objects in an imaged scene is sometimes referred to as a depth map. A depth map of an imaged scene may be represented as an image, which may be of a different pixel resolution to the image of the scene itself, in which the distance to objects corresponding to each pixel of the depth map is represented by a greyscale or colour value.
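The greyscale representation described above can be illustrated with a minimal sketch. The function name `depth_to_greyscale`, the choice of metres as the unit, the near and far clipping distances, and the convention that nearer objects are brighter are all assumptions made for illustration; the text does not prescribe any particular encoding.

```python
def depth_to_greyscale(depth_m, near_m, far_m):
    """Map distances in metres to 8-bit grey levels (near = bright, far = dark).

    Hypothetical encoding: distances are clamped to [near_m, far_m] and
    scaled linearly so that near_m maps to 255 and far_m maps to 0.
    """
    if far_m <= near_m:
        raise ValueError("far_m must exceed near_m")
    span = far_m - near_m
    image = []
    for row in depth_m:
        out_row = []
        for d in row:
            clamped = min(max(d, near_m), far_m)
            out_row.append(round(255 * (far_m - clamped) / span))
        image.append(out_row)
    return image


# A 2 x 3 depth map; it need not match the resolution of the scene image.
depth_map = [[1.0, 1.0, 5.0],
             [1.0, 3.0, 5.0]]
grey = depth_to_greyscale(depth_map, near_m=1.0, far_m=5.0)
```

A colour encoding would follow the same pattern, with the scaled value indexing a colour table instead of a grey level.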
A depth map can be useful in a number of applications including photography and video capture, as the depth map enables several desirable post-capture image processing capabilities for photographs or video streams. For example, a depth map can be used to segment foreground and background objects in a digital image to allow manual post-processing, or the automated application of creative photographic or video special effects.
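As a rough illustration of foreground and background segmentation, the sketch below thresholds a depth map and then suppresses the background pixels of a matching image. The function names, the single fixed threshold, and the assumption that the depth map and image share the same resolution are all hypothetical simplifications; practical segmentation is usually more involved.

```python
def segment_foreground(depth_m, threshold_m):
    """Return a boolean mask that is True where the scene is nearer than threshold_m."""
    return [[d < threshold_m for d in row] for row in depth_m]


def suppress_background(image, mask, background_value=0):
    """Keep foreground pixels and replace background pixels with background_value."""
    return [
        [pixel if keep else background_value for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


depth_map = [[1.2, 1.3, 6.0],
             [1.1, 5.5, 6.2]]
image = [[200, 210, 90],
         [190, 80, 70]]

mask = segment_foreground(depth_map, threshold_m=2.0)
foreground_only = suppress_background(image, mask)
```

The same mask could instead select the pixels to which an effect, such as an artificial background blur, is applied.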
Several features are desirable in any method of acquiring a depth map. Depth accuracy is important; otherwise, the resulting depth map may suggest that objects are at distances significantly different to their true distances. Depth resolution is important to allow the separation of objects that may be spatially close to one another in the scene and also to allow for accurate post-processing operations. Spatial resolution of the depth map is also important in many applications and, in particular, depth maps approaching the resolution of the images themselves are useful for pixel-wise segmentation and avoiding visually obvious object boundary errors in many post-processing operations. A tolerance to subject or camera motion is highly desirable, especially in video applications where the subjects and the camera are likely to be moving during image capture. Desirably, depth mapping methods can be realised in practical devices, such as cameras, with minimal additional cost, bulk, weight, image capture and processing time, and power consumption.
Several methods are known for determining a depth map from images of a scene. So-called active depth mapping methods involve projecting beams or patterns of light or other radiation onto a scene. Active methods require projection optics, which add significant cost, weight, and power requirements. In addition, active methods have limited range and may add unwanted light to a scene in which the lighting must be carefully controlled for artistic effect.
So-called passive depth mapping methods, in contrast to active methods, rely only on ambient light in a scene. One method, stereo imaging, uses multiple cameras to determine depth using the stereoscopic effect. Stereo imaging has disadvantages related to multiple viewpoints, equipment cost, difficulty of alignment, and object occlusion. Another method, depth from focus, uses multiple shots from a single camera at many different focus positions. Depth from focus has the significant disadvantage of requiring a relatively long scan through focus, making the depth from focus method impractical for video frame rates or scenes containing moving objects. Another method, depth from defocus (DFD), uses a small number of images shot at different focus positions and extracts depth information from variation in blur with object distance.
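The depth from focus approach can be sketched as follows: capture one frame per focus position, score the sharpness of the same region in each frame, and report the focus distance of the sharpest frame. The sharpness measure used here (the sum of squared differences between horizontally adjacent pixels) and the names `sharpness` and `depth_from_focus` are assumptions chosen for brevity, not a method taken from the text.

```python
def sharpness(patch):
    """Sum of squared differences between horizontally adjacent pixels."""
    return sum((b - a) ** 2 for row in patch for a, b in zip(row, row[1:]))


def depth_from_focus(focal_stack, focus_distances_m):
    """Return the focus distance at which the patch appears sharpest.

    focal_stack holds one patch (a list of pixel rows) per focus position,
    all covering the same region of the scene.
    """
    if len(focal_stack) != len(focus_distances_m):
        raise ValueError("one focus distance is required per captured patch")
    scores = [sharpness(patch) for patch in focal_stack]
    best = max(range(len(scores)), key=scores.__getitem__)
    return focus_distances_m[best]


# Three captures of the same one-row patch at focus distances 0.5 m, 1 m and 2 m.
stack = [
    [[10, 12, 11, 12]],  # low contrast: out of focus
    [[0, 20, 0, 20]],    # high contrast: in focus
    [[8, 13, 9, 12]],    # low contrast: out of focus
]
estimate_m = depth_from_focus(stack, [0.5, 1.0, 2.0])
```

The depth estimate can only take values from the list of focus distances, so fine depth resolution needs many captures. That requirement is the long scan through focus noted above.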
Several different depth from defocus (DFD) methods are known. Such methods typically rely on correspondences between regions of pixels in multiple images of the same scene to extract depth information about the object imaged at each region. The depth information is extracted by quantifying the amount of blur difference between the images of an object. For a static object and camera, the blur difference is caused by a change in the focal quality of the image captures, which is governed by a change in parameters of the camera, such as focus, aperture, or zoom. These parameters may be referred to as “camera parameters” or “image capture device parameters”. However, if the depth of an object changes between the image captures, because the object, the camera, or both move axially, then the axial motion causes an additional change in focal quality and hence in blur. If standard DFD methods are applied, the additional change in blur causes the methods to give an incorrect depth estimate for the object. Furthermore, the additional change in blur cannot be disambiguated from the change in blur caused by the change in camera parameters, so standard DFD methods cannot compensate for the additional change in blur caused by axial motion.
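The effect of axial motion on a blur-difference measurement can be illustrated with a toy thin-lens model. Everything in the sketch below is an assumption made for illustration: the geometric blur-circle formula stands in for a real blur measurement from image patches, the lens values (50 mm, f/2.8, focus at 1 m and then 2 m) are arbitrary, and the bisection solver assumes the object lies between the two focus distances, where the blur difference decreases monotonically with depth. It is not any particular published DFD method.

```python
def image_distance(focal_length_m, object_distance_m):
    """Thin-lens image distance v from 1/f = 1/s + 1/v (requires s > f)."""
    return focal_length_m * object_distance_m / (object_distance_m - focal_length_m)


def blur_diameter(object_m, focus_m, focal_length_m, f_number):
    """Geometric blur-circle diameter on a sensor focused at focus_m."""
    aperture = focal_length_m / f_number
    sensor = image_distance(focal_length_m, focus_m)
    image = image_distance(focal_length_m, object_m)
    return aperture * abs(sensor - image) / image


def blur_difference(depth1_m, depth2_m, focus1_m, focus2_m, f_m, n):
    """Blur in capture 2 minus blur in capture 1; depth may differ between captures."""
    return (blur_diameter(depth2_m, focus2_m, f_m, n)
            - blur_diameter(depth1_m, focus1_m, f_m, n))


def static_dfd_depth(observed, focus1_m, focus2_m, f_m, n, iterations=60):
    """Invert the blur difference by bisection, assuming the object did not move."""
    lo, hi = focus1_m, focus2_m
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # Between the focus distances the difference falls as depth grows,
        # so a modelled value above the observation means the object is farther.
        if blur_difference(mid, mid, focus1_m, focus2_m, f_m, n) > observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


F_M, N, FOCUS1, FOCUS2 = 0.05, 2.8, 1.0, 2.0

# Static object at 1.5 m: the static model recovers the depth.
static_obs = blur_difference(1.5, 1.5, FOCUS1, FOCUS2, F_M, N)
static_estimate = static_dfd_depth(static_obs, FOCUS1, FOCUS2, F_M, N)

# Object recedes from 1.5 m to 1.6 m between the captures.
moving_obs = blur_difference(1.5, 1.6, FOCUS1, FOCUS2, F_M, N)
moving_estimate = static_dfd_depth(moving_obs, FOCUS1, FOCUS2, F_M, N)
```

In this model the moving-object estimate matches neither the depth at the first capture nor the depth at the second. The single measured blur difference also carries no information about how much of it is due to the motion, which is the ambiguity described above.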
Some existing depth mapping methods can measure depths of axially moving objects. However, such methods disadvantageously require the additional cost and size of either active projection or stereo imaging, or require camera modifications, such as coded apertures, that interfere with standard imaging.
Some existing depth mapping methods make use of a moving camera to measure the depth of an object, using techniques such as axial parallax or joint solutions for blur and apparent affine motion. Such methods solve for depth under the condition of relative axial motion between the camera and object. However, such methods require knowledge of the amount of motion of the camera relative to a stationary object, and cannot determine the depth of an object that moves by an unknown amount.
Some existing depth mapping methods attempt to treat axial motion, and the resulting apparent change in size of an object between images, by warping the images to compensate for the change in size. However, such warping methods address only the change in size of an axially moving object, and not the change in focal blur caused by the motion. If applied to axially moving objects, the warping methods are biased by the axial motion and produce an incorrect depth estimate of the moving object.