In many applications of image capture, the distance from an image capture device to objects within the field of view of the image capture device can be advantageously determined. A collection of such distances to objects in an imaged scene is sometimes referred to as a depth map. A depth map of an imaged scene may be represented as an image, which may be of a different pixel resolution from the image of the scene itself, in which the distance to objects corresponding to each pixel of the depth map is represented by a greyscale or colour value.
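As a minimal sketch of the greyscale representation described above, the following Python maps a grid of distances to 8-bit grey values. The function name `depth_to_greyscale`, the parameters `near_m` and `far_m`, the use of metres, and the convention that near objects are bright are all assumptions made for illustration; the text does not prescribe any particular encoding.

```python
def depth_to_greyscale(depth_m, near_m, far_m):
    """Map a 2-D list of distances (assumed metres) to 8-bit grey values.

    Assumed convention: objects at near_m map to 255, objects at far_m to 0.
    """
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    span = far_m - near_m
    grey = []
    for row in depth_m:
        out_row = []
        for d in row:
            # Clamp distances outside the representable range.
            d = min(max(d, near_m), far_m)
            # Linear quantisation to the range 0..255.
            out_row.append(round(255 * (far_m - d) / span))
        grey.append(out_row)
    return grey


# Hypothetical 2x2 depth map; the last value lies beyond far_m and is clamped.
example = depth_to_greyscale([[1.0, 5.0], [3.0, 10.0]], near_m=1.0, far_m=5.0)
```

A linear mapping is only one possible choice; an encoding that is linear in inverse distance would give finer depth steps to nearby objects.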
A depth map can be useful in a number of applications including photography and video capture, as the depth map enables several desirable post-capture image processing capabilities for photographs or video streams. For example, a depth map can be used to segment foreground and background objects in a digital image to allow manual post-processing, or the automated application of creative photographic or video special effects.
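The segmentation use case can be illustrated with a short Python sketch, assuming the depth map and the image have the same pixel resolution and that a single distance threshold separates foreground from background. The names `foreground_mask` and `apply_to_background`, and the threshold itself, are hypothetical and chosen only for this example.

```python
def foreground_mask(depth_m, threshold_m):
    """Return True for pixels closer than threshold_m (assumed foreground)."""
    return [[d < threshold_m for d in row] for row in depth_m]


def apply_to_background(image, mask, effect):
    """Apply effect() to background pixels only, leaving foreground untouched."""
    return [
        [px if is_fg else effect(px) for px, is_fg in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical single-row image: the second pixel is farther than 2 m,
# so it is treated as background and darkened.
mask = foreground_mask([[1.0, 4.0]], threshold_m=2.0)
darkened = apply_to_background([[100, 200]], mask, lambda px: px // 2)
```

In practice the effect would more likely be a blur or colour adjustment than a simple darkening; the structure of the operation is the same.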
Several features are desirable in any method of acquiring a depth map. Depth accuracy is important; otherwise, the resulting depth map may suggest that objects are at distances significantly different from their true distances. Depth resolution is important to allow the separation of objects that may be spatially close to one another in the scene and also to allow for accurate post-processing operations. Spatial resolution of the depth map is also important in many applications. In particular, depth maps approaching the resolution of the images themselves are useful for pixel-wise segmentation and for avoiding visually obvious object boundary errors in many post-processing operations. A tolerance to subject or camera motion is highly desirable, especially in video applications where the subjects and the camera are likely to be moving during image capture. Desirably, depth mapping methods can be realised in practical devices, such as cameras, with minimal additional cost, bulk, weight, image capture and processing time, and power consumption.
Several methods are known for determining a depth map from images of a scene. So-called active depth mapping methods involve projecting beams or patterns of light or other radiation onto a scene. Active methods require projection optics, which add significant cost, weight, and power requirements. In addition, active methods have limited range and may add unwanted light to a scene in which the lighting must be carefully controlled for artistic effect.
So-called passive depth mapping methods, in contrast to active methods, rely only on the ambient light in the scene. One method, stereo imaging, uses multiple cameras to determine depth using the stereoscopic effect; this has disadvantages related to multiple viewpoints, equipment cost, difficulty of alignment, and object occlusion. Another method, depth from focus, uses multiple shots from a single camera at many different focus positions; this has the significant disadvantage of requiring a relatively long scan through focus, making this method impractical for video frame rates. Another method, depth from defocus (DFD), uses a small number of images shot at different focus positions and extracts depth information from variation in blur with object distance. Depth from defocus is more practical than other methods for many applications because DFD relies on as few as two images to determine depth.
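The variation in blur with object distance that DFD exploits can be illustrated with the standard thin-lens blur circle formula. This is a sketch under a thin-lens assumption, not a description of any particular DFD method; the function name `blur_circle_diameter` and the lens values used below are hypothetical.

```python
def blur_circle_diameter(object_m, focus_m, focal_length_m, aperture_m):
    """Blur circle diameter on the sensor under the thin-lens model.

    object_m       distance from lens to object
    focus_m        distance at which the lens is focused
    focal_length_m focal length of the lens
    aperture_m     diameter of the aperture
    All quantities are in metres; the result is also in metres.
    """
    if focus_m <= focal_length_m or object_m <= 0:
        raise ValueError("expected focus_m > focal_length_m and object_m > 0")
    return (
        aperture_m
        * focal_length_m
        * abs(object_m - focus_m)
        / (object_m * (focus_m - focal_length_m))
    )


# Hypothetical 50 mm lens with a 25 mm aperture, focused at 2 m.
for distance in (1.0, 1.5, 2.0, 3.0, 4.0):
    c = blur_circle_diameter(distance, 2.0, 0.05, 0.025)
    print(f"object at {distance} m: blur circle {c * 1000:.3f} mm")
```

Under this model the blur is zero at the focus distance and grows on either side of it, so a single blur measurement is consistent with both a nearer and a farther object. Capturing a second image at a different focus position is one way to resolve that ambiguity.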
Several different DFD methods are known. Such DFD methods typically rely on correspondences between regions of pixels in multiple images of the same scene to extract depth information about the object imaged at each region. Object motion interferes with these correspondences in two ways: misalignment of an object because the object has moved between the exposures, and motion blur caused by an object moving with a lateral translational motion during an exposure. Misalignment may be dealt with, more or less successfully, using image alignment methods. Motion blur, however, is a more difficult problem and has only been addressed by a small number of methods.
One method of dealing with motion blur in DFD is to detect regions of the images which show significant motion blur, and discard those detected regions from further consideration. The resulting depth map of the scene may then contain regions where no depth information is provided, which presents a significant disadvantage to any application of the depth data.
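The effect of discarding motion-blurred regions can be sketched in Python as follows, assuming that a per-pixel motion blur mask has already been produced by some detector. The detector itself is not shown, and the names `discard_motion_regions` and `missing_fraction` are hypothetical.

```python
def discard_motion_regions(depth_m, motion_blurred):
    """Replace depth values in motion-blurred pixels with None (no estimate)."""
    return [
        [None if blurred else d for d, blurred in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth_m, motion_blurred)
    ]


def missing_fraction(depth_with_holes):
    """Fraction of pixels for which no depth information is available."""
    total = sum(len(row) for row in depth_with_holes)
    missing = sum(1 for row in depth_with_holes for d in row if d is None)
    return missing / total if total else 0.0


# Hypothetical 2x2 depth map in which one pixel was flagged as motion blurred.
holes = discard_motion_regions(
    [[1.0, 2.0], [3.0, 4.0]],
    [[False, True], [False, False]],
)
coverage_lost = missing_fraction(holes)
```

Any downstream operation, such as segmentation, then has to handle the `None` entries explicitly, which is the disadvantage noted above.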
Another method of dealing with motion blur in DFD is to develop a model of image formation involving parameters related to defocus blur and motion blur, described by a system of equations. The model may include regularisation assumptions and a cost function to make the problem numerically tractable, and may be solved by iterative error minimisation to produce depth estimates. Disadvantageously, such a method is computationally expensive and inappropriate for implementation on a portable device or for a rapid post-processing workflow.