Digital cameras, including digital single-lens reflex (DSLR) cameras and digital cameras integrated into mobile devices, often have sophisticated hardware and software that enables a user to capture digital images using a combination of different user-defined and camera-defined configuration settings. A digital image provides a digital representation of a particular scene. A digital image may subsequently be processed, by itself or in combination with other images of the scene, to derive additional information from the image. For example, one or more images of a scene may be processed to estimate the depths of the objects depicted within the scene, i.e., the distance of each object from a location from which the images were taken. The depth estimates for each object in a scene, or possibly each pixel within an image, are included in a file referred to as a “depth map.” Among other things, depth maps may be used to improve existing image editing techniques (e.g., cutting, hole filling, copy to layers of an image, etc.).
Conventionally, depth maps are generated using one of a variety of techniques. Such techniques include depth from defocus techniques, which use out-of-focus blur to estimate depth of the imaged scene. Depth estimation using such techniques is possible because imaged scene locations will have different amounts of out-of-focus blur (i.e., depth information) depending on the camera configuration settings (e.g., aperture setting and focus setting) used to take the image(s). Estimating depth, therefore, involves estimating the amount of depth information at the different scene locations, whether the depth information is derived from one image or from multiple images of the scene. Conventionally, the accuracy of such depth estimates depends on the number of images used. Generally speaking, the greater the number of images that are input, the greater the amount of depth information that can be compared for any one position (e.g., pixel) in the scene.
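By way of illustration only, the dependence of out-of-focus blur on the camera configuration settings can be sketched with the standard thin-lens circle-of-confusion relation. The sketch below is not taken from any particular conventional technique. The function name `blur_circle_diameter`, its parameters, and the sample values are hypothetical, and an ideal thin lens with all distances measured in millimeters from the lens is assumed.

```python
def blur_circle_diameter(focal_length_mm, f_number, focus_distance_mm, object_distance_mm):
    """Diameter (mm, on the sensor) of the blur circle for a point at
    object_distance_mm when an ideal thin lens is focused at focus_distance_mm.

    Assumption: thin-lens model; focus_distance_mm > focal_length_mm.
    """
    # The aperture setting (f-number) fixes the aperture diameter.
    aperture_diameter = focal_length_mm / f_number
    # Magnification of the plane that is in focus.
    magnification = focal_length_mm / (focus_distance_mm - focal_length_mm)
    # Relative distance of the point from the in-focus plane.
    relative_defocus = abs(object_distance_mm - focus_distance_mm) / object_distance_mm
    return aperture_diameter * magnification * relative_defocus


if __name__ == "__main__":
    # Hypothetical settings: 50 mm lens focused at 2 m, two aperture settings.
    for f_number in (2.0, 8.0):
        for object_distance_mm in (1000.0, 2000.0, 4000.0):
            c = blur_circle_diameter(50.0, f_number, 2000.0, object_distance_mm)
            print(f"f/{f_number:g}, object at {object_distance_mm:g} mm: blur {c:.4f} mm")
```

Under this model, a point lying in the in-focus plane has zero blur, and the blur grows as the point moves away from that plane or as the aperture is opened. This variation of blur with aperture setting, focus setting, and object distance is the depth information that depth from defocus techniques exploit.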
Thus, many conventional depth from defocus techniques may require a dense set of input images in order to estimate scene depth with a higher degree of certainty. However, conventional techniques cannot predictively determine the optimal number of images and the corresponding camera configuration settings needed for estimating a scene depth map with any particular degree of certainty. Nor can conventional techniques be used to analyze an existing depth map to predictively determine a number of additional images that could be captured of the scene with particular camera configuration settings, so that sufficiently more depth information would be available to refine the existing depth map (i.e., improve the accuracy of its depth estimates).
Accordingly, it is desirable to provide improved solutions for analyzing an existing depth map or other scene depth information to predictively determine a number of additional images to be captured of the scene, and the camera configuration settings used for capturing them, such that sufficient depth information is available for refining the depth estimates provided by the existing depth map or other scene depth information.