Digital cameras, including digital single-lens reflex (DSLR) cameras and digital cameras integrated into mobile devices, often have sophisticated hardware and software that enables a user to capture digital images using a combination of different user-defined and camera-defined configuration settings. A digital image provides a digital representation of a particular scene. A digital image may subsequently be processed, by itself or in combination with other images, to derive additional information from the image. For example, one or more images may be processed to estimate the depths of the objects depicted within the scene, i.e., the distance of each object from a location from which the picture was taken. The depth estimates for each object in a scene, or possibly each pixel within an image, are included in a file referred to as a “depth map.” Among other things, depth maps may be used to improve existing image editing techniques (e.g., cutting, hole filling, copy to layers of an image, etc.).
Conventional depth estimation techniques involve computational models that rely on “image priors” to guide depth map generation. An “image prior” is a statistical model used to account for certain assumptions about the content or characteristics of a scene and is used to resolve depth ambiguity that may be encountered when analyzing an image. For example, an image prior may be designed based on an assumption that depth varies smoothly across an all-in-focus image, except where there is a depth discontinuity (e.g., at the edge or outline of an object within the scene). In the image domain, this type of image prior may be expressed in terms of an expected distribution of gradients across an image. In the Fourier domain, a comparable image prior may be defined such that the amount of energy at a particular frequency is proportional to that frequency raised to some power. The use of this type of image prior in depth estimation may yield good results in some cases, but will fail to do so when depth discontinuities between foreground and background objects are not captured within the image data. Therefore, the use of generic image priors in depth map generation leads to imprecise depth estimates, given that the underlying assumptions are based on generalizations and necessarily do not hold true for all images (e.g., the actual texture of all scenes will not fit well with the defined image prior).
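By way of illustration only, the gradient-domain form of such an image prior may be sketched as a cost that grows with the magnitude of the differences between neighboring pixels. The function name `gradient_prior_cost`, the exponent `alpha`, and its default value are assumptions chosen for this sketch and are not taken from any particular conventional technique.

```python
def gradient_prior_cost(image, alpha=0.8):
    """Illustrative negative log-prior (up to a constant) for a 2-D image.

    Assumes, hypothetically, that horizontal and vertical pixel differences
    follow a sparse distribution, so each difference d contributes |d| ** alpha.
    `image` is a list of equal-length rows of numbers.
    """
    rows, cols = len(image), len(image[0])
    cost = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # horizontal neighbor
                cost += abs(image[r][c + 1] - image[r][c]) ** alpha
            if r + 1 < rows:  # vertical neighbor
                cost += abs(image[r + 1][c] - image[r][c]) ** alpha
    return cost


# A scene with a single vertical edge fits the smoothness assumption well.
single_edge = [[0, 0, 1, 1] for _ in range(4)]

# A finely textured scene (checkerboard) violates it at every pixel.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

edge_cost = gradient_prior_cost(single_edge)
texture_cost = gradient_prior_cost(checkerboard)
```

Under these assumptions the textured scene receives a much higher cost than the single-edge scene even though both are valid all-in-focus images, which is one way to see why a generic prior may fit some scenes poorly.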
Some conventional techniques for estimating depth within a digital image also require the input of training data taken from one or more images of different scenes in order to accurately generate a depth map. Some techniques require that the image for which the depth map is to be generated be captured with a pre-determined aperture setting and a pre-determined focus setting. Other techniques involve a comparison of the characteristics of multiple images taken of the same scene and may require the multiple images to be captured in a particular order or in accordance with predefined combinations of aperture, focus and/or other camera configuration settings. Some models also require a dense set of images to be captured in accordance with a predefined combination of aperture, focus and/or other configuration settings or with a randomly selected combination of aperture, focus and/or other configuration settings.
Conventional depth estimation techniques that compare characteristics of multiple images taken of the same scene generally compare image patches with “blur kernels” to estimate the depth of the image patches. A blur kernel is an approximation of out-of-focus blur within an image. Conventional models, however, are often biased towards selecting blur kernels with larger blur values, especially in the presence of noise within the image patch (i.e., information that is not accounted for by the model). Selecting blur kernels that do not closely approximate the blur of the image patch will lead to imprecise depth estimates.
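By way of illustration only, the comparison of an image patch against candidate blur kernels may be sketched in one dimension as follows. The sketch assumes, hypothetically, that a sharp reference for the patch is available, that out-of-focus blur is approximated by a box kernel, and that the best candidate is the one with the smallest sum of squared differences. The names `box_blur` and `best_blur_radius` are illustrative and do not correspond to any particular conventional model.

```python
def box_blur(signal, radius):
    """Blur a 1-D signal with a box kernel of the given radius (edges clamped)."""
    n = len(signal)
    blurred = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def best_blur_radius(sharp, observed, candidate_radii):
    """Return the candidate radius whose blurred prediction best matches `observed`.

    The match is scored by the sum of squared differences (an assumption of
    this sketch), so the lowest score wins.
    """
    def ssd(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(candidate_radii, key=lambda r: ssd(box_blur(sharp, r), observed))


# A step edge, and the same edge as it might appear when out of focus.
sharp_patch = [0, 0, 0, 0, 1, 1, 1, 1]
observed_patch = box_blur(sharp_patch, 2)

selected = best_blur_radius(sharp_patch, observed_patch, [0, 1, 2, 3])
```

In this noise-free example the selected radius matches the radius used to produce the observed patch. Converting a selected blur kernel into a depth estimate would further depend on the aperture and focus settings used to capture the image, which this sketch does not model.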
Therefore, current depth estimation techniques can be imprecise, resource-intensive, time-consuming, complicated and unpredictable. Accordingly, it is desirable to provide improved solutions for estimating the distance of objects within images taken with conventional digital cameras.