A large imaging device, such as a digital single-lens reflex (DSLR) camera, can generate an image that exhibits a focused foreground and a blurred background. This is because such devices have large apertures, which enable control over the depth of field (DOField) in an image. For example, a DSLR camera can often use a shallow depth of field (sDOField) effect to generate an image that includes a sharply focused foreground object (e.g., a person) and a blurred background object (e.g., scenery behind the person). In contrast, smaller imaging devices (e.g., a mobile device camera, a tablet computer camera, a webcam, etc.) have smaller apertures and shorter focal lengths than large imaging devices and, as a result, are unable to generate images that exhibit a focused foreground and a blurred background without additional processing. The smaller apertures and shorter focal lengths in these devices fail to provide the same level of control over DOField as that found in larger imaging devices. Typically, additional processing is performed on images captured by smaller imaging devices to replicate the effects provided by larger imaging devices (e.g., the sDOField effect).
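The dependence of DOField on aperture and focal length can be illustrated with the standard thin-lens hyperfocal approximation. The following is a minimal sketch, not part of any described device: the function name `depth_of_field_mm`, its parameters, and the sample lens values are hypothetical and chosen only for illustration.

```python
def depth_of_field_mm(focal_mm, f_number, focus_mm, coc_mm):
    """Return (near, far) limits of acceptable sharpness, in millimetres.

    Uses the standard hyperfocal-distance approximation. `coc_mm` is the
    circle-of-confusion diameter assumed for the sensor (an assumption,
    not a measured value). `far` is infinite when the focus distance is
    at or beyond the hyperfocal distance.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (hyperfocal - focal_mm) / (hyperfocal + focus_mm - 2 * focal_mm)
    if focus_mm >= hyperfocal:
        return near, float("inf")
    far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far


# Hypothetical lenses, both at f/1.8 and focused on a subject 2 m away.
large_near, large_far = depth_of_field_mm(focal_mm=50, f_number=1.8,
                                          focus_mm=2000, coc_mm=0.03)
small_near, small_far = depth_of_field_mm(focal_mm=4, f_number=1.8,
                                          focus_mm=2000, coc_mm=0.002)

print("large device DOField (mm):", large_far - large_near)
print("small device DOField (mm):", small_far - small_near)
```

Under these assumed values, the zone of acceptable sharpness for the large device is well under a metre deep, while for the small device it spans more than two metres, so a background a short distance behind the subject remains sharp. Note that the two lenses share an f-number but not a physical aperture diameter (focal length divided by f-number), which is far smaller for the 4 mm lens.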
Replicating the effects provided by larger imaging devices on images captured by smaller imaging devices typically requires distinguishing one or more foreground objects (e.g., a person) in a digital image representing a scene from the background (e.g., scenery behind the person) in that digital image. This separation enables one or more processing operations to be applied to the foreground and the background separately to achieve a desired visual effect in an image (e.g., an sDOField effect).
One conventional approach to synthesizing effects (e.g., an sDOField effect) on a digital image captured by a smaller imaging device is as follows: (i) generate a conventional depth map for a digital image representing a scene; and (ii) artificially add extra blur to the background in the digital image based on the depth map. Generating depth maps generally requires a focus sweep. As used herein, a “focus sweep,” a “focal sweep,” a “focal stack of images,” and their variations refer to a group of multiple images representing a scene, each of which corresponds to a different focus position. That is, each image in the group is captured at a different focus position from all other images in the group. A focus sweep is generally performed sequentially, with the images in the stack being captured over a finite time period.
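Step (ii) of the conventional approach above can be sketched as follows, assuming a depth map is already available from step (i). This is a simplified illustration rather than a definitive implementation: the function names `box_blur` and `synthesize_shallow_dof`, the parameters `focus_depth`, `tolerance`, and `radius`, and the choice of a box filter with a hard in-focus threshold are all assumptions made for this example.

```python
import numpy as np


def box_blur(image, radius):
    """Blur a 2-D grayscale image with a (2*radius+1)-wide box filter."""
    if radius == 0:
        return image.astype(np.float64)
    size = 2 * radius + 1
    height, width = image.shape
    # Replicate edge pixels so the output has the same shape as the input.
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    out = np.zeros((height, width), dtype=np.float64)
    # Sum every shifted copy of the image that falls inside the window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)


def synthesize_shallow_dof(image, depth_map, focus_depth, tolerance, radius=3):
    """Keep pixels near `focus_depth` sharp; use a blurred copy elsewhere.

    `depth_map` holds one depth value per pixel, in the same (assumed)
    units as `focus_depth` and `tolerance`.
    """
    blurred = box_blur(image, radius)
    in_focus = np.abs(depth_map - focus_depth) <= tolerance
    return np.where(in_focus, image.astype(np.float64), blurred)


# Toy scene: a checkerboard whose left half is near (1 m) and right half far (5 m).
image = (np.indices((8, 8)).sum(axis=0) % 2) * 255
depth_map = np.where(np.arange(8) < 4, 1.0, 5.0) * np.ones((8, 1))
result = synthesize_shallow_dof(image, depth_map, focus_depth=1.0,
                                tolerance=0.5, radius=1)
```

In this toy scene the left half of `result` is unchanged and the right half is smoothed. A hard threshold is used only to keep the sketch short; an approach that varies the blur strength with distance from the focus depth would be a natural refinement.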
A conventional focus sweep typically requires at least half a dozen images to cover a smaller imaging device's working distance. This requirement generally translates to a need for a large amount of computational resources (e.g., a large memory requirement, increased processing capability, etc.) and a longer capture time, which can affect the functioning of a small-format imaging device by reducing the processing power available for other tasks. To reduce the number of images in the focus sweep, it may be necessary to estimate the optimal focus positions for a given scene before capture. Unfortunately, this estimation can itself require additional images to be collected before it can be performed, which may not be feasible. Moreover, due to inaccuracies in both the estimation and the lens movement, the focus positions at which the images are actually captured may not be ideal, resulting in unintended blurring of the foreground in an image.