In many applications of image capture, it can be advantageous to determine the distance from an image capture device to objects within a field of view of the image capture device. A collection of such distances to objects in an imaged scene is sometimes referred to as a “depth map”. A depth map of an imaged scene may be represented as an image, which may have a different pixel resolution from the image of the scene itself. Each pixel of the depth map represents, as a greyscale or colour value, the distance to the part of the scene that the pixel corresponds to.
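As an illustration only, the following Python sketch encodes a depth map as an 8-bit greyscale image. The function name `depth_to_greyscale`, the near/far clipping range, and the convention that nearer objects appear brighter are assumptions made for this example; other encodings (for example inverse depth or a colour map) are equally common.

```python
import numpy as np

def depth_to_greyscale(depth: np.ndarray, near: float, far: float) -> np.ndarray:
    """Encode distances as 8-bit greyscale (hypothetical helper).

    Assumed convention: distance `near` maps to 255 (white), `far` maps to 0
    (black), and distances outside [near, far] are clipped to that range.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    clipped = np.clip(depth, near, far)
    normalised = (far - clipped) / (far - near)  # 1.0 at near, 0.0 at far
    return np.round(normalised * 255).astype(np.uint8)

# Example: a 2 x 2 depth map in metres, rendered with near = 1 m and far = 5 m.
depth = np.array([[1.0, 2.0], [3.0, 5.0]])
grey = depth_to_greyscale(depth, near=1.0, far=5.0)
```

The depth map here has the same resolution as its input array; in practice it may be resampled to match, or differ from, the resolution of the captured image.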
A depth map can be useful in the fields of photography and video image capture, because it enables several desirable post-capture image processing operations. For example, a depth map can be used to segment foreground and background objects to allow manual post-processing, or automated application of creative visual effects. A depth map can also be used to apply depth-related visual effects, such as simulating, with a smaller and less expensive lens, the aesthetically pleasing graduated blur of a high-quality lens.
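A minimal sketch of depth-based segmentation, assuming the depth map is already aligned with the image: pixels closer than a fixed threshold are treated as foreground, and a simple effect (desaturating the background) is applied. The fixed threshold, the helper names `foreground_mask` and `desaturate_background`, and the channel-average greyscale conversion are all illustrative assumptions.

```python
import numpy as np

def foreground_mask(depth: np.ndarray, threshold: float) -> np.ndarray:
    """True where the scene is closer than `threshold` (hypothetical fixed cut)."""
    return depth < threshold

def desaturate_background(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep foreground colours; convert background pixels to grey.

    image: (H, W, 3) float RGB; mask: (H, W) boolean foreground mask.
    """
    grey = image.mean(axis=-1, keepdims=True)          # (H, W, 1) simple channel average
    grey_rgb = np.repeat(grey, 3, axis=-1)             # (H, W, 3)
    return np.where(mask[..., None], image, grey_rgb)  # broadcast mask over channels

depth = np.array([[0.8, 3.0], [1.2, 4.0]])             # metres
image = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                  [[0.0, 0.0, 1.0], [0.3, 0.6, 0.9]]])
mask = foreground_mask(depth, threshold=2.0)
result = desaturate_background(image, mask)
```

A hard threshold like this is exactly where fine structure such as hair becomes a problem: a low-resolution or noisy depth map produces a mask whose edges do not follow the subject.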
In the fields of professional photography and video image capture, visual effects need to be high quality. A significant issue when using foreground segmentation masks for visual effects is the accuracy of segmentation around “fine structure” at the edge of a subject, such as hair around the face of the subject. Errors in fine structure segmentation can cause artefacts which are highly visible in a processed image.
Depth may be estimated with a single camera using “depth from defocus” (DFD): two images are captured with different focus or aperture settings, and the relative blur between the two images is analysed. DFD requires only a standard camera without special hardware modifications, so the same camera can be used both for image or video capture and for depth capture.
A first conventional method for depth from defocus (DFD) involves estimating relative blur by dividing the spatial frequency spectrum of a region in a first image by the spectrum of the corresponding region in a second image, creating a spectral ratio. Such a method approximately cancels out the scene spectrum and allows the change in optical transfer function between the first and second images to be estimated. However, this first conventional method requires Fourier transforming a square region of each image and produces a single depth estimate for the whole region. The resulting depth map therefore has lower resolution than the first and second images, and fine structure is not resolved.
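The spectral ratio idea can be sketched in Python under simplifying assumptions that are not stated in the method above: both point spread functions are Gaussian, blur is circular (applied by FFT), and the relative blur is recovered by a least-squares fit to the log magnitude of the ratio. All function names are hypothetical. The sketch shows why the scene cancels and why one square region yields only one estimate.

```python
import numpy as np

def freq_grid(shape):
    """Squared radial spatial frequency |f|^2 (cycles/pixel) on the FFT grid."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return fx**2 + fy**2

def blur_with_gaussian_otf(image, sigma):
    """Circular blur by a Gaussian PSF of std `sigma` pixels, applied in the frequency domain."""
    otf = np.exp(-2 * np.pi**2 * sigma**2 * freq_grid(image.shape))
    return np.fft.ifft2(np.fft.fft2(image) * otf).real

def spectral_ratio(patch1, patch2, eps=1e-9):
    """Regularised ratio F1 / F2 of the two patch spectra; the scene spectrum cancels."""
    F1 = np.fft.fft2(patch1)
    F2 = np.fft.fft2(patch2)
    return F1 * np.conj(F2) / (np.abs(F2) ** 2 + eps)

def relative_blur(patch1, patch2, f_max=0.15):
    """One estimate of sigma1^2 - sigma2^2 for the whole square patch.

    For Gaussian OTFs, log|ratio| = -2*pi^2*(sigma1^2 - sigma2^2)*|f|^2, so the
    slope is fitted by least squares (through the origin) over low frequencies.
    """
    ratio = spectral_ratio(patch1, patch2)
    f2 = freq_grid(patch1.shape)
    use = (f2 > 0) & (f2 < f_max**2)  # skip DC and the weak high frequencies
    x = -2 * np.pi**2 * f2[use]
    y = np.log(np.abs(ratio[use]))
    return float(np.sum(x * y) / np.sum(x * x))

rng = np.random.default_rng(0)
scene = rng.random((32, 32))                       # one 32 x 32 region of the scene
patch1 = blur_with_gaussian_otf(scene, sigma=2.0)  # more defocused capture
patch2 = blur_with_gaussian_otf(scene, sigma=1.0)  # less defocused capture
estimate = relative_blur(patch1, patch2)           # close to 2.0**2 - 1.0**2 = 3.0
```

Mapping the relative blur to a physical distance would require a lens calibration that is not shown here.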
A second conventional method for depth from defocus (DFD) involves convolving a series of relative blur kernels with one image and subtracting each resulting image from a second captured image to create a series of blur difference images. For each region, the minimum value across the blur difference images identifies the best estimate of relative blur for that region. However, the blur difference images generated by this second conventional method are extremely noisy.
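A minimal Python sketch of the blur difference approach, assuming the relative blur kernels are Gaussians parametrised by a candidate standard deviation and that blur is circular (applied by FFT); the helper names and candidate values are hypothetical. The minimum is taken per pixel, and the same synthetic pair is then perturbed with noise to show how the per-pixel estimate degrades.

```python
import numpy as np

def gaussian_blur(image, sigma):
    """Circular Gaussian blur via the FFT (sigma in pixels; sigma = 0 leaves the image unchanged)."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    otf = np.exp(-2 * np.pi**2 * sigma**2 * (fx**2 + fy**2))
    return np.fft.ifft2(np.fft.fft2(image) * otf).real

def blur_difference_stack(sharper, blurrier, sigmas):
    """(K, H, W) stack: |blur(sharper, s) - blurrier| for each candidate relative blur s."""
    return np.stack([np.abs(gaussian_blur(sharper, s) - blurrier) for s in sigmas])

def per_pixel_relative_blur(sharper, blurrier, sigmas):
    """Candidate sigma giving the minimum blur difference at each pixel."""
    stack = blur_difference_stack(sharper, blurrier, sigmas)
    return np.asarray(sigmas)[np.argmin(stack, axis=0)]

sigmas = [0.5, 1.0, 1.5, 2.0]
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
sharper = gaussian_blur(scene, 1.0)
blurrier = gaussian_blur(sharper, 1.5)  # true relative blur is 1.5
clean = per_pixel_relative_blur(sharper, blurrier, sigmas)

# With sensor noise added to both captures, the per-pixel estimates scatter.
noisy = per_pixel_relative_blur(sharper + rng.normal(0, 0.01, sharper.shape),
                                blurrier + rng.normal(0, 0.01, blurrier.shape), sigmas)
```

Without noise every pixel recovers the true candidate; with even small noise the per-pixel minimum is unreliable, which motivates the compensation methods that follow.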
A first method of compensating for noise in the blur difference images described above is to average the blur difference images over square regions before finding the minimum blur difference value. This square-region averaging produces a low-resolution depth map in which fine structure is not resolved.
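A sketch of the square-region averaging described above, assuming non-overlapping square tiles and a synthetic noisy stack of blur difference images; the tile size, helper names, and test data are illustrative. It shows the trade-off directly: the tiled estimate is stable but has one value per square instead of one per pixel.

```python
import numpy as np

def tile_average(stack, tile):
    """Average each image of a (K, H, W) stack over non-overlapping tile x tile squares.

    Rows/columns that do not fill a whole tile are discarded.
    """
    k, h, w = stack.shape
    h_t, w_t = h // tile, w // tile
    cropped = stack[:, :h_t * tile, :w_t * tile]
    return cropped.reshape(k, h_t, tile, w_t, tile).mean(axis=(2, 4))

def tiled_relative_blur(stack, sigmas, tile):
    """One relative-blur estimate per square: argmin over the averaged differences."""
    return np.asarray(sigmas)[np.argmin(tile_average(stack, tile), axis=0)]

# Synthetic noisy stack: candidate 1 (sigma = 1.0) has the smallest differences
# on average, but individual pixels are noisy.
sigmas = [0.5, 1.0, 1.5]
rng = np.random.default_rng(1)
stack = rng.random((3, 8, 8))
stack[1] *= 0.1
per_pixel = np.asarray(sigmas)[np.argmin(stack, axis=0)]  # 8 x 8, may contain errors
tiled = tiled_relative_blur(stack, sigmas, tile=4)         # 2 x 2, one value per square
```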
A second method of compensating for blur difference noise is error minimisation. In this method, a data error term is created from the blur difference. A total variation error term based on the depth map gradient magnitude penalises depth maps that are not piecewise smooth. A neighbourhood regularisation term applies a non-local means filter based on the assumption that pixels with similar colours are likely to be at similar depths. The non-local means filter uses an elliptical Gaussian window to allow fine structure to be traced. However, the error minimisation method requires a complex algorithm that is slow to run. Further, the method uses colour to identify fine structure, which may fail if the background behind the fine structure has similar colours. The elliptical window also has a finite width along its minor axis, which may limit the resolution of fine structure.
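To make the structure of such an error function concrete, the following Python sketch evaluates a cost with the three kinds of term described above for a candidate labelling of depths. The exact forms are assumptions, not the published method: total variation is taken as summed absolute differences, and the non-local means term with an elliptical Gaussian window is replaced by a much simpler 4-neighbour term weighted by colour similarity. The weights `lam_tv`, `lam_nb` and `sigma_c` and all function names are hypothetical, and the (slow) minimiser itself is not shown.

```python
import numpy as np

def data_term(stack, labels):
    """Sum of blur-difference values at each pixel's chosen candidate.

    stack: (K, H, W) blur-difference images; labels: (H, W) candidate indices.
    """
    rows, cols = np.indices(labels.shape)
    return float(stack[labels, rows, cols].sum())

def tv_term(depth):
    """Anisotropic total variation: summed absolute vertical and horizontal differences."""
    return float(np.abs(np.diff(depth, axis=0)).sum() + np.abs(np.diff(depth, axis=1)).sum())

def colour_neighbour_term(depth, image, sigma_c):
    """Simplified stand-in for the non-local means term (assumption, not the original):
    squared 4-neighbour depth differences, weighted by a Gaussian of the colour difference."""
    total = 0.0
    for axis in (0, 1):
        colour_dist2 = (np.diff(image, axis=axis) ** 2).sum(axis=-1)
        weight = np.exp(-colour_dist2 / (2 * sigma_c**2))
        total += float((weight * np.diff(depth, axis=axis) ** 2).sum())
    return total

def energy(labels, depths, stack, image, lam_tv=1.0, lam_nb=1.0, sigma_c=0.1):
    """Total cost of a candidate depth labelling; a solver would search for its minimum."""
    depth = np.asarray(depths)[labels]
    return (data_term(stack, labels)
            + lam_tv * tv_term(depth)
            + lam_nb * colour_neighbour_term(depth, image, sigma_c))

# A single pixel labelled differently from its neighbours is penalised by both priors.
stack = np.zeros((2, 4, 4))
image = np.zeros((4, 4, 3))
labels = np.zeros((4, 4), dtype=int)
labels[1, 1] = 1
e = energy(labels, depths=[1.0, 2.0], stack=stack, image=image)
```

Because the colour weight is close to 1 wherever neighbouring colours match, this term also illustrates the weakness noted above: a background whose colour matches the fine structure pulls both towards the same depth.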
A third method of compensating for the blur difference noise discussed above is to apply both the spectral ratio and the blur difference iteratively to estimate relative blur, until the estimate converges to a stable result. However, because the spectral ratio is used, the resulting depth map is low resolution and fine structure is not resolved.