In many applications of image capture, it can be advantageous to determine the distance from the image capture device to objects within the field of view of the image capture device. A collection of such distances to objects in an imaged scene may be referred to as a depth map. A depth map of an imaged scene may be represented as an image, which may be of a different pixel resolution to the image of the scene itself. In the depth map, the distance to objects corresponding to each pixel of the depth map is represented by a greyscale or colour value.
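As an illustration of the representation described above, the following minimal sketch converts a grid of distances into 8-bit greyscale values. The function name `depth_to_greyscale`, the `near` and `far` clipping distances, and the convention that nearer objects map to darker values are assumptions made for this example only; the opposite convention is equally valid.

```python
def depth_to_greyscale(depth, near, far):
    """Map distances (list of rows of floats) to 8-bit greyscale values.

    Assumed convention: `near` maps to 0 (black) and `far` maps to 255
    (white). Distances outside [near, far] are clipped to that interval.
    """
    scale = 255.0 / (far - near)
    return [
        [round((min(max(d, near), far) - near) * scale) for d in row]
        for row in depth
    ]


# Example: a one-row depth map with distances in metres.
grey = depth_to_greyscale([[1.0, 2.0, 5.0, 9.0]], near=1.0, far=5.0)
```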
A depth map can be useful in the fields of photography and video, as a depth map enables several desirable post-capture image processing capabilities. For example, a depth map can be used to segment foreground and background objects to allow manual post-processing, or the automated application of creative visual effects. A depth map can also be used to apply depth-related visual effects, such as simulating the aesthetically pleasing graduated blur of a high-quality lens using a smaller and less expensive lens.
Depth estimation may be performed by depth from defocus (DFD) using a single camera by capturing two or more images with different focus or aperture settings and analysing relative blur between corresponding tiles of images. Depth from defocus is a flexible method because the depth from defocus method uses a single standard camera without special hardware modifications. The same camera can be used for image or video capture and also for depth capture.
The size of the tiles used in the depth from defocus method affects the depth estimates. The larger the tiles, the less noisy the depth estimates over regions of similar depth. On the other hand, the spatial resolution at depth boundaries is reduced. Along depth boundaries, the assumption made by the depth from defocus method of a constant depth over a tile is also violated, and the depth estimates are inaccurate. Depth from defocus methods also generate very noisy depth estimates, or no depth estimates at all, in regions with little texture. As a result, depth from defocus depth maps often need to be refined to reduce noise in depth estimates and align depth boundaries with object edges.
A joint bilateral filter (JBF) has been used for up-sampling low resolution data, including depth maps, given an associated high resolution image. Using the high resolution image as a prior, a joint bilateral filter smooths out data while preserving discontinuities in the data that coincide with the edges in the image. When filtering a depth map, the depth at each pixel of the image is replaced by a weighted average of the depth values of the pixels in a local window around that pixel. The weights depend on both the spatial distance (a function of pixel location, the domain variable) and the difference in intensity or colour (the range variable) between the pixels; hence the name “bilateral”.
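The weighted average described above may be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes that the depth map and the guide image have the same resolution (so no up-sampling is performed), that the guide is a single-channel intensity image, and that both weights are Gaussian. The function name, the parameter names `radius`, `sigma_s` and `sigma_r`, and their default values are hypothetical.

```python
import math


def joint_bilateral_filter(depth, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth `depth` using edges in `guide` (both lists of rows of floats).

    Assumptions: same resolution for both inputs, greyscale guide image,
    Gaussian domain and range weights.
    """
    height, width = len(depth), len(depth[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight_sum = 0.0
            value_sum = 0.0
            # Visit the local window, clipped at the image borders.
            for j in range(max(0, y - radius), min(height, y + radius + 1)):
                for i in range(max(0, x - radius), min(width, x + radius + 1)):
                    # Domain weight: decreases with spatial distance.
                    dist2 = (i - x) ** 2 + (j - y) ** 2
                    w_s = math.exp(-dist2 / (2.0 * sigma_s ** 2))
                    # Range weight: decreases with intensity difference
                    # measured in the guide image, not in the depth map.
                    diff = guide[j][i] - guide[y][x]
                    w_r = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                    weight_sum += w_s * w_r
                    value_sum += w_s * w_r * depth[j][i]
            # The centre pixel always contributes weight 1, so weight_sum > 0.
            out[y][x] = value_sum / weight_sum
    return out
```

With a guide image containing a sharp intensity edge, depth values on one side of the edge contribute almost nothing to pixels on the other side, so noise is averaged out within each region while the boundary is kept.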
Local pixels that are closer to a current pixel and whose intensity or colour is closer to that of the current pixel are given more weight when estimating the depth of the current pixel. However, depth maps are typically noisy with misaligned depth and object boundaries. Hence, local pixels that are similar in intensity or colour might not have the correct depth, especially for pixels that are close to depth boundaries. While either image intensity or colour can be used as a range variable in the formulation of a joint bilateral filter and extensions of the joint bilateral filter, the present specification will only refer to the use of the colour range variable to simplify the description.
A joint bilateral filter has been extended by adding a range filter on depth (in addition to the location weight and colour weight of the joint bilateral filter) so that the weights of local pixels also depend on depth values. The extension of the joint bilateral filter assigns lower weights to local pixels that have a different depth value to that of a current pixel, even if the intensity and colour of the local pixels are similar to the intensity and colour of the current pixel. The extension of the joint bilateral filter should help to preserve depth discontinuities over regions of an image where different depth layers with similar intensity and colour meet. Unfortunately, since depth maps are typically noisy with misaligned depth and object boundaries, the additional range filter on depth tends to amplify noise and exacerbate depth/object boundary misalignment. As a result, a better method of utilising depth data is required.
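The extension described above, and its drawback, may be illustrated with the following minimal sketch. It assumes a greyscale guide image at the same resolution as the depth map and Gaussian forms for all three weights. The function name and the parameters `sigma_s`, `sigma_c` and `sigma_d` are hypothetical and are not taken from any particular prior art implementation.

```python
import math


def depth_aware_joint_bilateral_filter(depth, guide, radius=2,
                                       sigma_s=2.0, sigma_c=0.1, sigma_d=0.5):
    """Joint bilateral filter with an additional range weight on depth.

    `depth` and `guide` are lists of rows of floats of equal size.
    """
    height, width = len(depth), len(depth[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight_sum = 0.0
            value_sum = 0.0
            for j in range(max(0, y - radius), min(height, y + radius + 1)):
                for i in range(max(0, x - radius), min(width, x + radius + 1)):
                    dist2 = (i - x) ** 2 + (j - y) ** 2
                    w_s = math.exp(-dist2 / (2.0 * sigma_s ** 2))   # location
                    dc = guide[j][i] - guide[y][x]
                    w_c = math.exp(-(dc * dc) / (2.0 * sigma_c ** 2))  # colour
                    # Extra weight: penalise neighbours whose (possibly noisy)
                    # depth differs from the depth of the current pixel.
                    dd = depth[j][i] - depth[y][x]
                    w_d = math.exp(-(dd * dd) / (2.0 * sigma_d ** 2))  # depth
                    w = w_s * w_c * w_d
                    weight_sum += w
                    value_sum += w * depth[j][i]
            out[y][x] = value_sum / weight_sum
    return out


# A uniform guide image with a single erroneous depth value at the centre.
guide = [[0.5] * 5 for _ in range(5)]
depth = [[1.0] * 5 for _ in range(5)]
depth[2][2] = 4.0
refined = depth_aware_joint_bilateral_filter(depth, guide)
```

In this example the erroneous centre value is left almost unchanged, because every neighbour is down-weighted for having a different depth. The same mechanism that preserves a true depth discontinuity between similarly coloured layers therefore also preserves noise.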
In addition, since the depth from defocus method relies on relative blur between two or more images for estimating depth, object and camera motion between the capture of the images may result in occluded regions that appear in only one of the images and produce no depth estimate. For the occluded regions, the missing depth estimates have to be first interpolated from the available depth estimates surrounding the occluded regions before a joint bilateral filter can be applied to refine the depth map.
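One simple way to perform such an interpolation is sketched below. The sketch assumes that missing depth estimates are marked with `None` and fills them by repeatedly averaging the valid 4-connected neighbours, working inwards from the border of each occluded region. This particular scheme and the function name `fill_missing_depth` are assumptions chosen for illustration; other interpolation methods could equally be used.

```python
def fill_missing_depth(depth):
    """Fill `None` entries of a depth map (list of rows) from valid neighbours.

    Each pass fills every missing pixel that has at least one valid
    4-connected neighbour with the mean of those neighbours. Passes repeat
    until no further pixel can be filled. The input is not modified.
    """
    height, width = len(depth), len(depth[0])
    filled = [row[:] for row in depth]
    while True:
        updates = {}
        for y in range(height):
            for x in range(width):
                if filled[y][x] is not None:
                    continue
                neighbours = [
                    filled[j][i]
                    for j, i in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= j < height and 0 <= i < width
                    and filled[j][i] is not None
                ]
                if neighbours:
                    updates[(y, x)] = sum(neighbours) / len(neighbours)
        if not updates:
            break  # nothing left to fill, or no valid depth to fill from
        # Apply the updates after the scan so each pass uses only values
        # that were valid at the start of the pass.
        for (y, x), value in updates.items():
            filled[y][x] = value
    return filled
```

The filled depth map contains a value at every pixel (provided that at least one valid estimate exists), which is the condition needed before a joint bilateral filter can compute a weighted average at each pixel.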