When taking a photograph, only part of the scene will be in focus, with objects closer to or further away from the camera appearing blurry in the captured image. The degree of blurring increases with distance from the in-focus region. The distance between the closest and most distant objects in a scene which appear acceptably sharp is known as the depth of field (DOF). In some photographic situations, such as portraits, it is highly desirable to have a narrow DOF, as the resulting blurred background removes distractions and draws the viewer's attention to the in-focus subject.
Cameras such as single-lens reflex (SLR) models are able to capture images with a narrow depth of field due to the large size of the sensor and lens. The large lens allows a correspondingly large aperture to be selected to create the desired depth of field effect. Due to factors such as reduced cost and size, compact cameras are more popular than SLR cameras. However, due to optical constraints, photographs taken with compact cameras inherently have a greater DOF than images taken using an SLR camera with the same field of view and relative aperture. One approach to producing SLR-like images in a compact camera is to post-process the captured image in conjunction with a depth map of the scene to reduce the apparent DOF. This is termed bokeh rendering. The depth map is used to selectively blur background pixels, leaving the subject in focus.
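The effect of sensor size on DOF can be illustrated with the standard thin-lens depth of field approximation. The following is a minimal sketch only: the function name, the crop factor of 5, the 50 mm focal length, the 0.030 mm circle of confusion and the 2 m subject distance are all assumed example values and are not taken from the description above. The compact camera is modelled by dividing both the focal length and the circle of confusion by the crop factor, which keeps the field of view and relative aperture the same.

```python
import math


def depth_of_field_mm(focal_length_mm, f_number, coc_mm, subject_mm):
    """Return (near, far, total) limits of acceptable sharpness in mm.

    Uses the usual thin-lens approximation; coc_mm is the acceptable
    circle of confusion on the sensor.
    """
    f2 = focal_length_mm ** 2
    blur_term = f_number * coc_mm * (subject_mm - focal_length_mm)
    near = subject_mm * f2 / (f2 + blur_term)
    # At or beyond the hyperfocal distance the far limit is at infinity.
    far = subject_mm * f2 / (f2 - blur_term) if f2 > blur_term else math.inf
    return near, far, far - near


# Assumed example values (hypothetical cameras, not from the text).
CROP_FACTOR = 5.0      # ratio of SLR sensor size to compact sensor size
F_NUMBER = 2.8         # same relative aperture on both cameras
SUBJECT_MM = 2000.0    # subject 2 m from the camera

slr = depth_of_field_mm(50.0, F_NUMBER, 0.030, SUBJECT_MM)
compact = depth_of_field_mm(50.0 / CROP_FACTOR, F_NUMBER,
                            0.030 / CROP_FACTOR, SUBJECT_MM)

print("SLR total DOF (mm):", round(slr[2], 1))
print("Compact total DOF (mm):", round(compact[2], 1))
```

With these assumed inputs the total DOF of the compact camera comes out several times larger than that of the SLR, which is the behaviour bokeh rendering attempts to compensate for.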
In the simplest case, the captured image may be segmented into foreground and background depth layers using the depth map. The background layer is blurred and combined with the foreground using a compositing technique which makes use of an opacity matte, also known as an alpha matte. The alpha matte defines the relative contribution of each pixel in the final image from each of the two image layers. The compositing of two images for bokeh rendering may be described by the following equation:

C(x,y) = α(x,y)F(x,y) + (1 − α(x,y))B(x,y),  (1)

where x and y are pixel coordinates within the image, F is the three-dimensional colour vector at each pixel location in the foreground image layer, B is the three-dimensional colour vector at each pixel location in the blurred background, α is the opacity at each pixel location and C is the three-dimensional colour vector at each pixel location in the resulting bokeh image. At pixel locations with an opacity value of 1, the final image comprises only foreground pixel values. At pixel locations with an opacity value of 0, the final image comprises only background pixel values. At pixel locations with fractional opacity values, the final image comprises a mixture of foreground and background pixel values. The aesthetic quality of the resulting bokeh image depends crucially on the accuracy of the alpha matte. Errors of the order of a few pixels can result in unacceptable images.
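Equation (1) may be sketched in code as a per-pixel blend. The following is an illustrative sketch only, assuming images are stored as nested lists of (R, G, B) tuples with channel values between 0 and 1; the function name and the type aliases are hypothetical and chosen for this example.

```python
from typing import List, Tuple

Colour = Tuple[float, float, float]
Image = List[List[Colour]]
Matte = List[List[float]]


def composite(foreground: Image, background: Image, alpha: Matte) -> Image:
    """Blend two image layers pixel by pixel following equation (1)."""
    result: Image = []
    for f_row, b_row, a_row in zip(foreground, background, alpha):
        out_row = []
        for f, b, a in zip(f_row, b_row, a_row):
            # C = alpha * F + (1 - alpha) * B, applied to each colour channel.
            out_row.append(tuple(a * fc + (1.0 - a) * bc
                                 for fc, bc in zip(f, b)))
        result.append(out_row)
    return result


# One-row example: red foreground over a blue (already blurred) background.
fg = [[(1.0, 0.0, 0.0)] * 3]
bg = [[(0.0, 0.0, 1.0)] * 3]
matte = [[1.0, 0.5, 0.0]]
out = composite(fg, bg, matte)
```

In this example the first output pixel is pure foreground, the last is pure background, and the middle pixel is an equal mixture of the two, matching the three opacity cases described above.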
Many techniques have been developed to obtain a depth map of a scene. These are classified as active (projecting light or other energy onto the scene) or passive (relying only on ambient light). A passive depth mapping method is often preferred because active methods, such as structured light, are expensive, require additional power, and may be intrusive or may not function for outdoor scenes. However, passive mapping techniques are not able to obtain depth information for surfaces with low texture contrast, which leads to errors in the depth map. These errors may be reduced by analysing the original image to enforce correspondence between object edges and depth boundaries to produce a refined depth map. However, in practice, significant errors remain in the refined depth map.
An in-focus region incorrectly blurred or a background region incorrectly rendered sharply because of an error in the depth map is easily noticeable as an obvious artefact.
Software applications have been developed for use on personal computers which allow a user to manually segment an image and composite the selected region over a new background. The width of the band of fractional alpha values of the compositing matte along the segmentation boundary may be adjusted by determining a feather width. Typically the feather width is constant for the entire compositing matte. If a depth map is used for segmentation, the user is required to manually correct any significant segmentation errors before generating and refining the matte used to composite the final image. Correcting the appearance around depth map errors requires tedious and time-intensive manual adjustments by a skilled user.
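A constant feather width of the kind described above may be approximated by averaging a binary segmentation mask over a fixed square window. The following is a minimal sketch under that assumption; the function name is hypothetical, the box filter is only one simple way of producing a band of fractional alpha values, and it is not presented as the method used by any particular software application.

```python
from typing import List


def feather_matte(mask: List[List[int]], feather_width: int) -> List[List[float]]:
    """Soften a binary mask (1 = foreground) into a matte.

    feather_width is the side, in pixels, of the square averaging window
    and is assumed to be odd. The same width is used everywhere in the image.
    """
    height, width = len(mask), len(mask[0])
    radius = feather_width // 2
    matte = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            count = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse the edge value.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += mask[yy][xx]
                    count += 1
            row.append(total / count)
        matte.append(row)
    return matte


# Hard edge between background (0) and foreground (1) in a one-row mask.
hard_mask = [[0, 0, 0, 1, 1, 1]]
soft = feather_matte(hard_mask, 3)
```

Because the window size is the same everywhere, a segmentation error in the mask is simply softened rather than corrected, which is why manual correction is still required in such a workflow.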
Another technique for generating a compositing matte is known as alpha matte estimation, in which a single image is modelled as being a mixture of foreground and background according to equation (1). The goal of alpha matte estimation is to estimate α, F and B given only C, which is a severely under-determined problem: each pixel provides three known colour values but has seven unknowns (α and the three colour components of each of F and B). The estimated values are used to composite the estimated image F over a new background using the compositing matte α. State-of-the-art methods generally assume the image has already been segmented into an accurate trimap which assigns image pixels to one of three categories: regions of definite foreground, regions of definite background and the remaining unknown regions which require alpha estimation. In practice, alpha matte estimation algorithms are computationally expensive and do not achieve sufficient accuracy for high quality rendering, even given a correct trimap (such as one generated by user interaction). A trimap may be estimated from a depth map, but this makes the situation even worse. Errors present in the depth map are propagated to the trimap, undermining the assumption of trimap accuracy required by alpha estimation methods.
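One simple way a trimap might be estimated from a depth map is to threshold the depth values and leave a margin around the threshold as the unknown region. The following is a hypothetical sketch only; the function name, the label strings and the `threshold` and `margin` parameters are assumptions made for illustration and do not describe any specific published method.

```python
from typing import List

FOREGROUND = "foreground"
BACKGROUND = "background"
UNKNOWN = "unknown"


def trimap_from_depth(depth: List[List[float]],
                      threshold: float,
                      margin: float) -> List[List[str]]:
    """Label each pixel from its depth value alone.

    Pixels clearly nearer than the threshold become definite foreground,
    pixels clearly further away become definite background, and pixels
    within +/- margin of the threshold are left for alpha estimation.
    """
    trimap = []
    for row in depth:
        labels = []
        for d in row:
            if d < threshold - margin:
                labels.append(FOREGROUND)
            elif d > threshold + margin:
                labels.append(BACKGROUND)
            else:
                labels.append(UNKNOWN)
        trimap.append(labels)
    return trimap


# Depths in metres; the subject is assumed to be nearer than 2 m.
depth_row = [[1.0, 1.9, 2.1, 5.0]]
labels = trimap_from_depth(depth_row, threshold=2.0, margin=0.25)
```

The sketch also shows how depth errors propagate: a foreground pixel whose depth is wrongly measured as 5.0 is labelled definite background, and no later alpha estimation step revisits that label.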
Generation of high quality artificial bokeh images using a compact camera requires an improved method of rendering which minimises the visibility of artefacts caused by depth map errors, without human interaction. Existing methods of automatically generating a compositing matte are unable to cope with depth map errors, resulting in distracting visual artefacts.