When taking a photograph, only part of the scene will be in focus, with objects closer to or further away from the camera appearing blurry in the captured image. The degree of blurring increases with increasing distance from the in-focus region. The distance between the closest and most distant objects in a scene that appear acceptably sharp is known as the depth of field (DOF). In some photographic situations, such as for portraits, it is highly desirable to have a narrow DOF as the resulting blurred background removes distractions and directs the viewer's attention to the in-focus subject.
Images captured by SLR (single-lens reflex) cameras generally have the desirable characteristic of a narrow DOF. The subject of focus possesses maximum sharpness whereas the background, usually at a different depth from the camera, appears blurred. The visually pleasing background blur, also known as bokeh, is only achievable optically under large apertures. As a result, images from low-cost compact cameras often appear sharp everywhere. This extended depth of field effect typically reduces the perceived quality of images captured with a compact camera, when compared with an SLR camera.
Higher quality cameras, such as SLR models, are generally able to capture images with a narrow depth of field due to the large size of the lens and sensor. The large lens advantageously enables a correspondingly large aperture to be selected to create the desired depth of field effect. Notwithstanding, due to factors such as reduced cost and size, compact cameras are more popular than SLR cameras. Disadvantageously, though, photographs taken with compact cameras inherently have a greater DOF than images taken using an SLR camera with the same field of view and relative aperture due to optical constraints. One approach to producing SLR-like images in a compact camera is to post-process the captured image in conjunction with a depth map of the scene to reduce the apparent DOF. This is termed bokeh rendering or depth-of-field rendering. The depth map is used to selectively blur background pixels, leaving substantially only the subject in-focus.
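The dependence of DOF on lens and sensor size can be illustrated with the standard thin-lens blur-circle approximation. The sketch below is an illustration only: the formula is the usual textbook approximation rather than anything specified in this description, and the two camera configurations (a 36 mm wide sensor with a 50 mm lens, and a sensor and lens scaled down by a hypothetical factor of 5.6) are assumed values chosen so that both cameras share the same field of view and relative aperture.

```python
def blur_circle_mm(focal_mm, f_number, focus_mm, object_mm):
    """Blur-circle diameter on the sensor for a thin lens (all lengths in mm)."""
    aperture_mm = focal_mm / f_number  # aperture diameter from relative aperture
    return (aperture_mm * abs(object_mm - focus_mm) / object_mm
            * focal_mm / (focus_mm - focal_mm))


def relative_blur(focal_mm, f_number, sensor_width_mm, focus_mm, object_mm):
    """Blur-circle diameter expressed as a fraction of the sensor width."""
    return blur_circle_mm(focal_mm, f_number, focus_mm, object_mm) / sensor_width_mm


# Hypothetical cameras: same field of view, same f-number, different sizes.
SCALE = 5.6            # assumed size ratio between the two sensors
FOCUS_MM = 2000.0      # subject in focus at 2 m
BACKGROUND_MM = 10000.0  # background object at 10 m

slr = relative_blur(50.0, 2.0, 36.0, FOCUS_MM, BACKGROUND_MM)
compact = relative_blur(50.0 / SCALE, 2.0, 36.0 / SCALE, FOCUS_MM, BACKGROUND_MM)
print(f"relative background blur: SLR {slr:.4f}, compact {compact:.4f}")
```

Under these assumptions the background blur occupies a noticeably larger fraction of the frame for the larger camera, which is the narrow-DOF effect that post-processing attempts to reproduce.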
Numerous techniques have been developed to obtain a depth map of a scene. These are classified as active (projecting light or other energy onto the scene) or passive (relying only on ambient light). Active methods, such as structured light, are often expensive, require additional power, may be intrusive and do not function outside a specified range of depth. Passive depth mapping methods, such as depth from defocus or depth from stereo, are less expensive but are also less accurate for natural scenes. For example, passive mapping techniques are generally unable to obtain depth information for surfaces with low texture contrast, which leads to errors in the depth map. Fortunately, these errors can be reduced by analysing the original image and enforcing correspondence between object edges and depth boundaries to produce a refined depth map. Human annotation is sometimes used to manually segment an image or to correct for segmentation errors from computational depth mapping techniques. In computer generated graphics, an accurate depth map is available together with three-dimensional (3D) scene information.
There are numerous known methods for bokeh rendering. Many of the earlier such methods originated from the field of computer graphics, where full 3D information of the scene is available. Rendering from multiple pin-hole cameras, for example, approximates the effect of imaging with a finite aperture lens by placing the pin-hole cameras at different locations throughout the lens. Ray tracing renders depth of field by casting multiple rays across the lens. Both of these methods provide realistic rendering if the number of pin-hole cameras or light rays is large enough to properly sample the point spread function of the most blurred pixels. However, due to this large number of pin-hole cameras or light rays required, these methods are not generally suitable for real-time operation, even when practised in a parallel processing environment.
When concerned with real-time operation, post-processing an image captured by or rendered from a pinhole camera is a preferred approach over ray tracing. In this approach each pixel is blurred by a different amount derived from its depth. There are two ways to achieve this space-variant blur: spreading and gathering. In spreading, the point spread function of each input pixel is splatted onto the output image. In gathering, the intensity of each output pixel is a linear combination of the intensities of the surrounding input pixels. The weights of this linear combination form a filter kernel centred at the output pixel. Both spreading and gathering can make use of depth ordering (e.g., using a z-buffer) to avoid blurry background pixels affecting sharp pixels in front of them or vice versa. Although spreading is a better approximation of the image formation process, it is more expensive to implement than gathering. Specifically, spreading needs substantially more write access than gathering. Spreading writes to all neighbouring pixels of each input pixel it visits. Gathering, on the other hand, needs only one write access per output pixel.
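The difference between spreading and gathering can be sketched on a one-dimensional signal. The following is a minimal illustration, assuming a box-shaped point spread function and a per-pixel blur radius; the function names are hypothetical, and depth ordering is deliberately omitted to keep the example short.

```python
def box_kernel(radius):
    """Normalised box point spread function with the given half-width."""
    width = 2 * radius + 1
    return [1.0 / width] * width


def spread(signal, radii):
    """Spreading: each input pixel splats its PSF onto the output."""
    n = len(signal)
    out = [0.0] * n
    writes = 0
    for i, value in enumerate(signal):
        r = radii[i]
        for k, weight in enumerate(box_kernel(r)):
            j = i + k - r
            if 0 <= j < n:
                out[j] += weight * value  # one write per covered output pixel
                writes += 1
    return out, writes


def gather(signal, radii):
    """Gathering: each output pixel is a weighted sum of nearby inputs."""
    n = len(signal)
    out = [0.0] * n
    writes = 0
    for i in range(n):
        r = radii[i]
        acc = 0.0
        for k, weight in enumerate(box_kernel(r)):
            j = min(max(i + k - r, 0), n - 1)  # clamp reads at the borders
            acc += weight * signal[j]
        out[i] = acc  # single write per output pixel
        writes += 1
    return out, writes


signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
radii = [0, 0, 0, 2, 0, 0, 0]  # only the centre pixel is out of focus
spread_out, spread_writes = spread(signal, radii)
gather_out, gather_writes = gather(signal, radii)
print(spread_out, spread_writes)
print(gather_out, gather_writes)
```

With a space-variant radius the two results differ: spreading distributes the blurred pixel over its neighbours, whereas gathering changes only the output pixel whose own kernel is wide. The write counters show the additional write accesses that spreading incurs.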
Gathering can be seen as a convolution of the input image with a filter kernel. The size and shape of the filter directly correspond to the amount of blur applied to the input image at any given point. This type of filtering has a Finite Impulse Response (FIR), which is a sampled version of the filter itself. The computational complexity of FIR filters is therefore proportional to the size of the filter. While suitable for small filters, FIR filtering is often too costly for large filters. Large FIR filters can be implemented more efficiently as a product in the Fourier domain; however, this creates a fixed amount of blur for each Fourier operation and is therefore unsuitable for creating variable blur.
Fortunately, there is a more efficient way of image filtering whose complexity is independent of the filter size. Infinite Impulse Response (IIR) filters achieve this goal using a fixed number of filtering taps by feeding previously filtered output pixels to the filtering of the current pixel. Traditional IIR filters can be used to synthesize bokeh by changing the filter parameters according to the varying blur width at each pixel. However, this naïve extension of traditional IIR filters to handle variable blur width often results in bleeding artefacts across sharp depth discontinuities.
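The bleeding behaviour of a naively adapted recursive filter can be reproduced with a very small example. The sketch below assumes a first-order recursive smoother applied in a forward and a backward pass, with a hypothetical per-pixel coefficient `alphas[i]` standing in for the varying blur width (1 leaves a pixel sharp, smaller values blur more). It is not a specific filter from the literature, only an illustration of the principle.

```python
def recursive_blur(signal, alphas):
    """Forward then backward first-order recursive smoothing.

    Each pass uses a fixed number of operations per pixel, regardless of how
    wide the effective blur is. alphas[i] must lie in (0, 1].
    """
    n = len(signal)
    forward = [0.0] * n
    prev = signal[0]
    for i in range(n):
        prev = alphas[i] * signal[i] + (1.0 - alphas[i]) * prev
        forward[i] = prev
    out = [0.0] * n
    prev = forward[-1]
    for i in range(n - 1, -1, -1):
        prev = alphas[i] * forward[i] + (1.0 - alphas[i]) * prev
        out[i] = prev
    return out


# Dark, sharp object (left) in front of a uniform bright, blurred background.
signal = [0.0] * 4 + [1.0] * 4
alphas = [1.0] * 4 + [0.2] * 4
result = recursive_blur(signal, alphas)
print([round(v, 3) for v in result])
```

The sharp pixels are preserved, but the uniform background next to the depth discontinuity is darkened because previously filtered values from the sharp object are fed into the background pixels. Blurring a uniform background should leave it unchanged, so this darkening is a bleeding artefact.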
Instead of outputting a blurred pixel value in one step using a filter kernel, the output pixel can be iteratively blurred through a diffusion process. Analogous to heat diffusion over a conducting medium, diffusion can be applied to an image to achieve a Gaussian-like blur. At each iteration, the image I is blurred by a small incremental amount by adding a fraction of its Laplacian $\Delta I = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$. The fraction, also known as the step size, is typically small (e.g., less than 0.25) for the diffusion process to be stable. The final output image has a desired amount of blur after a certain number of iterations. To achieve space-variant blur, the step size can be varied locally across the image in similar fashion to varying the heat conductivity of the medium. In the in-focus midground, for example, the conductivity or diffusivity should be close to zero to incur minimal blur. In the blurred background and blurred foreground regions, the diffusivity should be set to a high value (not too high to avoid instability, i.e., less than 1) for maximum diffusion per iteration. Diffusion methods do not produce bleeding artefacts if set up correctly because the blurring is stopped at depth discontinuities. However, diffusion is not suitable for hardware implementation due to its iterative traversal of the image data. Methods that aim to solve the edge-stopping diffusion equation directly result in a large linear system involving all input pixels and all output pixels. Although this sparse linear system can be solved recursively using LU decomposition, the computational cost is still high and graphics processing units have to be employed to achieve interactive speed (i.e., close to real-time).
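The iterative diffusion described above can be sketched as an explicit update on a small two-dimensional image. The example is a minimal illustration under stated assumptions: a four-neighbour discretisation, a per-pixel diffusivity map in the range 0 to 1, and the flow between two neighbouring pixels limited by the smaller of their two diffusivities so that nothing diffuses into or out of in-focus pixels. The function name, the default step size of 0.2, and the iteration count are hypothetical choices.

```python
def diffuse(image, diffusivity, step=0.2, iterations=50):
    """Explicit space-variant diffusion on a 2D list-of-lists image."""
    height, width = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                flow = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        # Edge-stopping: use the smaller of the two diffusivities.
                        c = min(diffusivity[y][x], diffusivity[ny][nx])
                        flow += c * (current[ny][nx] - current[y][x])
                updated[y][x] = current[y][x] + step * flow
        current = updated
    return current


HEIGHT, WIDTH = 5, 8
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
for y in range(HEIGHT):
    image[y][3] = 1.0  # edge of a sharp midground object
image[2][6] = 1.0      # bright point in the background
# Columns 0-3 are in focus (no diffusion); columns 4-7 are background.
diffusivity = [[0.0 if x < 4 else 1.0 for x in range(WIDTH)] for _ in range(HEIGHT)]
blurred = diffuse(image, diffusivity)
for row in blurred:
    print([round(v, 3) for v in row])
```

In this sketch the in-focus columns are left untouched and the bright background point spreads only within the background region, so nothing bleeds across the depth discontinuity. The cost is the repeated pass over the whole image, which is the drawback noted above.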
Another approach to bokeh rendering that has recently been gaining popularity is layer compositing. In this approach, the image is segmented into different layers based on depth. Each layer is blurred separately and the blurred results are then combined to form the final output. The combination usually involves some form of alpha-matting for a smooth transition across layers. A straightforward way of segmenting the image into different layers is by depth discretisation. Each layer contains pixels from a narrow range of depth and therefore can be blurred by a single filter kernel or more efficiently in the Fourier domain. Depth discretisation, however, results in discretisation artefacts at layer boundaries. This type of artefact is most visible when an object of extended depth range is split into different layers. A better way to segment the image is to split the pixels into either two layers (sharp foreground against blurred background) or three layers (blurred foreground, sharp midground, and blurred background). Separation of the blurred foreground layer from the sharp midground layer makes it easier to simulate a partial occlusion effect. The blurred background behind a thin midground object also appears more natural with this layering approach. However, due to the coarse depth segmentation, each layer may still need to be blurred by a space-variant filter.
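A two-layer version of this approach (sharp foreground against blurred background) can be sketched in one dimension. The example below rests on several assumptions that are not taken from this description: a hard binary matte derived from a hypothetical depth threshold, a box blur with a single radius for the whole background layer, and a weight-normalised blur of the background so that foreground pixels do not contribute to it. Practical methods would typically use a soft matte and a space-variant blur.

```python
def box_blur(values, radius):
    """Box blur with border clamping."""
    n = len(values)
    out = []
    for i in range(n):
        window = [values[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def composite_two_layers(image, depth, foreground_max_depth, radius):
    """Keep the foreground sharp and composite it over a blurred background."""
    # Hard matte: 1 for foreground pixels, 0 for background pixels.
    alpha = [1.0 if d <= foreground_max_depth else 0.0 for d in depth]
    bg_weight = [1.0 - a for a in alpha]
    bg_colour = [w * v for w, v in zip(bg_weight, image)]
    # Blur colour and weight separately, then normalise, so that only
    # background pixels contribute to the blurred background layer.
    blurred_colour = box_blur(bg_colour, radius)
    blurred_weight = box_blur(bg_weight, radius)
    background = [c / w if w > 0.0 else 0.0
                  for c, w in zip(blurred_colour, blurred_weight)]
    return [a * v + (1.0 - a) * b for a, v, b in zip(alpha, image, background)]


image = [5.0, 5.0, 5.0, 5.0, 1.0, 3.0, 1.0, 3.0]
depth = [1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 10.0]
result = composite_two_layers(image, depth, foreground_max_depth=2.0, radius=1)
print([round(v, 3) for v in result])
```

The foreground values pass through unchanged, while the background texture is smoothed using background pixels only, so the bright foreground does not leak into the background next to the layer boundary.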
None of the above-mentioned methods offers a combination of real-time and high-quality bokeh rendering without artefacts. Hence, there is a need for an improved bokeh rendering method and system that is efficient and hardware-friendly and that can simulate realistic blur given a continuous space-variant depth map.