Because of the optical limitations of ordinary cameras, a single image often cannot capture the entire dynamic range of a real scene when the irradiance across the scene varies widely. One way to cover the full dynamic range is to capture a set of differently exposed images and synthesize them into a high dynamic range (HDR) image [cf. 1, 2], in which each pixel is represented by three 32-bit floating-point numbers. Although such an HDR image can generally reproduce the real scene, and can in theory be perceived by the human eye, it cannot be displayed or printed directly because monitors and printers have low dynamic ranges. Many tone mapping algorithms have therefore been proposed to convert an HDR image into a low dynamic range (LDR) image for visualization [cf. 3, 4, 5], but these solutions are not ideal for mobile devices, which have obvious limitations.
Recently, a new image fusion technique called exposure fusion was proposed to overcome this problem. Its input is a set of differently exposed images, and its output is an LDR image rather than an HDR image as in the approach above. The challenge of exposure fusion is to merge the information in the input images seamlessly. Because objects in the input images exhibit obvious intensity gaps, a fusion algorithm must find a way to make object intensities change smoothly in the output image. Several methods have been proposed to address this problem. In the method of Mertens et al. [cf. 6], each input image is decomposed into several down-sampled layers using the Laplacian pyramid [cf. 7]. A weighting factor, computed from the luminance, contrast and color information of each pixel, is used to blend each layer of the input images. Although this method can produce visually pleasing results, the output image often lacks detail, because the smoothing effect of the Laplacian pyramid causes detail to be lost. In Goshtasby's method [cf. 8], all input images are first divided into blocks. Among the blocks at the same spatial location, the block with the maximum entropy is selected to build the output image. A spatial Gaussian filter is then applied to remove the seams between neighboring blocks. Clearly, if a block contains two objects of different intensity, the smaller object will be sacrificed. In addition, when the input images differ in content, an abrupt intensity change at the boundary between neighboring blocks of the merged image can be visually annoying, and an object that spans several blocks is left with artificial variations in luminance.
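To make the block-selection step concrete, the following is a minimal sketch of maximum-entropy block fusion under simplifying assumptions. It is not the implementation of [cf. 8]: the images are assumed to be grayscale and stored as nested lists of integer gray levels, the block size is a free parameter, the Gaussian seam-smoothing step is omitted, and the function names `block_entropy` and `fuse_by_max_entropy` are hypothetical.

```python
import math
from collections import Counter


def block_entropy(image, top, left, size):
    """Shannon entropy (in bits) of the gray levels inside one block."""
    values = [
        image[r][c]
        for r in range(top, min(top + size, len(image)))
        for c in range(left, min(left + size, len(image[0])))
    ]
    total = len(values)
    counts = Counter(values)
    return -sum((k / total) * math.log2(k / total) for k in counts.values())


def fuse_by_max_entropy(images, size):
    """For each block position, copy the block from the exposure whose
    block has the highest entropy. Seam smoothing is NOT applied here."""
    height, width = len(images[0]), len(images[0][0])
    fused = [[0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            # Pick the exposure that carries the most information in this block.
            best = max(images, key=lambda img: block_entropy(img, top, left, size))
            for r in range(top, min(top + size, height)):
                for c in range(left, min(left + size, width)):
                    fused[r][c] = best[r][c]
    return fused


# Toy data (made up for illustration): the left block is crushed to black in
# the short exposure, the right block is saturated in the long exposure.
under = [[0, 0, 10, 200],
         [0, 0, 50, 120]]
over = [[30, 60, 255, 255],
        [90, 120, 255, 255]]
fused = fuse_by_max_entropy([under, over], size=2)
```

Because each block is copied whole from a single exposure, neighboring blocks in `fused` can come from different images, which is the source of the abrupt intensity changes at block boundaries discussed above.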