Image fusion is a process that combines two or more source images to form a single composite image with extended information content. Typically images from different sensors, such as infra-red and visible cameras, or computer aided tomography (CAT) and magnetic resonance imaging (MRI) systems, are combined to form the composite image. Multiple images of a given scene taken with different types of sensors, such as visible and infra-red cameras, or images taken with a given type of sensor and scene but under different imaging conditions, such as with different scene illumination or camera focus, may be combined. Image fusion is successful to the extent that: (1) the composite image retains all useful information from the source images, (2) the composite image does not contain any artifacts generated by the fusion process, and (3) the composite image looks natural, so that it can be readily interpreted through normal visual perception by humans or machines. What constitutes useful information is determined by the user of the composite image, and this in turn determines which features of the different source images are selected for inclusion in the composite image.
The most direct approach to fusion, known in the art, is to align the source images, then sum, or average, across images at each pixel position. This and other pixel-based approaches often yield unsatisfactory results, since individual source features appear in the composite with reduced contrast or appear jumbled, as in a photographic double exposure.
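By way of illustration only, the pixel-averaging approach described above may be sketched as follows. The function name `average_fusion` and the two 4x4 test images are hypothetical, and the sources are assumed to be already aligned and of equal size. The sketch shows why contrast is reduced: a feature present in only one of two sources appears in the composite at half its original amplitude.

```python
import numpy as np

def average_fusion(images):
    """Fuse pre-aligned, equal-sized images by averaging at each pixel position."""
    stack = np.stack([np.asarray(img, dtype=np.float64) for img in images])
    return stack.mean(axis=0)

# Two hypothetical sources, each containing one feature the other lacks.
a = np.zeros((4, 4))
a[1, 1] = 1.0
b = np.zeros((4, 4))
b[2, 2] = 1.0

fused = average_fusion([a, b])
# Each feature survives, but only at half of its original contrast.
print(fused[1, 1], fused[2, 2])
```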
Known pattern selective image fusion tries to overcome these deficiencies by identifying salient features in the source images and preserving these features in the composite at full contrast. Each source image is first decomposed into a set of primitive pattern elements. A set of pattern elements for the composite image is then assembled by selecting salient patterns from the primitive pattern elements of the source images. Finally, the composite image is constructed from its set of primitive pattern elements.
Burt in Multiresolution Image Processing And Analysis, V. 16, pages 20-51, 1981 (hereinafter "BURT") and Anderson et al in U.S. Pat. No. 4,692,806, incorporated herein by reference for its teachings on image decomposition techniques, have disclosed an image decomposition technique in which an original comparatively high-resolution image comprised of a first number of pixels is processed to derive a wide field-of-view, low resolution image comprised of a second number of pixels smaller than the first number. The process for decomposing the image to produce lower resolution images is typically performed using a plurality of low-pass filters of differing bandwidth having a Gaussian roll-off. U.S. Pat. No. 4,703,514, incorporated herein by reference, has disclosed a means for implementing the pyramid process for the analysis of images.
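A minimal sketch of such a decomposition is given below. It is not taken from the cited references: the 5-tap kernel, the reflected border handling, the subsampling factor of two, and the names `blur`, `reduce_level` and `gaussian_pyramid` are all assumptions chosen for illustration. Each level is obtained by low-pass filtering the previous level with a Gaussian-like kernel and keeping every second sample, so that each successive image has fewer pixels and lower resolution.

```python
import numpy as np

# Assumed 5-tap Gaussian-like kernel; the text does not specify filter taps.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable low-pass filter with reflected borders (an assumed choice)."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    # Filter along rows first, then along columns.
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))

def reduce_level(img):
    """Low-pass filter, then keep every second sample in each dimension."""
    return blur(img)[::2, ::2]

def gaussian_pyramid(img, levels):
    """Return [full resolution, half resolution, quarter resolution, ...]."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid

# Hypothetical 16x16 test image of constant intensity.
pyr = gaussian_pyramid(np.full((16, 16), 3.0), levels=3)
print([level.shape for level in pyr])
```

Because the kernel weights sum to one, a region of constant intensity keeps the same value at every level; only the sample density changes.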
The Laplacian pyramid approach to image fusion is perhaps the best known pattern-selective method. BURT first disclosed the use of image fusion techniques based on the Laplacian pyramid for binocular fusion in human vision. U.S. Pat. No. 4,661,986 disclosed the use of the Laplacian technique for the construction of an image with an extended depth of field from a set of images taken with a fixed camera but with different focal settings. A. Toet in Machine Vision and Applications, V. 3, pages 1-11 (1990) has disclosed a modified Laplacian pyramid that has been used to combine visible and IR images for surveillance applications. More recently M. Pavel et al in Proceedings of the AIAA Conference on Computing in Aerospace, V. 8, Baltimore, October 1991 have disclosed a Laplacian pyramid for combining a camera image with graphically generated imagery as an aid to aircraft landing. Burt et al in ACM Trans. on Graphics, V. 2, pages 217-236 (1983) and in the Proceedings of SPIE, V. 575, pages 173-181 (1985) have developed related Laplacian pyramid techniques to merge images into mosaics for a variety of applications.
In effect, a Laplacian transform is used to decompose each source image into regular arrays of Gaussian-like basis functions of many sizes. These patterns are sometimes referred to as basis functions of the pyramid transform, or as wavelets. The multiresolution pyramid of source images permits coarse features to be analyzed at low resolution and fine features to be analyzed at high resolution. Each sample value of a pyramid represents the amplitude associated with a corresponding basis function. In the Laplacian pyramid approach to fusion cited above, the combination process selects the most prominent of these patterns from the source images for inclusion in the fused image. The source pyramids are combined through selection on a sample by sample basis to form a composite pyramid. Current practice is to use a "choose max rule" in this selection; that is, at each sample location in the pyramid, the source image sample with the largest value is copied to become the corresponding sample in the composite pyramid. If, at a given sample location, there are other source image samples that have nearly the same value as the sample with the largest value, these may be averaged to obtain the corresponding sample of the composite pyramid. Finally, the composite image is recovered from the composite pyramid through an inverse Laplacian transform. By way of example, in the approach disclosed in U.S. Pat. No. 4,661,986, the respective source image samples with the largest value, which are copied at each pyramid level, correspond to samples of that one of the source images which is more in focus.
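The decompose, select and reconstruct sequence described above may be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of any cited reference: the 5-tap kernel, the border handling, and all function names are hypothetical; "largest value" is taken to mean largest absolute value; the parameter `tie_tol`, which controls how close samples must be to the maximum before they are averaged, is an assumed way of expressing "nearly the same value"; and the lowest-resolution (low-pass) level is averaged rather than selected.

```python
import numpy as np

# Assumed 5-tap Gaussian-like kernel; the text does not specify filter taps.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable low-pass filter with reflected borders (an assumed choice)."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(k * padded[:, i:i + w] for i, k in enumerate(KERNEL))
    return sum(k * rows[i:i + h, :] for i, k in enumerate(KERNEL))

def expand(img, shape):
    """Upsample to `shape` by zero insertion followed by low-pass filtering."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def laplacian_pyramid(img, levels):
    """Band-pass levels from fine to coarse, followed by a low-pass residual."""
    current = np.asarray(img, dtype=np.float64)
    pyramid = []
    for _ in range(levels - 1):
        smaller = blur(current)[::2, ::2]
        pyramid.append(current - expand(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Inverse transform: expand each level and add back the band-pass detail."""
    img = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        img = band + expand(img, band.shape)
    return img

def choose_max(samples, tie_tol=0.0):
    """Keep the largest-magnitude sample; average samples within tie_tol of it."""
    stack = np.stack(samples)
    mags = np.abs(stack)
    near_max = mags >= mags.max(axis=0) - tie_tol
    return (stack * near_max).sum(axis=0) / near_max.sum(axis=0)

def fuse(images, levels=3, tie_tol=0.0):
    """Fuse pre-aligned, equal-sized images through their Laplacian pyramids."""
    pyramids = [laplacian_pyramid(img, levels) for img in images]
    fused = [choose_max(bands, tie_tol)
             for bands in zip(*[p[:-1] for p in pyramids])]
    # Assumption: the low-pass residual is averaged rather than selected.
    fused.append(np.mean([p[-1] for p in pyramids], axis=0))
    return reconstruct(fused)
```

In this sketch the decomposition is exactly invertible, so fusing an image with itself returns that image; any difference between the composite and a source therefore arises from the selection step alone.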
In the case of the Laplacian transform, the component patterns take the form of circularly symmetric Gaussian-like intensity functions. Component patterns of a given scale tend to have large amplitude where there are distinctive features in the image of about that scale. Most image patterns can be described as being made up of edge-like primitives. The edges in turn are represented within the pyramid by collections of component patterns.
While the Laplacian pyramid technique has been found to provide good results, visible artifacts are sometimes introduced into the composite image. These may occur, for example, along extended contours in the scene, because such higher level patterns are represented in the Laplacian pyramid rather indirectly. An intensity edge is represented in the Laplacian pyramid by Gaussian patterns at all scales with positive values on the lighter side of the edge, negative values on the darker, and zero at the location of the edge itself. If not all of these primitives survive the selection process, the contour is not completely rendered in the composite. An additional shortcoming is due to the fact that the Gaussian-like component patterns have non-zero mean values. Errors in the selection process lead to changes in the average image intensity within local regions of a scene. These artifacts are particularly noticeable when sequences of composite or fused images are displayed. The selection process is intrinsically binary: the basis function from one or the other source image is chosen. If the magnitude of the basis functions varies, for example because of noise in the image or sensor motion, the selection process may alternately select the basis functions from different source images. This leads to unduly perceptible artifacts such as flicker and crawlers.
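The instability of binary selection over a sequence of frames can be illustrated with a single pyramid sample. The values below are hypothetical, and the assumption that the two sources have opposite sign at this location is made only to render the effect visible: a frame-to-frame perturbation of one percent in one source is enough to switch the selection between sources, so the selected sample changes sign on every frame.

```python
def select_max(a, b):
    """Binary selection: return whichever sample has the larger magnitude."""
    return a if abs(a) >= abs(b) else b

# Hypothetical sample from source B, constant over four frames.
b = -1.0
# Source A has nearly the same magnitude, perturbed slightly on each frame.
a_frames = [1.0 + 0.01, 1.0 - 0.01, 1.0 + 0.01, 1.0 - 0.01]

selected = [select_max(a, b) for a in a_frames]
averaged = [(a + b) / 2.0 for a in a_frames]

print(selected)  # alternates between the source A and source B samples
print(averaged)  # stays near zero: stable, but the feature contrast is lost
```

The comparison with averaging shows the trade-off described in this background: selection preserves contrast but is unstable under small perturbations, while averaging is stable but suppresses the feature.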
Further, while the prior art may employ color in the derivation of the fused composite image itself, there is no way in the prior art of retaining the identity of those source images that contributed to particular displayed information in a fused composite image. For example, in a surveillance application, an observer may want to know if the source of a bright feature he sees in the composite image comes from an IR camera source image, so represents a hot object, or comes from a visible camera source, so represents a light colored, or intensely illuminated object.
Thus there is a need for improved methods of image fusion (in addition to the prior-art methods of either averaging or "choose max rule" selection, and the use of color) which overcome these shortcomings in the prior art and provide better image quality and/or saliency for the user in a composite image formed by the image fusion process, particularly when sequences of composite images are displayed.