1. Field of the Invention
This invention relates in general to signal processing and, more specifically, to a system for fusing two or more images from different sensors into one image.
2. State of the Art
With the advance of image sensing technology, it has become increasingly desirable to provide efficient and cost effective ways to process and display image information. Today's sensing devices often provide vast quantities of diverse information in differing forms and configurations that human operators often cannot efficiently process visually. This situation, known as “information overload,” is often worsened when relevant information is provided simultaneously in different formats on multiple user interfaces, while the human operator must focus his attention elsewhere.
For example, a pilot must process vast quantities of information from several different input devices while simultaneously piloting his aircraft, often under conditions which are less than favorable. For instance, a military pilot, tasked with flying a mission at low level in total darkness or inclement weather, must simultaneously evade hostile forces, acquire a target and accurately deliver ordnance, all while focusing his attention on piloting and navigating his aircraft. The pilot cannot divert attention from the task at hand for more than a few seconds to interpret multiple displays which provide speed, navigation, threat, targeting, weapons systems or other types of information which may be critical to the mission and pilot survival. Thus, relevant information may go unnoticed, with harmful and often catastrophic results.
Specifically, one CRT in the cockpit may display images produced by an optical sensor operating in the visual spectrum, while a second CRT in the cockpit may display images produced by a sensor operating in the IR spectrum sampling the same scene, and a third CRT may display images produced by radar returns from the identical scene. Thus, to effectively process the information from each input medium the pilot must divert his attention from the task of flying and navigating the aircraft for a significant period.
Similarly, a physician performing laser surgery or some other surgical procedure needs to be aware of the relationship of the instrument to the tissue or bone in close proximity to the area under repair. Cameras and CRT displays contained on the instruments themselves offer some insight into the area of interest; however, they cannot show bones or tissue hidden from visual inspection. X-ray, IR and other sensor/detection means are used to provide that type of information, but they require additional display interfaces, causing the physician to shift her attention between multiple displays.
Scientists and engineers have taken several approaches to increase the speed and efficiency at which a human operator can receive and process image information from multiple sensors using multiple formats. One solution has been to use split screen displays. Split screen displays partition a CRT into sections, each section displaying the same scene imaged by a different type of sensor. For instance, one section may display the scene imaged using an IR sensor, while another section displays the same scene imaged using a camera operating in the visual spectrum, and yet another section displays a radar or x-ray image of the same scene. While more efficient than making an operator scan several CRTs distributed around him, this approach still requires an operator to focus his attention on each section of the CRT and methodically extract relevant information from each image format.
Another approach has been to employ multi-mode CRT displays. These displays normally have some type of display selection capability which allows a user to switch between different display modes, each mode displaying, on full screen, an image of a scene produced by a different sensor. For example, one mode may display scene images produced by a camera operating in the visible light spectrum, while another mode may display the same scene imaged by an IR sensor, and yet another mode may display the same scene imaged by a radar or x-ray unit. This approach reduces the number of CRT displays necessary for displaying the image information; however, it requires an operator to select the display mode and to focus attention on multiple display modes to extract the relevant information unique to each display mode (sensor).
Methods and systems are known for fusing image information from multiple sensors operating in different formats into a single composite image simultaneously displaying relevant information from each sensor.
However, known methods of fusing multiple images into a single composite image generally employ linear filter approaches followed by simply adding the images together pixel by pixel. Conventional linear filtering approaches create a new image by calculating a weighted average of the pixel intensities in the local area of the original image. In linear filtering this is referred to as a convolution operation. A small mask representing the “weights” to be used in the average is moved across an intensity plot of the scene. Each pixel covered by the mask is multiplied by the appropriate weighting factor. The sum of all of the weighted pixel values becomes the new pixel value in the new image. FIG. 1 is an example of such an intensity plot 100, in which the image intensity is plotted across a selected horizontal line of a scene; because the plot follows a single horizontal line, the y coordinates are not apparent in FIG. 1. The peaks and valleys in the intensity plot are representative of the changes in intensity as the sensor samples a scene. A rapid change in intensity, such as shown by event 102 or event 108, suggests some change in the scene, such as moving from background to an object or from object to background.
Generally, high frequency structure (that is, a pronounced change in the intensity over a short distance, or a small area of pixels), is associated with objects within a given image while low frequency structure (that is, a change covering a larger distance or area) is associated with the background.
Prior image fusion methods use conventional linear filtering to separate high frequencies from the background by tracing the movement of a convolution mask (for example, element 104, represented as a window) as it slides over the intensity plot of the scene as shown in FIG. 1. Conventional linear filters remove high frequencies from the image scene to produce a new background or low frequency image by using a weighted average of the input image intensity 100 calculated over element 104, producing the intensity plot shown in FIG. 2. In this case element 104 is a convolution filter or convolution mask containing the weights used in the local average.
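By way of illustration only, the following sketch shows the kind of weighted-average (convolution) filtering described above applied to a one-dimensional intensity line. The NumPy-based implementation, the box-shaped mask, and all names are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def linear_lowpass(intensity_line: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Conventional linear filtering: slide a small mask of weights across
    a 1-D intensity plot (one horizontal line of a scene, as in FIG. 1),
    replacing each pixel with the weighted average under the mask."""
    # np.convolve with a normalized mask is the weighted local average
    # described in the text; mode="same" keeps the output length equal
    # to the input length.
    return np.convolve(intensity_line, mask / mask.sum(), mode="same")

# A flat background with one small, bright object (a high-frequency event).
line = np.zeros(64)
line[30:33] = 10.0

# A 9-pixel box mask playing the role of element 104.
background = linear_lowpass(line, np.ones(9))
highpass = line - background

# 'background' is the FIG. 2-style low-frequency plot; note that it keeps
# a residual "bump" at the object's location -- the blurring the text
# attributes to linear filtering.
```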
The difficulty in using linear filtering techniques is that actual objects in a scene or image are composed of many different frequencies. Both large and small objects contain high frequencies (e.g., at their edges). Simply removing high frequencies from an image will remove small objects or structure, but will also blur the edges of the large objects. Thus, modifying the frequency content of an image is a poor method for modifying the content of an image.
The use of a linear filter allows the intensity of a small object to affect the intensity values of local areas, leaving residual “bumps” 202 and 208 at the location of a small object filtered from a scene, as shown in FIG. 2. This practice of employing an average can also cause an undesirable blur 210 at places in the image where there is a pronounced change in intensity, such as a change in the background of the scene 110 or the onset of an object contained therein, as shown by 102 and 108 of FIG. 1. Thus conventional linear filtering tends to blur the separation between objects and the background and is therefore inefficient when tasked with extracting objects from imagery. As a result, the use of conventional linear filters in image fusion applications has been limited to using the local high frequency content of the imagery as a parameter in determining the intensity of the fused image at that location. Object identification is not attempted. This blurring effect common to linear filtering techniques also makes the system vulnerable to high frequency noise or fine grained structure patterns within the imagery. The effect also produces color stability problems when scenes processed using linear filtering techniques are displayed in color, and it is magnified when the scene is changing or when the system is employed in a dynamic environment.
The present invention is directed to a structure or object oriented method and system for efficiently fusing image information from multiple sensors operating in different formats into a single composite image simultaneously displaying all of the pertinent information of the original images. The present invention finds objects and structure within the various images which meet very general, user-defined size and shape criteria. It then inserts these objects into the fused image. The background of the fused image is obtained by combining the backgrounds of the input images after the objects have been removed. The various objects can be intensity or color coded based on their intensities in the source images from which they came.
The present invention employs morphological filters or shape filters to process multiple signals produced by one or more imaging devices, separating the background image signals from the object image signals produced by each device on the basis of object size or structure orientation (independent of its intensity), thus eliminating the blurring normally associated with conventional linear image processing.
Morphological filters do not use a convolution mask or a weighted average of the local pixels. Instead, morphological filters use a “structuring element” which defines the size and shape of the intensity profiles that the user wants to treat as objects.
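As a minimal sketch of this idea (assuming SciPy's grey-scale morphology; the flat rectangular structuring element and all names are illustrative choices, not the patent's prescription), a grey-scale opening removes bright structure smaller than the structuring element while leaving larger structure, and its edges, intact:

```python
import numpy as np
from scipy.ndimage import grey_opening

def separate_objects(image: np.ndarray, element_size=(5, 5)):
    """Shape filtering with a structuring element: a grey-scale opening
    suppresses bright structure that cannot contain the element, purely
    on the basis of size and shape, independent of its intensity."""
    background = grey_opening(image, size=element_size)  # flat rectangular element
    objects = image - background   # residue: structure smaller than the element
    return objects, background

scene = np.random.rand(128, 128)   # placeholder scene
objects, background = separate_objects(scene)
# Unlike the linear filter sketched earlier, edges of large structure are
# preserved rather than blurred, and no residual "bumps" are left behind.
```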
Once the objects are removed from the backgrounds, the various image signals are combined into one or more composite images, allowing a user to display selected information from several scenes as a single scene or image.
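Continuing the sketch above (the averaging of the backgrounds and the direct reinsertion of the object planes are illustrative choices, not the patent's prescribed combination rule), a hypothetical fusion step might look like:

```python
import numpy as np
from scipy.ndimage import grey_opening

def fuse(images, element_size=(5, 5)):
    """Fuse several registered images of the same scene: split each into
    object and background planes, merge the backgrounds, then reinsert
    every sensor's objects into the merged background."""
    backgrounds, objects = [], []
    for img in images:
        bg = grey_opening(img, size=element_size)  # background plane
        backgrounds.append(bg)
        objects.append(img - bg)                   # object plane
    fused = np.mean(backgrounds, axis=0)           # combined background
    for obj in objects:                            # insert objects from every sensor
        fused += obj
    return fused

visible = np.random.rand(128, 128)    # e.g., a visible-spectrum frame
infrared = np.random.rand(128, 128)   # e.g., an IR frame of the same scene
composite = fuse([visible, infrared])
```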
According to one aspect of the invention, a system employing nonlinear morphological or shape filters provides for the fusion of selected images from different sensors sampling the same or substantially the same scene into one or more composite images.
According to another aspect of the invention, images from the same sensor sampling a scene at different times are fused into a single image. This image may be presented on a single display.
According to yet another aspect of this invention, via the use of nonlinear filters, objects contained in an image scene are distinguished from the background structure of the scene based on size or structure.
According to another aspect of the present invention, images from different sensors are fused into a composite image. The images from different sensors may be color coded and selectively presented as a single image scene on a means for display.
According to yet another aspect of the present invention, selected images from a single sensor sampling a scene at different times are fused into a single image. The images may be color coded based on the point in time sampled and presented as a single image scene on a means for display.
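By way of a final illustration of the color coding mentioned in the preceding aspects (the channel assignment and all names below are hypothetical, not the patent's specification), each source's object plane can be routed to its own color channel over the shared merged background:

```python
import numpy as np
from scipy.ndimage import grey_opening

def color_coded_composite(visible, infrared, element_size=(5, 5)):
    """Hypothetical color coding: each sensor's objects get their own RGB
    channel over the merged background, so a viewer can tell at a glance
    which sensor (or sampling time) contributed each object."""
    bg_v = grey_opening(visible, size=element_size)
    bg_i = grey_opening(infrared, size=element_size)
    background = 0.5 * (bg_v + bg_i)
    rgb = np.stack([background + (visible - bg_v),    # red: visible objects
                    background + (infrared - bg_i),   # green: IR objects
                    background],                      # blue: background only
                   axis=-1)
    return np.clip(rgb, 0.0, 1.0)

composite = color_coded_composite(np.random.rand(128, 128),
                                  np.random.rand(128, 128))
```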