This invention relates to an integrated system comprising simultaneous co-registered sensing of thermally emitted radiation and sensing of reflected radiation in the visible/NIR/SWIR from the same view and fusing these sensing modalities to provide important visual information to an observer, or to an automated image processing system for interpreting the scene.
For the most part, electromagnetic radiation sensed in an outdoor or indoor scene in the visible, near infrared (NIR) and shortwave infrared (SWIR) spectrums results from reflection, while radiation sensed at wavelengths from 3-15 microns mostly results from thermal emission. There are a number of exceptions, such as visible emission from the sun and the presence of significant reflection in the 3-5 micron midwave infrared (MWIR) spectrum, but the complementarity of reflection and thermal emission is generally acknowledged with respect to the visible/NIR/SWIR and wavelengths above 3 microns.
The advantages of having sensing capability for both reflected radiation and thermal emission have been noted in some recent patents. In U.S. Pat. No. 5,534,696 a sight apparatus is designed so that a viewer can better observe thermal IR imagery in the context of a direct visible view of a scene. In U.S. Pat. No. 6,020,994 an integrated apparatus allows an observer to switch between direct visible view and thermal IR for wide and narrow fields of view. In U.S. Pat. No. 5,944,653 visible and thermal IR views are boresighted within an endoscopic system. U.S. Pat. No. 5,808,350 teaches a design for integrating IR, visible and NIR sensing in the same focal plane array.
A number of computational algorithms for image fusion have already been developed for visible and thermal IR imagery. See for example:
(1) L. van Ruyven, A. Toet and J. Valeton. Merging thermal and visual images by a contrast pyramid. Optical Engineering, 28(7):789-792, 1989.
(2) D. Fay, J. Racamato, J. Carrick, M. Seibert, A. Waxman, A. Gove and E. Savoye. Color night vision: Opponent processing in the fusion of visible and IR imagery. Neural Networks, 10(1):1-6, 1997.
(3) P. Burt and R. Lolczynski. Enhanced image capture through fusion. Proceedings of IEEE 4th International Conference on Computer Vision, volume 4, pages 173-182, 1993.
(4) J. Schuler, M. Satyshur, D. Scribner, P. Warren and M. Kruer. Infrared color vision: An approach to sensor fusion. Optics and Photonics News, August 1998.
(5) B. S. Manjunath, H. Lui and S. K. Mitra. Multi-sensor image fusion using the wavelet transform. Proceedings IEEE International Conference on Image Processing, pages 51-55, 1994.
(6) D. Socolinsky and L. B. Wolff. Visualizing local contrast for multispectral imagery. Pending U.S. patent application, 1998.
(7) D. A. Socolinsky and L. B. Wolff. Optimal grayscale visualization of local contrast in multispectral imagery. Proceedings: DARPA Image Understanding Workshop, pages 761-766, Monterey, November 1998.
(8) D. A. Socolinsky and L. B. Wolff. A new paradigm for multispectral image visualization and data fusion. Proceedings: CVPR '99, Fort Collins, June 1999.
(9) A. Toet. Hierarchical image fusion. Machine Vision and Applications, pages 1-11, March 1990.
(10) A. Toet. New false color mapping for image fusion. Optical Engineering, 35(3):650-658, 1996.
(11) H. A. MacLeod. Thin Film Optical Filters. Institute of Physics Publishers, 3rd edition, March 2001.
References (2), (4) and (10) have proposed psychophysically motivated image fusion including the use of neural network approaches. References (3) and (5) develop wavelet image fusion methods. References (1) and (9) develop hierarchical image fusion algorithms. References (6), (7) and (8) develop image fusion algorithms that combine first-order contrast.
The imaging modalities of visible/NIR/SWIR and of thermal IR reveal complementary physical information with respect to one another for most typical scenes: visible/NIR/SWIR imagery senses reflected radiation, while thermal IR imagery senses mostly thermally emitted radiation. Fusing these imaging modalities using optics, sensor hardware and image processing algorithms can provide large advantages for human visual enhancement and automated image understanding.
This invention relates to a sensor system design that integrates optics, sensing hardware and computational processing to achieve optimum utilization of complementary information provided by the fusion of visible/NIR/SWIR and thermal IR imagery. This is accomplished through accurate co-registration of these respective modalities, followed by either optimum presentation/visualization to an observer or output of accurately co-registered information to an automated image understanding system. In the absence of a monolithic device that can simultaneously sense visible/NIR/SWIR and thermal IR at a pixel, two separate sensing arrays must be brought into exact alignment such that corresponding pixels view exactly the same scene element. Previous inventions, although sometimes citing common optical systems, neither achieve nor emphasize the importance of accurate co-registration of reflective and thermal emission imagery.
Boresighted sensing attempts to image the same scene with two different imaging modalities placed side-by-side. Although the two sensors are in close proximity, their view orientations and magnifications differ slightly. This makes co-registration dependent upon the external 3-D depth of scene elements, which is almost always unknown and changes from scene to scene. Single-window systems suffer the same co-registration problems because they require separate focusing optics for the respective focal plane sensing arrays. Apart from ever-present differences in magnification and distortion, separate focusing optics always create a small stereo baseline between focal plane arrays, which means that co-registration will not 'track' with depth in a scene.
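The depth dependence described above can be illustrated with a minimal sketch. It assumes an idealized pair of parallel pinhole cameras, for which the image-plane offset between corresponding points is focal length times baseline divided by depth. The function name and all numerical values (baseline, focal length, pixel pitch, depths) are hypothetical and chosen only for illustration; they are not taken from the apparatus described here.

```python
def disparity_pixels(baseline_m, focal_length_m, pixel_pitch_m, depth_m):
    """Offset, in pixels, between corresponding points seen by two
    parallel pinhole cameras separated by baseline_m (idealized model)."""
    return (focal_length_m * baseline_m) / (depth_m * pixel_pitch_m)


# Hypothetical values, for illustration only.
BASELINE_M = 0.05        # 5 cm between the two sets of focusing optics
FOCAL_LENGTH_M = 0.05    # 50 mm lens
PIXEL_PITCH_M = 25e-6    # 25 micron pixels

depths_m = [2.0, 10.0, 50.0]
offsets_px = [
    disparity_pixels(BASELINE_M, FOCAL_LENGTH_M, PIXEL_PITCH_M, z)
    for z in depths_m
]

for z, d in zip(depths_m, offsets_px):
    print(f"depth {z:5.1f} m -> offset {d:6.2f} px")
```

Under these assumed values the offset varies by tens of pixels between near and far scene elements, so no single fixed image-to-image mapping can register both at once.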
Accurate co-registration between a subspectrum of visible/NIR/SWIR and a subspectrum of thermal IR enables, for the first time, the application of computational fusion algorithms such as those described by References (1)-(10) listed above, which produce composite visualizations of dual reflective/thermal IR imagery. Accurate co-registration also enables automated image understanding algorithms to perform computations including optic flow, tracking, biometric recognition and automatic target recognition.
A way to achieve accurate co-registration independent of depth in a scene is for all focusing optics to be common to both focal plane arrays. This can be achieved by using a single objective lens at the front end of the apparatus, through which all sensed radiation is focused onto the respective focal plane arrays. A dichroic beamsplitter merely directs the appropriate subspectrum of incident radiation onto the corresponding sensing array and is optically afocal.
With depth-independent co-registration, the co-registration mapping between both focal plane arrays is an affine linear transformation having the following form:

$$
\begin{pmatrix} X_2 \\ Y_2 \\ 1 \end{pmatrix}
=
\begin{pmatrix} A & B & -T_x \\ C & D & -T_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X_1 \\ Y_1 \\ 1 \end{pmatrix}
$$

where image coordinates $(X_1, Y_1)$ on the 1st image plane are mapped to image coordinates $(X_2, Y_2)$ on the 2nd image plane. The parameters $T_x$ and $T_y$ are respectively translation in x and y, while the upper left 2×2 submatrix can be decomposed as the product of a rotation by angle $\theta$ and magnifications $S_x$ and $S_y$ in x and y respectively, according to:

$$
\begin{pmatrix} A & B \\ C & D \end{pmatrix}
=
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} S_x & 0 \\ 0 & S_y \end{pmatrix}
$$
The scaling parameters account for differences in physical horizontal and vertical pixel size between the two focal plane arrays, while the rotation and translation parameters account for the relative rotation and translation of the two focal plane arrays with respect to one another.
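As a minimal sketch of the transformation above, the following code builds the 3×3 matrix from a rotation angle, two magnifications and two translations, and applies it to a pixel coordinate. The function names and the numerical parameter values are hypothetical; in practice the parameters would have to be obtained by calibrating the two focal plane arrays against each other, which is not shown here.

```python
import math


def coregistration_matrix(theta, sx, sy, tx, ty):
    """Build the 3x3 affine matrix [[A, B, -Tx], [C, D, -Ty], [0, 0, 1]],
    where (A B; C D) is rotation by theta times diag(sx, sy)."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = c * sx, -s * sy   # first row of rotation @ scaling
    cc, d = s * sx, c * sy   # second row of rotation @ scaling
    return [[a, b, -tx],
            [cc, d, -ty],
            [0.0, 0.0, 1.0]]


def map_point(m, x1, y1):
    """Map (x1, y1) on the 1st image plane to (x2, y2) on the 2nd."""
    x2 = m[0][0] * x1 + m[0][1] * y1 + m[0][2]
    y2 = m[1][0] * x1 + m[1][1] * y1 + m[1][2]
    return x2, y2


# Hypothetical calibration values, for illustration only.
m = coregistration_matrix(theta=math.radians(0.5), sx=1.2, sy=1.2,
                          tx=3.0, ty=-2.0)
print(map_point(m, 100.0, 50.0))
```

Because the mapping does not depend on scene depth, one set of five parameters serves every pixel and every scene, which is what allows the second image to be resampled onto the first once and for all.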
One of the major stumbling blocks for current face recognition technology based on visible imagery is that intra-personal variations due to changes in ambient illumination between acquisition of the training set and the probe image are often larger than inter-personal variations. This makes recognition and identification systems very sensitive to changes in imaging conditions and restricts their applicability. The illumination invariance of thermal IR imagery (especially LWIR imagery) offers a promising way to solve this problem. However, visible imagery has more highly textured detail that may be valuable for recognition. Additionally, glass is opaque to longwave infrared, and therefore any person wearing eyeglasses will have a significant portion of their face obscured to a thermal sensor. Furthermore, while thermal imagery is invariant to illumination, it is not invariant to the activity level of the subject. For example, someone being imaged outside on a cold day will have colder cheeks, nose and ears than if imaged indoors. Each of these modalities alone suffers from shortcomings which limit its applicability in an unconstrained face recognition scenario. However, the strengths of one modality complement the weaknesses of the other, and vice versa, thus making the cooperative sensor fusion of visible/NIR/SWIR and thermal IR in the present invention a powerful technology for increasing the performance of face recognition and perhaps other biometric recognition methods for detection and recognition of humans from portions of the body, either external or internal. For instance, perfectly co-registered visible/thermal IR endoscopic imagery can significantly improve upon the invention in U.S. Pat. No. 5,944,653.
The complementarity of reflective versus thermal phenomenology gives the present invention some unique capabilities for surveillance, monitoring and tracking. Being able to separate moving objects from their shadows means that the true shape of the object can be segmented, without confounders, and that shape can be used for classification and identification. The fact that most objects of interest in a surveillance situation (humans, cars) are warmer than their background means there is less confusion arising from spurious moving objects such as tree branches swaying in the wind. On the other hand, a parked vehicle which is suddenly started and begins moving may be more easily detectable in the visible spectrum, since its temperature is similar to the ambient temperature (depending on several factors). Additionally, once a moving target has been identified and segmented, it may be desirable for a human operator to verify the nature of the object, for example the identity of a pedestrian. In this case, visible imagery is superior since humans are well adapted to that modality.
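One simple way the separation of moving objects from their shadows might be sketched is given below. It is an illustrative stand-in, not a method specified here. It assumes that a visible-band motion mask and a thermal image are already co-registered pixel for pixel, that the object of interest is warmer than the background, and that its cast shadow has no significant thermal signature. The function name, the threshold margin and the sample values are all hypothetical.

```python
def segment_object_and_shadow(visible_motion, thermal, background_temp, margin):
    """Split moving pixels into object and shadow masks.

    visible_motion: 2-D list of bools, True where the visible image changed.
    thermal: 2-D list of co-registered apparent temperatures.
    A moving pixel counts as object if it is warmer than
    background_temp + margin, otherwise as shadow (assumed heuristic).
    """
    object_mask, shadow_mask = [], []
    for motion_row, thermal_row in zip(visible_motion, thermal):
        warm_row = [t > background_temp + margin for t in thermal_row]
        object_mask.append([mv and w for mv, w in zip(motion_row, warm_row)])
        shadow_mask.append([mv and not w for mv, w in zip(motion_row, warm_row)])
    return object_mask, shadow_mask


# Hypothetical one-row example: a warm pedestrian followed by a shadow pixel.
visible_motion = [[False, True, True, True]]
thermal = [[20.0, 36.0, 35.0, 20.5]]

object_mask, shadow_mask = segment_object_and_shadow(
    visible_motion, thermal, background_temp=20.0, margin=5.0)
print(object_mask)
print(shadow_mask)
```

The per-pixel logical combination is only meaningful because corresponding pixels in the two modalities view the same scene element, which is the role of accurate co-registration.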