1. Field of the Invention
The invention relates generally to the field of image processing. More specifically, the invention relates to a processing unit and methods for simultaneously processing imaging sensor data streams of scenes observed by different kinds of imaging sensors at different times and with different viewing geometries. Salient attributes in the scenes, as observed by each imaging sensor system, are identified by running a plurality of processing algorithms on the image data. These algorithms take the form of convolutions over the spatial, temporal, and color content of the images and emulate the image processing of the human visual path, which consists of the eye, retina, and cortex. The invention produces object detections, object tracks, object classifications, and activity recognitions and interpretations. The salient features derived for objects of interest in the scenes from each imaging sensor are compared by cross-modal correlation of the analysis results from the different sensors. Correlating salient features across the sets of imagery enables a common operating picture of the observed space to be assembled.
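As an illustration only, the spatial part of such convolution-based saliency processing can be sketched as a center-surround difference, a common simplified model of retinal processing. This is not the claimed implementation: the function names `box_mean`, `center_surround_saliency`, and `most_salient`, the uniform (box) kernels, and the radii are assumptions made for this sketch, and the temporal and color channels are omitted.

```python
def box_mean(img, r):
    """Convolve img with a uniform (2r+1) x (2r+1) kernel.

    The window is clipped at the image borders and the mean is taken
    over the pixels that remain, so edge pixels are not darkened.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def center_surround_saliency(img, center_r=0, surround_r=2):
    """Saliency as |center response - surround response| per pixel.

    Hypothetical stand-in for one spatial channel; the radii are
    illustrative values, not parameters taken from the invention.
    """
    center = box_mean(img, center_r)
    surround = box_mean(img, surround_r)
    return [
        [abs(c - s) for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(center, surround)
    ]


def most_salient(saliency):
    """Return the (row, column) of the largest saliency value."""
    best = max(
        (value, y, x)
        for y, row in enumerate(saliency)
        for x, value in enumerate(row)
    )
    return best[1], best[2]
```

Applied to a grayscale frame stored as a list of rows, `center_surround_saliency` responds most strongly where a pixel differs from its neighborhood, which is the kind of locally conspicuous content that the later detection and tracking stages would operate on.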
2. Description of the Related Art
Current approaches to processing multiple sensor data streams that are observing, or have observed, common scenes, in order to determine the object content and activities in the scene from the multiple looks, involve matching object images to high-fidelity three-dimensional models of the objects and activities of interest. These techniques are often referred to as Automatic Target Recognition (ATR) processing. They are quite limited because the template matching process degrades as viewing geometries, target orientations, degrees of target obscuration, and environmental conditions vary. Such processing is expensive and requires significant time and human skill to achieve the desired cross-observation results.
What is needed is a general process for extracting the salient characteristics of scene objects from each observation time and from each sensor, and for using the derived cognitive saliency values to associate observations of a given object across the various data sets. In addition, there is a need to execute the cross-modal correlations in near real time by hosting the processing architectures on specially designed processors that can accommodate the massive data flows from the disparate sensor suites and perform the massively parallel processing necessary to execute the cognitive saliency computations.
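One simplified way to picture the association step described above is to treat each observed object as a vector of saliency values and to pair objects from two sensors by correlation. The sketch below is illustrative only: the names `pearson` and `associate`, the greedy one-to-one pairing, the threshold `min_corr`, and the example feature vectors are all assumptions, not details taken from the invention.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length feature vectors.

    Returns 0.0 when either vector is constant, since the
    correlation is undefined in that case.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def associate(objs_a, objs_b, min_corr=0.8):
    """Greedily pair objects from two sensors, best correlation first.

    objs_a and objs_b map an object id to its saliency feature vector.
    Each object is used at most once; pairs below min_corr are dropped.
    """
    scored = sorted(
        (
            (pearson(feat_a, feat_b), id_a, id_b)
            for id_a, feat_a in objs_a.items()
            for id_b, feat_b in objs_b.items()
        ),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), {}
    for corr, id_a, id_b in scored:
        if corr < min_corr:
            break  # remaining pairs are weaker still
        if id_a in used_a or id_b in used_b:
            continue
        matches[id_a] = id_b
        used_a.add(id_a)
        used_b.add(id_b)
    return matches


# Hypothetical saliency vectors for two objects seen by two sensors.
eo_objects = {"eo1": [0.9, 0.1, 0.4, 0.2], "eo2": [0.1, 0.8, 0.2, 0.7]}
ir_objects = {"ir1": [0.2, 0.9, 0.1, 0.6], "ir2": [0.8, 0.2, 0.5, 0.1]}
matches = associate(eo_objects, ir_objects)
```

Because every pairwise score is independent of the others, the scoring loop is the portion of such a scheme that would lend itself to the massively parallel execution the passage calls for.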