The fundamental goal of image analysis is to convert image/sensor data into useful information. To facilitate this data-to-information conversion process, researchers have developed sensors that exploit the properties of various sections of the electromagnetic energy spectrum as a means to detect and measure objects in images. “Electromagnetic energy refers to all energy that moves with the velocity of light in a harmonic wave pattern” (Sabin, 1987). Electromagnetic energy (EE) sensors are classified into two broad categories: (1) passive sensors, which use reflected background and emitted energy from objects to generate images, and (2) active sensors, which themselves generate and project energy and then detect the returned signal to image objects. For example, electro-optical (EO) sensors, such as a digital aerial camera and Landsat's multispectral scanner (MSS), are passive sensors, whereas real aperture radar (RAR), synthetic aperture radar (SAR), and Light Detection and Ranging (LIDAR) are active sensors. An example of fusing passive and active sensor data has been demonstrated by one of the present inventors.
To effectively convert image/sensor data to useful information for object extraction and recognition, researchers have attempted to improve the system from two major directions: (a) improve the capability of the sensor by (i) improving the resolution or ground sampling distance (GSD) of the sensor data/imagery, and (ii) improving sensor dynamic range by increasing the number of detected spectral bands, aspect angles and polarizations; and (b) improve the data analysis algorithms used to process the image/sensor data. Comparing sensing systems, the resolution of Landsat's Thematic Mapper (TM) is approximately 30 meters, whereas the resolution of Space Imaging's IKONOS is 4 meters for its MSI and 1 meter for its panchromatic imagery. However, since Landsat TM has seven spectral bands and IKONOS has only four, the Landsat TM sensor is superior in terms of the number of spectral bands. Conversely, since the resolution of the IKONOS system is 4 meters, compared to 30 meters for Landsat TM, the IKONOS sensor is superior in terms of image resolution. Although SAR images typically have only one band, this number can be increased by using multiple polarization modes of this active sensor. Recent improvements in LIDAR technology permit researchers to exploit object height information for object discrimination.
While texture analysis algorithms have been used since the 1970s, few studies have compared human texture perception with machine analyses. In 1978 the Air Force Office of Scientific Research (AFOSR) awarded a contract to Susquehanna Resources and Environment, Inc. (SRE) to investigate the relationship between micro-texture and global structure in human visual perception of object recognition. Related experiments were published in 1980. In 2006, P. J. Petite and E. L. van den Broek conducted human versus machine analysis using 450 human objects. Both studies found a high degree of individual difference in performing image object recognition. For example, the SRE/AFOSR study found that individual differences ranged as high as 25 percent. The Petite and van den Broek study determined that the contribution of texture to human perception agreement ranged from 78% in a color-shapeless condition to 60% in a gray, six-shape condition.
While it is possible to increase the number of sensor bands or improve optics to improve the object recognition capability of passive sensors, these sensors (visible and near infrared) are largely limited to daytime and good weather conditions. SAR, on the other hand, is capable of night-time and near-all-weather sensing; however, it is usually limited to a single band. To improve its capability, researchers have developed a spotlight-mode SAR capable of imaging the same object and its surrounding field from various aspect angles. For example, if the aspect angle interval is 2 degrees, there will be 180 images for one particular sensing field. The drawback of this increased sampling for object classification is that it requires an unmanageable number of matching libraries to model all possible varying SAR-object interaction conditions.
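The arithmetic behind the 180-image figure, and the resulting growth of a matching library, can be illustrated with a short sketch. The function names, the 50 object types and the 10 interaction conditions are hypothetical values chosen only for illustration; they do not come from any cited study.

```python
def images_per_field(aspect_interval_deg: float) -> int:
    """Number of aspect-angle images needed to cover a full 360-degree circle."""
    if aspect_interval_deg <= 0:
        raise ValueError("aspect interval must be positive")
    return round(360.0 / aspect_interval_deg)


def library_size(aspect_interval_deg: float, object_types: int, conditions: int = 1) -> int:
    """Templates needed if every object type needs one template per aspect angle and condition."""
    return images_per_field(aspect_interval_deg) * object_types * conditions


# A 2-degree interval gives the 180 images quoted in the text.
print(images_per_field(2))
# Hypothetical example: 50 object types under 10 interaction conditions.
print(library_size(2, 50, 10))
```

Under these assumed numbers the library already holds tens of thousands of templates, which is the scaling problem the preceding paragraph describes.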
In contrast to the “unmanageable number of matching libraries,” the human visual system is capable of performing object recognition using far fewer dimensions, particularly when the task is “defined.” For example, in a Hsu and Burright study, there were three dominant perceptual dimensions (Brightness, Texture and Structure), which together accounted for 75% to 80% of the human visual system based decision making process.
When sensor data is processed to optimally define the object recognition task for the human visual system, individual perceptual differences can be mitigated. This creates a synergy between a sensor and a perceptual system in object/feature recognition. The current invention is centered precisely on this topic, with an innovative system implementation.
For visual processing, an example of a discrete task is “finding a small-size, circular and bright object.” The object descriptors (small, circular, bright) have been called “photo-interpretation keys.” These keys are largely spatial, whereas the primary capability from image sensors is largely spectral. The proposed invention integrates the image sensor spectral capability with human perception spatial analysis capability, in a system that exploits fundamental human visual “figure-ground” recognition capability in a multi-tone image space as the figure-ground mapping propagates over the entire image surface. The invention creates an image surface composed of spectral and spatial figure-ground structures for object/feature recognition.
The current invention is a real-time system of software/hardware, which integrates sensor data with information analysis and fuses the imagery/sensor data/geospatial registration to improve the perception of a human analyst by focusing the analyst's attention on a fundamental figure/ground structure. Since the desired object of interest (OOI) could range from a single object to a complex group of both features and objects, the real-time data analysis and information visualization system needs to be flexible enough to cover both single-feature and multi-feature extraction tasks. Since the task may require many dominant signatures to generate the multi-tone figure-ground structures for object/feature extraction, the human analyst needs to have the best visualization means to make the final decision. To this end, a set of dominant feature/object signatures is generated, and a system implementation for it has been completed.
There is no generally agreed upon mathematical solution to the problem of determining when to stop a scene segmentation process. In the current invention, it is proposed that an extracted object signature be allowed to feed back and combine with the original image data set to define a new signature search space. By the same principle, an independent source is also allowed to influence or enhance the feature space and reduce the object-search space.
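One way to picture this feedback is to treat the match against an extracted signature as an additional band appended to the original data set. The following is a minimal sketch under that assumption; the inverse-distance score, the function names and the sample values are hypothetical and are not taken from the described implementation.

```python
import math


def match_score(pixel, signature):
    """Similarity in (0, 1]: 1.0 for an identical pixel, falling off with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(pixel, signature))


def feed_back(pixels, signature):
    """Return a new data set in which every pixel gains one extra band: its match score."""
    return [list(p) + [match_score(p, signature)] for p in pixels]


pixels = [[10, 20, 30], [12, 22, 29], [200, 180, 90]]  # three 3-band pixels
signature = [10, 20, 30]                               # hypothetical extracted object signature
augmented = feed_back(pixels, signature)               # each pixel now has 4 bands
print(augmented[0])
```

A band derived from an independent source could be appended in the same way, so that the next signature search runs in the enlarged feature space.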
The current invention provides a means to visualize, with a quantitative measure, how well an observed pixel matches a feature signature in a feature signature library. Using multiple feature signatures, the analyst can visualize how well a given pixel and/or a group of pixels matches a set of single and/or group-based feature signatures simultaneously. This real-time processing creates an optimal visualization environment in which a figure-ground structure is simply a two-tone image made by a simple yet perceptually meaningful ranking analysis, such as assigning the upper quartile of a feature scene to “figure” and the rest to “ground.”
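The upper-quartile ranking described above can be sketched as follows. This is an illustration only: the function name, the flattened list of match scores and the tie-handling rule are assumptions, not details of the described system.

```python
def figure_ground(scores, figure_fraction=0.25):
    """Label the top `figure_fraction` of match scores as figure (1) and the rest as ground (0).

    Scores tied with the cutoff value are all labeled figure, so the figure set
    may be slightly larger than the requested fraction.
    """
    ranked = sorted(scores, reverse=True)
    n_figure = max(1, int(len(ranked) * figure_fraction))
    cutoff = ranked[n_figure - 1]
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical match scores for an 8-pixel feature scene (flattened).
scores = [0.91, 0.15, 0.40, 0.88, 0.05, 0.33, 0.72, 0.10]
mask = figure_ground(scores)
print(mask)  # [1, 0, 0, 1, 0, 0, 0, 0]
```

Changing `figure_fraction` is one simple way an analyst could tune the figure versus ground decision.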
When multiple dominant feature signatures are used, two figure-ground mappings may overlap. To create non-overlapping figure-ground mappings, the current invention provides a means to determine or eliminate the overlapped feature pixels and then create a non-overlapping figure-ground structure, if necessary.
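A minimal sketch of one possible overlap rule follows: a pixel claimed as “figure” by more than one signature is either given to the best-matching signature or dropped to ground. The rule itself, the function name and the sample data are assumptions made for illustration; the text does not specify how overlaps are resolved.

```python
def resolve_overlap(masks, scores, drop_overlap=False):
    """Build one non-overlapping label per pixel.

    masks[k][i]  is 1 if pixel i is 'figure' for signature k, else 0.
    scores[k][i] is the match score of pixel i against signature k.
    Returns the winning signature index per pixel, or -1 for ground.
    """
    labels = []
    for i in range(len(masks[0])):
        claimants = [k for k in range(len(masks)) if masks[k][i]]
        if not claimants or (len(claimants) > 1 and drop_overlap):
            labels.append(-1)  # ground, or an eliminated overlap pixel
        else:
            labels.append(max(claimants, key=lambda k: scores[k][i]))
    return labels


masks = [[1, 1, 0, 0], [0, 1, 1, 0]]                     # pixel 1 is claimed twice
scores = [[0.9, 0.6, 0.2, 0.1], [0.1, 0.8, 0.7, 0.2]]
print(resolve_overlap(masks, scores))                     # [0, 1, 1, -1]
print(resolve_overlap(masks, scores, drop_overlap=True))  # [0, -1, 1, -1]
```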
For the current invention, a feature signature is represented by a set of generic input bands (artificial or real-sensor-generated). If the analyst determines that the signature is not adequate, the number of bands can be increased through appropriate mathematical means. The number of bands may also be decreased, for example to reduce computation time. Therefore, both the feature signature and the figure-ground structure are tunable, loadable and can be updated in real time.
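As an illustration of increasing and decreasing the number of bands, the sketch below derives an artificial band as a normalized difference of two existing bands and, separately, keeps only a chosen subset of bands. The normalized difference is merely one example of an “appropriate mathematical means”; the function names and band values are hypothetical.

```python
def add_ratio_band(pixel, i, j):
    """Append a normalized-difference band (b_i - b_j) / (b_i + b_j) to a band vector."""
    total = pixel[i] + pixel[j]
    ratio = (pixel[i] - pixel[j]) / total if total else 0.0
    return list(pixel) + [ratio]


def keep_bands(pixel, band_indices):
    """Reduce a band vector to the listed bands, for example to save computation time."""
    return [pixel[b] for b in band_indices]


signature = [40.0, 60.0, 30.0, 90.0]        # hypothetical four-band feature signature
expanded = add_ratio_band(signature, 3, 2)  # five bands: (90 - 30) / (90 + 30) = 0.5
reduced = keep_bands(signature, [0, 3])     # two bands
print(expanded, reduced)
```

The same functions would have to be applied to every pixel in the scene so that signatures and pixels remain in the same feature space.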
Remote sensing geospatial accuracy has been limited by data generated with inaccurate and inconsistent ground control points (GCP). Using image rectification methods based on pixel resampling or interpolation to register images usually alters the pixels' original digital number (DN) values. By definition, this contaminates/alters the sensor spectral data, rendering it practically unusable for object/feature extraction. As a result, remote sensing researchers have not been able to take full advantage of multi-sensor imagery and multi-source data to perform object extraction because the data sources are not well geospatially registered. The current invention overcomes this fundamental data fusion problem by allowing users to perform data fusion without requiring a priori registration of the input data. This object-oriented methodology is adaptable to multi-source data fusion, since successive video frames are generally not geospatially registered.
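The effect of resampling on DN values can be seen in a one-dimensional sketch. Here a row of pixels is resampled with linear interpolation at a half-pixel shift, which produces values that no sensor element ever recorded. The function name and the sample row are hypothetical, and the example stands in for the general class of interpolation-based rectification rather than any specific method.

```python
def resample_linear(row, shift):
    """Resample a row of DNs at positions i + shift (0 <= shift < 1) by linear interpolation."""
    out = []
    for i in range(len(row) - 1):
        x = i + shift
        left = int(x)
        frac = x - left
        out.append((1 - frac) * row[left] + frac * row[left + 1])
    return out


row = [10, 10, 200, 200, 10]            # original DNs: only the values 10 and 200 occur
resampled = resample_linear(row, 0.5)   # [10.0, 105.0, 200.0, 105.0]
new_values = [v for v in resampled if v not in row]
print(resampled, new_values)            # 105.0 never appeared in the sensor data
```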
As noted earlier, GCP from dissimilar sensors, or from the same sensor on different acquisitions, rarely match, yielding two mismatched coordinate or geo-coordinate systems. In addition, since warping can achieve a fit between two images, the match between a camera model and reality is rarely tested. The current invention mitigates such coordinate-mismatch and inadequate-modeling conditions by providing users with means to deploy a real-time dynamic GCP/image matching and evaluation system that unifies two coordinate systems into one field-mapping system with a geogrid centered on a user-selected projection center, such as one of the GCPs, yielding a Virtual Equi-Distance system (VEDS) built on the Virtual Transverse Mercator Projection (VTM) described in U.S. Pat. No. 7,343,051 awarded to the lead inventor of the current application.
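The idea of referring two coordinate systems to a common, user-selected control point and then evaluating the remaining mismatch can be sketched in a simplified planar form. This is not the VEDS or VTM construction of the cited patent; it assumes flat coordinates with the same scale and orientation, and all names and values are hypothetical.

```python
import math


def recenter(points, center_index):
    """Express points as (dx, dy) offsets from the chosen control point."""
    cx, cy = points[center_index]
    return [(x - cx, y - cy) for x, y in points]


def rmse(points_a, points_b):
    """Root-mean-square distance between corresponding control points."""
    sq = [(ax - bx) ** 2 + (ay - by) ** 2
          for (ax, ay), (bx, by) in zip(points_a, points_b)]
    return math.sqrt(sum(sq) / len(sq))


# The same three GCPs in two coordinate systems with different origins;
# the third point of the second set is off by one unit.
gcp_base = [(100.0, 200.0), (110.0, 200.0), (100.0, 220.0)]
gcp_aligned = [(5.0, 7.0), (15.0, 7.0), (5.0, 28.0)]

base_local = recenter(gcp_base, 0)        # both sets now share GCP 0 as origin
aligned_local = recenter(gcp_aligned, 0)
print(rmse(base_local, aligned_local))    # residual mismatch after recentering
```

A residual of this kind is the sort of quantity an evaluation step could compare against a predetermined tolerance before accepting a match.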
For the past 20 years, scene content analyses have been approached from the paradigm of scene indexing, content-based object extraction, archiving and retrieval. Few have been oriented toward scene content modeling and testing for system implementation. To fill this gap, this invention uses a set of quantifiable and implementable human language key words in a configuration resembling a set of simultaneous equations to model and test scene content until an optimal model is determined for a specific object extraction/linking application.
Similarly, for the past 40 years, Automatic Target Recognition (ATR) has been approached from a paradigm of still imagery with few real world applications to show for it.
Therefore, it is advantageous to develop a feature cueing and recognition system that uses sensing and data processing means to generate multi-faceted figure-ground structures for object recognition.
It is also advantageous to generate both “figure” and “ground” as a means for perceptually-based object cueing and recognition. Since the vast majority of past research has been centered on how to create “feature,” rather than “ground,” the current invention represents a significant departure from conventional approaches.
It is advantageous that the “figure” versus “ground” decision be tunable by the analyst.
It is also advantageous that a figure ground structure be definable by a feature signature and multiple feature signatures to generate a multi-tone figure ground structure mosaic system over an image.
It is advantageous to be able to either increase or decrease the dimensions of the feature space.
It is equally advantageous to know which area contains known objects, and which area does not contain objects of interest.
It is advantageous to visualize quantitatively the spectral and spatial relationships between a known object/feature and an observed pixel in real time.
It is also advantageous to use one object signature extracted from an image to generate object based images for object recognition from other images of dissimilar resolution and dissimilar sensor type.
It is also advantageous to use an independent object signature to influence the object search space for object recognition.
Similarly, it is advantageous to classify a pixel as “other” to absorb pixels that are dissimilar to a set of labeled feature signatures.
It is also advantageous to use all “others” to create generic labeled feature signatures to generate an overall figure/ground structure that includes “unknowns” for object cueing and recognition.
It is advantageous to permit generation of a feature signature for each of the “unknowns” as a new set into the feature signature space.
It is also important and advantageous to post-process feature-signature scenes using spectral and spatial analysis means for object cueing and recognition.
Similarly, it is also important and advantageous to visualize how well a pixel is matched to each of the input feature signatures.
Similarly, it is also important and advantageous to allow the analyst to modify a feature signature in real time.
Similarly, it is important and advantageous to visualize how well a group of pixels is matched to a group of feature signatures.
It is also important and advantageous to generate a color composite to visualize the multi-tone figure ground structures for object cueing and recognition.
It is critical and advantageous that the above figure ground visualization system be either an automated system, a user-based interactive system, or a firmware based system.
Similarly, it is advantageous to accept video images as input rather than limit a system to process only still images.
Using video images as input, it is advantageous to accept subset-partitioned images as input. Accepting a moving object versus non-moving object partitioned image as input is a case in point.
Similarly, it is important and advantageous to generalize from a two-feature partitioned image to a generic multi-subset partitioned image as input to the inventive system.
With a generic subset-partitioned image as input, it is advantageous to generate the scene content for each subset region with the above-discussed dominant features plus their “ground” sets as either “other features” or “unknowns.”
It is critical and advantageous to allow users to perform multi-source data fusion without a priori registration. Geospatially registered inputs would also be accepted to serve as a base for registering any unregistered inputs.
Similarly, it is advantageous to disseminate the results of data analysis, pixel-matching, enhanced pixels for object perception, visualized sub-scene and full-scene and associated databases to users who may provide independent sources as feedback to the current invention system.
Similar to a visualization environment, it is advantageous to provide a computational environment to use multiple pixels from an individual object as a spectral signature complex to perform pixel discrimination.
It is also advantageous to visualize a spectral complex for pixel discrimination and object recognition using a moving window having a size selected by the user.
It is equally advantageous to define a spectral signature complex by both spectral and spatial criteria.
It is also advantageous to use an independent source as an additional factor to define a spectral signature complex.
It is advantageous to allow both a feature signature library and a rule set library to be an open system, tunable, modifiable, and updatable on the fly for real-time applications, and/or editable for non-real-time batch processing applications.
It is advantageous to use human language key words that are quantifiable and implementable for scene content/structure modeling and testing.
It is definitely advantageous to test competing implementations of a core data analysis routine, feature-signature model, and its corresponding visualization means.
It is also advantageous to use a dynamic GCP/image matching module to implement a real-time field-based situation awareness system in which a GCP is selectable and fine-tunable with interactive/automated plus evaluation and visualization means.
It is equally advantageous to generate a geogrid using a user-selected equi-distance projection center, such as one of the GCPs, yielding a geogridded virtual Universal Transverse Mercator Equi-Distance system (VTM-ED) for a rapid field geospatial awareness application.
It is also advantageous to test the goodness-of-fit between a “camera” model and the ground truth, and devise a method to achieve a match within a predetermined level of confidence between the model and the actual reality, and also the base image versus the aligned image dynamically in the context of the control points and the control area.
Lastly, it is advantageous to perform automatic target recognition in the paradigm of time-varying image analysis, yielding an innovative system called Event-based Automatic Target Recognition (E-ATR).
Having set forth these advantageous aspects of the invention, the current embodiment of the invention is now described as follows.