The present invention pertains to object and feature identification in an image and, more particularly, to scene segmentation and object/feature extraction by generating uniform regions from a single band image or a plurality thereof using a self-determining, self-calibrating, improved stable structure, pseudo multispectral color, and multi-level resolution processing technique, and associated matching methods for object identification.
An image is basically a data matrix of m rows and n columns. An element of that image matrix is called a picture element, or a pixel. An image becomes meaningful when a user is able to partition the image into a number of recognizable regions that correspond to known natural features, such as rivers and forests, or to man-made objects. Once this higher level of image generalization is completed, each distinct feature/object, being a uniform field, can be identified. The process by which such a uniform field is generated is generally referred to as segmentation. The process by which a segmented region is matched with a rule set or a model is referred to as identification.
Dozens of techniques have been used by researchers to perform image segmentation. They can be grouped into three major categories: (1) class-interval based segmentors, (2) edge-based segmentors, and (3) region-based segmentors.
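Class-interval based segmentation can be illustrated as follows. Suppose a given image has 0 (zero) as the minimum pixel value and 255 as the maximum pixel value. By mapping all pixels whose intensity values fall between, say, 0 and 20 into one category, and treating the remaining intensity intervals likewise, a simple thresholding method can be used to perform image segmentation.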
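By way of illustration only, class-interval segmentation can be sketched in a few lines of Python. The function name `class_interval_segment`, the interval bounds (20, 127, 255) and the sample image are assumptions chosen for the example; the text above specifies only the 0 to 20 interval.

```python
def class_interval_segment(image, upper_bounds):
    """Label each pixel with the index of the first interval that contains it.

    `upper_bounds` is an ascending list of inclusive upper limits,
    e.g. [20, 127, 255] defines the intervals 0-20, 21-127 and 128-255.
    """
    labels = []
    for row in image:
        label_row = []
        for value in row:
            for label, bound in enumerate(upper_bounds):
                if value <= bound:
                    label_row.append(label)
                    break
            else:
                # No interval covers this intensity.
                raise ValueError(f"pixel value {value} exceeds all bounds")
        labels.append(label_row)
    return labels


# Hypothetical 3x3 single-band image with values in 0..255.
image = [
    [5, 18, 200],
    [12, 130, 255],
    [0, 21, 90],
]
labels = class_interval_segment(image, upper_bounds=[20, 127, 255])
for row in labels:
    print(row)
```

Every pixel sharing a label belongs to the same category; adjacent pixels with the same label form a candidate uniform field.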
An edge is generally defined as the difference between adjacent pixels. Edge-based image segmentation is performed by generating an edge map and linking the edge pixels to form a closed contour. A review of this class of segmentors can be obtained from Farag. (Remote Sensing Reviews, Vol. 6, No. 1-4, 1992, pp. 95-121.)
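The first step of an edge-based segmentor, generating an edge map from differences between adjacent pixels, can be sketched as follows. This is a minimal illustration and not a method taken from the cited review: the function name `edge_map`, the threshold value and the sample image are assumptions, and the subsequent linking of edge pixels into closed contours is omitted.

```python
def edge_map(image, threshold):
    """Mark a pixel as an edge when it differs from its right or lower
    neighbor by more than `threshold`."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Differences with the right and lower neighbors (0 at the border).
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges[r][c] = 1
    return edges


# Hypothetical image: a dark field on the left, a bright field on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = edge_map(image, threshold=50)
for row in edges:
    print(row)
```

Note that both the difference operator and the threshold are fixed in advance by the designer rather than derived from the scene.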
Region-based segmentation reverses the process of edge-based segmentation, because it starts with the interior of a potential uniform field rather than with its outer boundary. The process generally begins with two adjacent pixels and one or more rules used to decide whether merging of these two candidates should occur. One of the examples of this class of segmentors can be found in Tenorio using a Markov random field approach. (Remote Sensing Reviews, Vol. 6, No. 1-4, 1992, pp. 141-153.)
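A region-based segmentor of the simplest kind can be sketched as follows. The merging rule used here (merge two adjacent pixels when their intensities differ by no more than a tolerance) is an assumption chosen for brevity; it is not the Markov random field approach of Tenorio. The function name `region_merge` and the sample values are likewise hypothetical.

```python
def region_merge(image, tolerance):
    """Merge 4-connected neighbors whose intensities differ by at most
    `tolerance`, and return a map of consecutive region labels."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))  # union-find forest, one node per pixel

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for r in range(rows):
        for c in range(cols):
            here = r * cols + c
            # Decision rule applied to the right and lower neighbors.
            if c + 1 < cols and abs(image[r][c] - image[r][c + 1]) <= tolerance:
                union(here, here + 1)
            if r + 1 < rows and abs(image[r][c] - image[r + 1][c]) <= tolerance:
                union(here, here + cols)

    # Relabel the region roots as 0, 1, 2, ... in scan order.
    names = {}
    return [
        [names.setdefault(find(r * cols + c), len(names)) for c in range(cols)]
        for r in range(rows)
    ]


image = [
    [10, 12, 200],
    [11, 13, 205],
    [90, 92, 203],
]
regions = region_merge(image, tolerance=5)
for row in regions:
    print(row)
```

Because the rule compares only adjacent pixels, a gradual intensity ramp can be chained into a single region; practical segmentors add further rules to prevent this.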
All conventional segmentors share the following fundamental features:
1) the segmentation process is generally performed on a single band image;
2) the segmentation process follows well-defined mathematical decision rules;
3) except for simple thresholding, all segmentors are computationally expensive and/or intensive; and
4) none of the conventional techniques is self-determining or self-calibrating.
If segmentation is defined as the process of generating distinct uniform fields from a scene, a human visual system that is based on color perception should also be considered a segmentor. In contrast to mathematics-based segmentation schemes, color-based segmentation relies on the use of three spectrally-derived images. These multiple images are, in most cases, generated from a physical device called a multispectral sensor. The advantage of this method over mathematical segmentors is its ability to perform scene segmentation with minimal or no mathematical computation.
For purposes of clarity throughout this discussion, it should be understood that the concept of three spectrally-derived (color) images, while representing the preferred embodiment, is merely a subset of a more general concept: any composite having component ranges which may be transformed into two or more respective component parts and then projected into a common space.
Color-based segmentors require input of three spectrally distinct bands or colors. A true color picture can be generated from a scene captured in three registered bands in the spectral regions of blue, green and red, respectively. The three band images are then combined into a composite image using three color filters: red, green and blue. The resultant color scene is indeed a segmented scene because each color can represent a uniform field.
The above discussion is related to region-based segmentation. In edge-based segmentation, all of the conventional techniques use well-defined mathematical formulae to define an edge. After edges are extracted, another set of mathematical rules is used to join edges and/or eliminate edges in order to generate a closed contour to define a uniform region. In other words, none of the conventional techniques uses the scene itself to define an edge even though, in a more global point of view, an edge is, in fact, defined by the scene itself.
If a region or an edge can be generated from the content of the scene itself, it should be possible to integrate both region-based and edge-based segmentation methods into a single, integrated process rather than using two opposing philosophies.
Object identification follows segmentation and labels an object using commonly-accepted object names, such as a river, a forest or an M-60 tank. While object recognition can be achieved through a variety of approaches (such as statistical discriminant functions and rule-based and model-based matching), all of these conventional methods require extracting representative features as an intermediate step toward the final object identification. The extracted features can be spectral reflectance-based, texture-based and shape-based. Statistical pattern recognition is a subset of standard multivariable statistical methods and thus does not require further discussion. A rule-based recognition scheme is a subset of conventional artificial intelligence (AI) methods that enjoyed popularity during the late 1980s. Shape analysis is a subset of model-based approaches that requires extraction of object features from the boundary contour or a set of depth contours. Sophisticated features include Fourier descriptors and moments. The effectiveness of depth information was compared to that of boundary-only information by Wang, Gorman and Kuhl (Remote Sensing Reviews, Vol. 6, No. 1-4, pp. 129+). In addition, the classifier performance of range moments was contrasted with that of Fourier descriptors.
An object is identified when a match is found between an observed object and a calibration sample. A set of calibration samples constitutes a (calibration) library. A conventional object library has two distinct characteristics: 1) it is feature based and 2) it is full-shape based. The present invention reflects a drastically different approach to object identification because it does not require feature extraction as an intermediate step toward recognition and it can handle partially-occluded objects.
Feature extraction uses a small number of effective (representative) attributes to characterize an object. While it has the advantage of economy in computing, it runs the risk of selecting the wrong features and using incomplete information sets in the recognition process. A full-shape model assumes that the object is not contaminated by noise and/or obscured by ground clutter. This assumption, unfortunately, rarely corresponds to real-world sensing conditions.
Depth contours are used for matching three-dimensional (3-D) objects generated from a laser radar with 3-D models generated from wireframe models. In real-world conditions, any image is a 3-D image because the intensity values of the image constitute the third dimension of a generalized image. The difference between a laser radar based image and a general spectral-based image is that the former has a well-defined third dimension and the latter does not.
It has been shown that the majority of object discrimination comes from the boundary contour, not the depth contour (Wang, Gorman and Kuhl, Remote Sensing Reviews, Vol. 6, Nos. 1-4, pp. 129-?, 1992(?)). Therefore, the present invention uses a generalized 3-D representation scheme to accommodate the general image. This is accomplished by using the height above the ground (called the height library) as an additional library to the existing depth library. The resultant library is called a dual depth and height library.
It would be advantageous to provide a much simpler, more effective and more efficient process for image segmentation, one that achieves an integration between region-based and edge-based segmentation methodologies which, heretofore, have been treated as mutually exclusive processes.
It would also be advantageous to generate uniform regions of an image so that objects and features could be extracted therefrom.
It would also be advantageous to provide a method for segmenting an image with minimal mathematical computation and without requiring two or more spectrally-derived images.
It would also be advantageous to provide a flexible and arbitrary scheme to generate colors.
It would also be advantageous to use the human phenomenon of color perception to perform scene segmentation on only one spectral band.
It would be advantageous to provide an object identification scheme that does not rely on a predetermined number of features and fixed characteristics of features.
It would also be advantageous to provide an object identification scheme to facilitate object matching either in a full-shape or partial-shape condition.
It would also be advantageous to provide an object identification system that is both featureless and full and partial shape based.
It would also be advantageous to provide a mathematical model that can handle both featureless and full/partial shape cases.
It would also be advantageous to provide a library construction scheme that is adaptable to both featureless and full/partial shape based object recognition scenarios.
It would also be advantageous to provide a dual library (depth and height) to perform general 3-D object recognition using any type of image.
It would also be advantageous to provide a full object identification system that is capable of integrating the previously described novel segmentation and novel object recognition subsystems.
In accordance with the present invention, there is provided a Geographical Information System (GIS) processor to perform scene segmentation and object/feature extraction. GIS has been called a collection of computer hardware, software, geographical data and personnel designed to efficiently manipulate, analyze, and display all forms of geographically referenced information. The invention features the use of the fundamental concepts of color perception and multi-level resolution in self-determining and self-calibrating modes. The technique uses only a single image, instead of multiple images, as the input to generate segmented images. Moreover, a flexible and arbitrary scheme is incorporated, rather than a fixed scheme of segmentation analysis. The process allows users to perform digital analysis using any appropriate means for object extraction after an image is segmented. First, an image is retrieved. The image is then transformed into at least two distinct bands. Each transformed image is then projected into a color domain or a multi-level resolution setting. A segmented image is then created from all of the transformed images. The segmented image is analyzed to identify objects. Object identification is achieved by matching a segmented region against an image library. A featureless library contains full shape, partial shape and real-world images in a dual library system. The depth contours and height-above-ground structural components constitute the dual library. Also provided is a mathematical model called a Parzen window-based statistical/neural network classifier, which forms an integral part of this featureless dual library object identification system. All images are considered three-dimensional. Laser radar based 3-D images represent a special case.
An approach analogous to transforming a single image into multiple bands for segmentation is to generate multiple resolutions from one image and then to combine those resolutions to achieve the extraction of uniform regions. Object extraction is achieved by comparing the original image with a reconstructed image based on the reduced-resolution image. The reconstruction is achieved by doubling each pixel element in both the x and y directions. Edge extraction is accomplished by performing a simple comparison between the original image and the reconstructed image. This segmentation scheme becomes more complex when two or more sets of pair-wise comparisons are made and combined together to derive the final segmentation map. This integration scheme is based on mathematical morphology in the context of conditional probability.
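A single pair-wise comparison of this kind can be sketched as follows. The sketch rests on several assumptions not stated above: the reduced-resolution image is formed by averaging 2x2 blocks, the image dimensions are even, and the comparison is an absolute difference tested against a threshold. The function names are hypothetical, and the combination of multiple comparisons by mathematical morphology is not shown.

```python
def reduce_resolution(image):
    """Halve the resolution by averaging each 2x2 block (assumed method)."""
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) // 4
            for c in range(0, len(image[0]), 2)
        ]
        for r in range(0, len(image), 2)
    ]


def reconstruct(reduced):
    """Restore the original size by doubling each pixel in x and y."""
    rebuilt = []
    for row in reduced:
        doubled = [value for value in row for _ in range(2)]  # double in x
        rebuilt.append(doubled)
        rebuilt.append(list(doubled))                         # double in y
    return rebuilt


def difference_map(original, rebuilt, threshold):
    """Flag pixels where the original and the reconstruction disagree."""
    return [
        [1 if abs(o - b) > threshold else 0 for o, b in zip(orow, brow)]
        for orow, brow in zip(original, rebuilt)
    ]


# Hypothetical 4x4 image with a boundary inside the right-hand 2x2 blocks.
image = [
    [10, 10, 10, 200],
    [10, 10, 10, 200],
    [10, 10, 10, 200],
    [10, 10, 10, 200],
]
reduced = reduce_resolution(image)
rebuilt = reconstruct(reduced)
edges = difference_map(image, rebuilt, threshold=50)
for row in edges:
    print(row)
```

Blocks lying wholly inside a uniform field survive the reduction and reconstruction unchanged, whereas blocks that straddle a boundary do not, so the flagged pixels cluster around edges.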
To accommodate featureless and full/partial shape based object identification, the present invention proposes the use of a mixture of full-shape and partial-shape models plus real-world images as a calibration library for matching against the segmented real-world images. Moreover, in accordance with the invention, the library is constructed in the image domain so that features need not be extracted and real-world images can be added freely to the library. The invention further provides a mathematical model for the classifier using the Parzen window approach.
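For illustration, a Parzen window classifier operating directly on image-domain samples can be sketched as follows. The Gaussian kernel, the bandwidth `sigma`, the function names and the tiny four-pixel library are all assumptions made for the example; the text above does not specify them. Each class holds both a full-shape and a partial-shape exemplar, and no features are extracted from the pixels.

```python
import math


def parzen_score(sample, exemplars, sigma):
    """Average Gaussian kernel response between `sample` and the exemplars
    of one class. The kernel's normalizing constant is omitted because it
    is identical for every class when sigma and the dimension are shared."""
    total = 0.0
    for exemplar in exemplars:
        squared_distance = sum((a - b) ** 2 for a, b in zip(sample, exemplar))
        total += math.exp(-squared_distance / (2.0 * sigma ** 2))
    return total / len(exemplars)


def classify(sample, library, sigma):
    """Return the best-scoring class name and the score of every class."""
    scores = {
        name: parzen_score(sample, exemplars, sigma)
        for name, exemplars in library.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical library of flattened 2x2 images (row-major pixel order).
library = {
    "vertical_bar": [[1, 0, 1, 0], [1, 0, 0, 0]],    # full shape, partial shape
    "horizontal_bar": [[1, 1, 0, 0], [0, 1, 0, 0]],  # full shape, partial shape
}

# A noisy observation of a vertical bar.
observed = [0.9, 0.1, 0.8, 0.0]
label, scores = classify(observed, library, sigma=0.5)
print(label, scores)
```

Adding a real-world image to such a library amounts to appending one more flattened image to the list for its class, with no feature definitions to revise.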