An image is essentially a data matrix of m rows and n columns. Each element of that matrix is called a picture element, or pixel. An image becomes meaningful when a user is able to partition it into a number of recognizable regions that correspond to known natural features, such as rivers and forests, or to man-made objects. Once this higher level of image generalization is completed, each distinct feature or object, being a uniform field, can be identified. The process by which such a uniform field is generated is generally referred to as segmentation. The process by which a segmented region is matched with a rule set or a model is referred to as identification.
Dozens of techniques have been used by researchers to perform image segmentation. They can be grouped into three major categories: (1) class-interval based segmentors, (2) edge-based segmentors, and (3) region-based segmentors.
Suppose a given image has 0 (zero) as the minimum pixel value and 255 as the maximum pixel value. By mapping all pixels whose intensity values fall within a given interval, say, between 0 and 20, into one category, a simple thresholding method can be used to perform image segmentation.
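The class-interval idea can be sketched as follows. This is a minimal illustration, not a method taken from the cited literature: the function name class_interval_segment and the interval boundaries 20, 100 and 255 are assumptions chosen for the example, and the image is represented as a plain list of rows.

```python
from bisect import bisect_left

def class_interval_segment(image, upper_bounds):
    """Label each pixel with the index of the intensity interval it falls in.

    upper_bounds is an ascending list of inclusive upper limits (hypothetical
    values here), so [20, 100, 255] defines the intervals 0-20, 21-100, 101-255.
    """
    return [[bisect_left(upper_bounds, pixel) for pixel in row] for row in image]

# A tiny 2 x 3 single-band image with values in the range 0..255.
image = [[0, 20, 21],
         [100, 101, 255]]
labels = class_interval_segment(image, [20, 100, 255])
print(labels)  # [[0, 0, 1], [1, 2, 2]]
```

Every pixel with a value between 0 and 20 receives label 0, which corresponds to the single category described above. The remaining intervals are treated in the same way.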
An edge is generally defined as the difference between adjacent pixels. Edge-based image segmentation is performed by generating an edge map and linking the edge pixels to form a closed contour. A review of this class of segmentors is given by Farag (Remote Sensing Reviews, Vol. 6, No. 1-4, 1992, pp. 95-121).
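The first step, generating an edge map from differences between adjacent pixels, can be sketched as follows. The function name edge_map and the threshold value are assumptions made for illustration. The sketch covers only the edge map, not the linking of edge pixels into closed contours, and it is not the formulation of any particular segmentor reviewed by Farag.

```python
def edge_map(image, threshold):
    """Mark a pixel as an edge (1) when the absolute difference to its right
    or lower neighbor exceeds threshold (a hypothetical tuning parameter)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Differences to the right and lower neighbors; 0 at the image border.
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, down) > threshold:
                edges[r][c] = 1
    return edges

# Two dark columns followed by one bright column.
image = [[10, 10, 200],
         [10, 10, 200]]
print(edge_map(image, 50))  # [[0, 1, 0], [0, 1, 0]]
```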
Region-based segmentation reverses the process of edge-based segmentation, because it starts with the interior of a potential uniform field rather than with its outer boundary. The process generally begins with two adjacent pixels and one or more rules used to decide whether these two candidates should be merged. One example of this class of segmentors, using a Markov random field approach, is given by Tenorio (Remote Sensing Reviews, Vol. 6, No. 1-4, 1992, pp. 141-153).
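The merging of adjacent pixels under a decision rule can be sketched as follows. This is a simplified illustration under stated assumptions and is not Tenorio's Markov random field approach. The merge rule (absolute intensity difference no greater than a tolerance), the function name merge_regions and the union-find bookkeeping are all choices made for the example.

```python
def merge_regions(image, tolerance):
    """Merge 4-connected neighbors whose intensities differ by at most
    tolerance (a hypothetical rule), and return a region label per pixel."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))  # union-find forest, one node per pixel

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for r in range(rows):
        for c in range(cols):
            here = r * cols + c
            if c + 1 < cols and abs(image[r][c] - image[r][c + 1]) <= tolerance:
                union(here, here + 1)
            if r + 1 < rows and abs(image[r][c] - image[r + 1][c]) <= tolerance:
                union(here, here + cols)

    # Relabel the union-find roots as consecutive region numbers in scan order.
    labels = {}
    return [[labels.setdefault(find(r * cols + c), len(labels))
             for c in range(cols)] for r in range(rows)]

image = [[10, 12, 200],
         [11, 13, 205]]
print(merge_regions(image, 5))  # [[0, 0, 1], [0, 0, 1]]
```

Because merging here chains through neighbors, a region can span a wider intensity range than the tolerance itself. A practical rule set would typically add further conditions.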
All conventional segmentors share the following fundamental features:
1) the segmentation process is generally performed on a single band image;
2) the segmentation process follows well-defined mathematical decision rules;
3) except for simple thresholding, all segmentors are computationally expensive and/or intensive; and
4) none of the conventional techniques is self-determining or self-calibrating.
If segmentation is defined as the process of generating distinct uniform fields from a scene, a human visual system that is based on color perception should also be considered a segmentor. In contrast to mathematics-based segmentation schemes, color-based segmentation relies on the use of three spectrally-derived images. These multiple images are, in most cases, generated by a physical device called a multispectral sensor. The advantage of this method over mathematical segmentors is its ability to perform scene segmentation with minimal or no mathematical computation.
For purposes of clarity throughout this discussion, it should be understood that the concept of three spectrally-derived (color) images, while representing the preferred embodiment, is merely a subset of a more general concept: any composite having component ranges which may be transformed into two or more respective component parts and then projected into a common space.
Color-based segmentors require input of three spectrally distinct bands or colors. A true color picture can be generated from a scene recorded in three registered bands in the blue, green and red spectral regions, respectively. The three bands are then combined into a composite image using three color filters: red, green and blue. The resultant color scene is, in effect, a segmented scene because each color can represent a uniform field.
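The formation of a color composite from three registered bands can be sketched as follows. The function names color_composite and color_fields and the sample band values are assumptions made for illustration. Grouping pixels by exact color ignores spatial connectivity and only approximates what a human viewer perceives as a uniform field.

```python
def color_composite(red, green, blue):
    """Combine three registered single-band images into one image whose
    pixels are (red, green, blue) triples."""
    return [[(r, g, b) for r, g, b in zip(red_row, green_row, blue_row)]
            for red_row, green_row, blue_row in zip(red, green, blue)]

def color_fields(composite):
    """Group pixel coordinates by color; each distinct color is treated
    here as one uniform field (a simplifying assumption)."""
    fields = {}
    for r, row in enumerate(composite):
        for c, color in enumerate(row):
            fields.setdefault(color, []).append((r, c))
    return fields

# Hypothetical 2 x 2 bands: the left column is bright in red, the right in green.
red = [[255, 0], [255, 0]]
green = [[0, 255], [0, 255]]
blue = [[0, 0], [0, 0]]

composite = color_composite(red, green, blue)
fields = color_fields(composite)
print(fields[(255, 0, 0)])  # [(0, 0), (1, 0)]
print(fields[(0, 255, 0)])  # [(0, 1), (1, 1)]
```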
The above discussion is related to region-based segmentation. In edge-based segmentation, all of the conventional techniques use well-defined mathematical formulae to define an edge. After edges are extracted, another set of mathematical rules is used to join and/or eliminate edges in order to generate a closed contour that defines a uniform region. In other words, none of the conventional techniques uses the scene itself to define an edge even though, from a more global point of view, an edge is, in fact, defined by the scene itself.
If a region or an edge can be generated from the content of the scene itself, it should be possible to integrate both region-based and edge-based segmentation methods into a single, integrated process rather than using two opposing philosophies.
Object identification is the action, subsequent to segmentation, of labeling an object using commonly-accepted object names, such as a river, a forest or an M-60 tank. While object recognition can be achieved from a variety of approaches (such as statistical document functions and rule-based and model-based matching), all of these conventional methods require extracting representative features as an intermediate step toward the final object identification. The extracted features can be spectral reflectance-based, texture-based or shape-based. Statistical pattern recognition is a subset of standard multivariable statistical methods and thus does not require further discussion. A rule-based recognition scheme is a subset of conventional artificial intelligence (AI) methods that enjoyed popularity during the late 1980s. Shape analysis is a subset of model-based approaches that requires extraction of object features from the boundary contour or a set of depth contours. Sophisticated features include Fourier descriptors and moments. Wang, Gorman and Kuhl compared the effectiveness of depth information with that of boundary-only information and, in addition, contrasted the classifier performance of range moments with that of Fourier descriptors (Remote Sensing Reviews, Vol. 6, No. 1-4, pp. 129+).
An object is identified when a match is found between an observed object and a calibration sample. A set of calibration samples constitutes a (calibration) library. A conventional object library has two distinct characteristics: 1) it is feature based and 2) it is full-shape based. The present invention reflects a drastically different approach to object identification because it does not require feature extraction as an intermediate step toward recognition and it can handle partially-occluded objects.
Feature extraction uses a small number of effective (representative) attributes to characterize an object. While it has the advantage of computational economy, it runs the risk of selecting the wrong features and of using incomplete information sets in the recognition process. A full-shape model assumes that the object is not contaminated by noise and/or obscured by ground clutter. This assumption, unfortunately, rarely corresponds to real-world sensing conditions.
Depth contours are used for matching three-dimensional (3-D) objects generated from a laser radar with 3-D models generated from wireframe models. In real-world conditions, any image is a 3-D image because the intensity values of the image constitute the third dimension of a generalized image. The difference between a laser radar based image and a general spectral-based image is that the former has a well-defined third dimension and the latter does not.
It has been proven that the majority of object discrimination comes from the boundary contour, not the depth contour (Wang, Gorman and Kuhl, Remote Sensing Reviews, Vol. 6, Nos. 1-4, pp. 129-?, 1992(?)). Therefore, the present invention uses a generalized 3-D representation scheme to accommodate the general image. This is accomplished by using the height above the ground (called the height library) as an additional library to the existing depth library. The resultant library is called a dual depth and height library.
It would be advantageous to provide a much simpler, more effective and more efficient process for image segmentation, one that achieves an integration between region-based and edge-based segmentation methodologies which, heretofore, have been treated as mutually exclusive processes.
It would also be advantageous to generate uniform regions of an image so that objects and features could be extracted therefrom.
It would also be advantageous to provide a method for segmenting an image with minimal mathematical computation and without requiring two or more spectrally-derived images.
It would also be advantageous to provide a flexible and arbitrary scheme to generate colors.
It would also be advantageous to use the human phenomenon of color perception to perform scene segmentation on only one spectral band.
It would be advantageous to provide an object identification scheme that does not rely on a predetermined number of features and fixed characteristics of features.
It would also be advantageous to provide an object identification scheme to facilitate object matching either in a full-shape or partial-shape condition.
It would also be advantageous to provide an object identification system that is both featureless and full/partial shape based.
It would also be advantageous to provide a mathematical model that can handle both featureless and full/partial shape cases.
It would also be advantageous to provide a library construction scheme that is adaptable to both featureless and full/partial shape based object recognition scenarios.
It would also be advantageous to provide a dual library (depth and height) to perform general 3-D object recognition using any type of image.
It would also be advantageous to provide a full object identification system that is capable of integrating the previously described novel segmentation and novel object recognition subsystems.