Multispectral and hyperspectral images contain amounts of data that are impractical to analyze manually. These data include multiple spectral bands that are not readily visualized or assessed. Conventional multispectral sensors provide only a few spectral bands of imagery, nominally covering pre-specified portions of the visible to near-infrared spectrum. Conventional hyperspectral sensors may cover hundreds of spectral bands spanning a pre-specified portion of the electromagnetic spectrum. Thus, hyperspectral sensors may provide greater spectral discrimination than multispectral sensors and allow non-literal processing of data to detect and classify material content as well as structure.
An image may be represented mathematically as a matrix of m rows and n columns of elements. An element of such a matrix defining a two-dimensional (2-D) image is termed a picture element, or pixel. An image is usable when a viewer is able to partition the image into a number of recognizable regions that correspond to known features, such as trees, lakes, and man-made objects. Once this level of imaging is attained, each distinct feature and object may be identified since each is represented by an essentially uniform field. The process that generates such uniform fields is known as segmentation.
Many techniques have been used to segment images. Segmentation may be class-interval-based, edge-based, or region-based.
For 8-bit precision data, a given image may assume pixel (element) values from a minimum of zero to a maximum of 255. By mapping into one category those pixels whose intensity values fall within a certain range or class interval, e.g., 0–20, a simple threshold method may be used to segment the image.
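A minimal sketch of class-interval segmentation, assuming the image is held as a NumPy `uint8` array and that every class interval has the same width (21 gray levels, so that 0–20 forms the first class). The function name `class_interval_segment` and the fixed interval width are illustrative assumptions, not part of any described method.

```python
import numpy as np


def class_interval_segment(image: np.ndarray, width: int = 21) -> np.ndarray:
    """Hypothetical helper: assign each 8-bit pixel the index of its
    fixed-width intensity interval (0-20 -> class 0, 21-41 -> class 1, ...)."""
    if image.dtype != np.uint8:
        raise ValueError("expected 8-bit (uint8) image data")
    # Integer division by the (assumed, equal) interval width yields the label.
    return image.astype(np.int32) // width


# Example: a tiny 2 x 3 image.
img = np.array([[0, 20, 21], [100, 200, 255]], dtype=np.uint8)
labels = class_interval_segment(img)
# labels -> [[0, 0, 1], [4, 9, 12]]
```

Pixels sharing a label form one category; a single threshold is the special case of two intervals.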
An edge may be defined by observing the difference between adjacent pixels. Edge-based segmentation generates an edge map, linking the edge pixels to form a closed contour. In conventional edge-based segmentation, well-defined mathematical formulae are used to define an edge. After edges are extracted, another set of mathematical rules may be used to join, eliminate, or both join and eliminate edges, thus generating a closed contour around a uniform region. That is, the scene itself is not used to define an edge even though, globally, an edge may be defined by the scene.
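The first step of conventional edge-based segmentation, building an edge map from differences between adjacent pixels, can be sketched as follows. This assumes a single fixed threshold as the "well-defined mathematical formula"; the later step of linking edge pixels into closed contours is not shown, and the name `edge_map` is hypothetical.

```python
import numpy as np


def edge_map(image: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical helper: flag a pixel as an edge if it differs from its
    right-hand or lower neighbor by more than a fixed `threshold`."""
    img = image.astype(np.float64)  # avoid uint8 wraparound in subtraction
    edges = np.zeros(img.shape, dtype=bool)
    # Absolute differences between horizontally adjacent pixels.
    edges[:, :-1] |= np.abs(np.diff(img, axis=1)) > threshold
    # Absolute differences between vertically adjacent pixels.
    edges[:-1, :] |= np.abs(np.diff(img, axis=0)) > threshold
    return edges


# Example: a dark field on the left, a bright field on the right.
img = np.array([[10, 10, 200, 200]] * 4, dtype=np.uint8)
edges = edge_map(img, threshold=50)
# Only column 1 (the last dark column before the step) is flagged.
```

Note that the rule is applied identically everywhere in the image, which is the sense in which the scene itself does not define the edge.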
Region-based segmentation is the antithesis of edge-based segmentation. It begins at the interior of a potentially uniform field rather than at its outer boundary, and it may be initiated with any two adjacent interior pixels. One or more rules, such as a Markov Random Field (MRF) approach, are used to decide whether these two candidates should be merged. In general, conventional region-based segmentation is performed on an image within only a single spectral band, follows well-defined mathematical decision rules, is computationally intensive and thus expensive, and is not self-determining or self-calibrating.
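A minimal region-growing sketch, for illustration only. It starts from a single seed pixel rather than a pixel pair, and it swaps in a simple merge rule (accept a 4-connected neighbor whose value is within a tolerance of the running region mean) in place of an MRF model. The name `grow_region` and the tolerance rule are assumptions.

```python
from collections import deque

import numpy as np


def grow_region(image: np.ndarray, seed: tuple, tolerance: float) -> np.ndarray:
    """Hypothetical helper: grow a region outward from `seed` (row, col),
    merging 4-connected neighbors close to the running region mean."""
    rows, cols = image.shape
    in_region = np.zeros(image.shape, dtype=bool)
    in_region[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or in_region[nr, nc]:
                continue
            value = float(image[nr, nc])
            # Assumed merge rule (not MRF): compare candidate to region mean.
            if abs(value - total / count) <= tolerance:
                in_region[nr, nc] = True
                total += value
                count += 1
                queue.append((nr, nc))
    return in_region


img = np.array([[10, 11, 200],
                [12, 10, 200],
                [11, 12, 200]])
region = grow_region(img, seed=(0, 0), tolerance=5)
# region covers the six pixels in the two left-hand columns.
```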
Color-based segmentation requires input of three spectrally distinct bands or colors. A true-color video image may be generated from a scene imaged in three bands: blue, green, and red. These bands may be combined into a composite image using individual filters of the same three colors. The resultant color image may be considered a segmented image because each color may represent a uniform field.
If a region or an edge may be generated from the content of the scene, it should be possible to integrate both region-based and edge-based segmentation methods into a single, integrated process. The process by which a segment, or region, is matched with a rule set, or model, is termed identification.
Identification occurs after segmentation. It results in labeling structure using commonly accepted names, such as river, forest, or automobile. While identification may be achieved in a number of ways, such as statistical discriminant functions and rule-based and model-based matching, all require extracting representative features as an intermediate step. Extracted features may be spectral-reflectance-based, texture-based, or shape-based.
Statistical pattern recognition uses standard multivariable statistical methods. Rule-based recognition schemes use conventional artificial intelligence (AI). Shape analysis uses a model-based approach that requires extraction of features from the boundary contour or a set of depth contours. Sophisticated features that may be extracted include Fourier descriptors and moments. Structure is identified when a match is found between observed structure and a calibration sample. A set of calibration samples constitutes a calibration library. A conventional library is both feature-based and full-shape-based.
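Fourier descriptors of a boundary contour can be sketched with one common formulation: treat the ordered boundary points as a complex signal, take its FFT, drop the DC term, and normalize magnitudes by the first harmonic. This assumes the contour is sampled in order (here counterclockwise); the function name and the number of retained coefficients are hypothetical choices.

```python
import numpy as np


def fourier_descriptors(contour: np.ndarray, n_keep: int = 8) -> np.ndarray:
    """Hypothetical helper. contour: (N, 2) array of (x, y) boundary points
    in traversal order (assumed counterclockwise)."""
    z = contour[:, 0] + 1j * contour[:, 1]  # boundary as a complex signal
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                  # drop the DC term: translation invariance
    mags = np.abs(coeffs)            # magnitudes: rotation/start-point invariance
    mags = mags / mags[1]            # normalize by first harmonic: scale invariance
    return mags[1:n_keep + 1]


# A unit square sampled counterclockwise, 4 points per side.
t = np.linspace(0.0, 1.0, 4, endpoint=False)
zeros, ones = np.zeros(4), np.ones(4)
square = np.concatenate([
    np.stack([t, zeros], axis=1),        # bottom edge, left to right
    np.stack([ones, t], axis=1),         # right edge, upward
    np.stack([1 - t, ones], axis=1),     # top edge, right to left
    np.stack([zeros, 1 - t], axis=1),    # left edge, downward
])

# The same square rotated 30 degrees, scaled by 3, shifted, and re-started.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
moved = np.roll(3.0 * square @ rot.T + np.array([5.0, -2.0]), 5, axis=0)

fd_a = fourier_descriptors(square)
fd_b = fourier_descriptors(moved)
# fd_a and fd_b agree to floating-point precision.
```

Because the descriptors ignore position, orientation, size, and starting point, a calibration sample can be matched against observed boundaries by comparing these short vectors.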
Feature extraction uses a few, but effective, representative attributes to characterize structure. While it capitalizes on economy of computation, it may select incorrect features and use incomplete information sets in the recognition process. A full-shape model assumes that structure is not contaminated by noise, obscured by ground clutter, or both. In general, this assumption does not correspond to the operation of actual sensors.
Depth contours match three-dimensional (3-D) structure generated from a sensor with 3-D models generated from wire frames. In general, all actual images are 3-D because the intensity values of the image constitute the third dimension, although not all third dimensions are equally well defined. For example, a LADAR image has a well-defined third dimension, whereas a general spectral-based image does not. However, most objective discrimination comes from the boundary contour, not the depth contour.
Detection, classification (segmentation), and identification techniques applied to hyperspectral imagery are inherently either full-pixel or mixed-pixel techniques, in which each pixel vector in the image records the spectral information. Full-pixel techniques operate on the assumption that each pixel vector measures the response of one predominant underlying material, or signal, at each site in a scene. Mixed-pixel techniques, by contrast, assume that each pixel vector measures the response of multiple underlying materials, or signals, at each site. In practice, an image may be represented best by a combination of the two: although some sites represent a single material, others are mixtures of multiple materials. (Rand, Robert S., and Daniel M. Keenan, “A Spectral Mixture Process Conditioned by Gibbs-Based Partitioning,” IEEE Transactions on Geoscience and Remote Sensing, Vol. 39, No. 7, pp. 1421–1434, July 2001.)
The simplest full-pixel technique involves spectral matching. Spectra of interest in an image are matched to training spectra obtained from a library or from the image itself. Metrics for determining the degree of match include Euclidean distance, derivative difference, and spectral angle. If the relative number of mixed pixels in a scene is significant, spectral matching of this type should not be used. Class label assignments generated by spectral matching algorithms are not affected by spatial neighborhoods; however, consistency of class labels in localized spatial neighborhoods, termed “spatial localization,” is important in mapping applications.
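The three matching metrics can be sketched as below. Euclidean distance and spectral angle follow their standard definitions; the derivative-difference form used here (sum of absolute differences between the first-difference spectra) is an assumed formulation, and the library spectra are made-up numbers, not real reflectance data. Function names are hypothetical.

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.linalg.norm(x - y))


def derivative_difference(x: np.ndarray, y: np.ndarray) -> float:
    # Assumed form: compare band-to-band slopes (first differences).
    return float(np.sum(np.abs(np.diff(x) - np.diff(y))))


def spectral_angle(x: np.ndarray, y: np.ndarray) -> float:
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # radians


def best_match(pixel, library, metric=spectral_angle) -> str:
    """Hypothetical helper: library entry with the smallest score under `metric`."""
    scores = {name: metric(pixel, ref) for name, ref in library.items()}
    return min(scores, key=scores.get)


# Illustrative (made-up) 4-band training spectra.
library = {
    "vegetation": np.array([0.05, 0.08, 0.06, 0.45]),
    "soil": np.array([0.15, 0.20, 0.25, 0.30]),
}
pixel = 0.5 * library["vegetation"]  # same material, half the illumination
angle_match = best_match(pixel, library, spectral_angle)
angle_to_veg = spectral_angle(pixel, library["vegetation"])
```

Each pixel is scored independently of its neighbors, which is why this kind of matching provides no spatial localization.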
Other full-pixel methods include various supervised and unsupervised segmentation techniques. These are based on statistical and pattern recognition methods normally used for multispectral image processing. The training is also done using data from libraries or the scene imagery itself. Specific techniques include: statistical linear discrimination, e.g., Fisher's linear discriminant; quadratic multivariate classifiers, e.g., Mahalanobis and Bayesian maximum likelihood (ML) classifiers; and neural networks.
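A quadratic classifier of the Mahalanobis type can be sketched as a minimum-Mahalanobis-distance rule with per-class means and covariances estimated from training pixels. The Bayesian ML classifier would add a log-determinant term and class priors, which are omitted here. The training data are synthetic, and the names `train`, `mahalanobis_sq`, and `classify` are hypothetical.

```python
import numpy as np


def train(samples_by_class: dict) -> dict:
    """Hypothetical helper: mean vector and inverse covariance for each class."""
    model = {}
    for name, samples in samples_by_class.items():
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)  # bands x bands
        model[name] = (mean, np.linalg.inv(cov))
    return model


def mahalanobis_sq(pixel: np.ndarray, mean: np.ndarray, inv_cov: np.ndarray) -> float:
    diff = pixel - mean
    return float(diff @ inv_cov @ diff)


def classify(pixel: np.ndarray, model: dict) -> str:
    """Assign the class whose mean is nearest in Mahalanobis distance
    (no log-determinant or prior terms, so this is not full Bayesian ML)."""
    return min(model, key=lambda name: mahalanobis_sq(pixel, *model[name]))


# Synthetic 3-band training data for two hypothetical classes.
rng = np.random.default_rng(0)
training = {
    "A": rng.normal([0.1, 0.2, 0.3], 0.02, size=(50, 3)),
    "B": rng.normal([0.4, 0.3, 0.1], 0.02, size=(50, 3)),
}
model = train(training)
label = classify(np.array([0.11, 0.20, 0.29]), model)
```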
The quadratic methods require low-dimensional pixel vectors, and thus are preceded by a data reduction operation to reduce the number of spectral bands addressed. Effective neural networks, such as the multilayer feedforward neural network (MLFN), may be built to model quadratic and higher order decision surfaces without data reduction. Although the MLFN may be trained to identify materials perturbed by a limited amount of mixing, usually it does not include any spatial localization in the decision process.
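The text does not name a specific data reduction operation; principal component analysis (PCA) is one common choice and is assumed in this sketch. Each pixel's band vector is projected onto the leading eigenvectors of the band covariance matrix, so a quadratic classifier sees far fewer dimensions. The function name and the random stand-in data are hypothetical.

```python
import numpy as np


def pca_reduce(pixels: np.ndarray, n_components: int) -> np.ndarray:
    """Hypothetical helper: project (num_pixels, num_bands) data onto its
    leading principal axes (PCA assumed as the reduction step)."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]    # keep the largest
    return centered @ eigvecs[:, order]


# Synthetic stand-in for a hyperspectral scene: 1000 pixels x 200 bands.
rng = np.random.default_rng(1)
pixels = rng.normal(size=(1000, 200))
reduced = pca_reduce(pixels, n_components=10)   # shape (1000, 10)
```

The reduced vectors can then be passed to a quadratic classifier in place of the full 200-band spectra.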
The most common unsupervised algorithms for clustering imagery are KMEANS and ISODATA, in which the metric used to determine cluster membership is Euclidean distance. Euclidean distance does not respond adequately when fine structure or shapes are present in high-resolution spectra, because it is overly sensitive to intensity changes. Additionally, these methods do not include any spatial localization in the clustering operation.
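The intensity sensitivity of Euclidean clustering can be illustrated with a minimal k-means (Lloyd iteration) sketch. One made-up spectrum appears at two illumination levels; because cluster membership uses Euclidean distance, k-means separates the sunlit and shadowed pixels even though they are the same material. The function name, parameters, and data are hypothetical.

```python
import numpy as np


def kmeans(pixels: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Hypothetical helper: basic k-means with Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Euclidean distance from every pixel to every cluster center.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


spectrum = np.array([0.2, 0.4, 0.6, 0.8])               # made-up spectrum
pixels = np.vstack([np.tile(spectrum, (20, 1)),          # sunlit
                    np.tile(0.3 * spectrum, (20, 1))])   # shadowed
labels, centers = kmeans(pixels, k=2)
```

The spectral angle between the two groups is zero, so an angle-based membership rule would keep them together; the split here comes purely from intensity.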
Spectral mixture analysis (SMA) techniques used with mixed-pixel approaches address some of the shortcomings of full-pixel techniques. SMA employs linear statistical modeling, signal processing techniques, or both. SMA techniques are governed by the relationship:

$$X_s = H\beta_s + \eta_s \qquad (1)$$

where:
$X_s$ = observed reflected energy from site $s$
$\beta_s$ = modeling parameter vector associated with mixture proportions at site $s$
$\eta_s$ = random variable for model error at site $s$
$H$ = the matrix containing the spectra of pure materials of interest
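For a known H, relationship (1) can be inverted at each site by least squares. This sketch uses the unconstrained estimate only; practical SMA often adds nonnegativity and sum-to-one constraints on the proportions, which are not shown. The endmember spectra and proportions are made-up values, and the function name `unmix` is hypothetical.

```python
import numpy as np


def unmix(x: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unconstrained least-squares estimate of beta in
    x = H @ beta + eta.

    x: (num_bands,) observed spectrum at one site.
    H: (num_bands, num_endmembers) matrix whose columns are pure spectra.
    """
    beta, *_ = np.linalg.lstsq(H, x, rcond=None)
    return beta


# Three made-up endmember spectra over five bands (columns of H).
H = np.array([[0.1, 0.5, 0.3],
              [0.2, 0.4, 0.3],
              [0.4, 0.3, 0.3],
              [0.6, 0.2, 0.4],
              [0.7, 0.1, 0.5]])
beta_true = np.array([0.6, 0.3, 0.1])
rng = np.random.default_rng(2)
x = H @ beta_true + rng.normal(0.0, 1e-4, size=5)   # small model error eta
beta_hat = unmix(x, H)   # close to beta_true
```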
The matrix H is presumed known and fixed, although for most actual materials no single, fixed spectral signature exists to represent the pure material.
The basic SMA may be modified to partition the H matrix into desired and undesired signatures. Subspace projections orthogonal or oblique to the undesired signatures and noise components are computed. Orthogonal subspace projection (OSP) is applied to hyperspectral imagery to suppress undesirable signatures and detect signatures of interest. This is shown in the relationship:

$$H = [D, U] \qquad (2)$$

where:
$D$ = matrix of known spectra for a target of interest
$U$ = the matrix of undesired, but known, spectra

The matrix U may be unknown if D is a minor component of the scene. The above modifications are best suited to targeting applications rather than mapping.
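An OSP detector can be sketched for a single target signature d (one column of D): build the projector P = I − U U⁺ onto the orthogonal complement of the undesired spectra, then score a pixel x as dᵀPx. This is the unnormalized form of the detector; the spectra are random stand-ins, and the function names are hypothetical.

```python
import numpy as np


def osp_projector(U: np.ndarray) -> np.ndarray:
    """Hypothetical helper: projector onto the orthogonal complement of span(U)."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U)


def osp_score(x: np.ndarray, d: np.ndarray, U: np.ndarray) -> float:
    """Unnormalized OSP detector output d^T P x for one pixel x."""
    return float(d @ osp_projector(U) @ x)


# Random stand-ins for a 6-band target spectrum d and two undesired spectra U.
rng = np.random.default_rng(3)
d = rng.uniform(0.1, 1.0, size=6)
U = rng.uniform(0.1, 1.0, size=(6, 2))

background_only = U @ np.array([1.0, 2.0])          # no target present
with_target = 0.5 * d + U @ np.array([1.0, 2.0])    # 50% target signature
score_bg = osp_score(background_only, d, U)          # approximately 0
score_tgt = osp_score(with_target, d, U)             # positive
```

Because P annihilates every combination of the columns of U, the undesired signatures drop out of the score and only the target contribution remains.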
Unless reliable ground truth data are available, determining the number of endmembers in a scene is nontrivial. Conventionally, a first step in working around the lack of ground truth data is to “noise whiten” the data and assess its dimensionality. One technique is the minimum noise fraction (MNF) transform. The MNF transform estimates a noise covariance matrix, Σn, typically using the minimum/maximum autocorrelation factors (MAF). An alternative to MAF that is also easier to compute is the noise adjusted principal component (NAPC) method. Other alternatives exist, some of which facilitate automation of the process. All, however, depend on an accurate estimate of Σn, which may not be practical to attain.
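The general MNF pattern (estimate Σn, noise-whiten, then take principal components) can be sketched as below. As an assumption, Σn is estimated from differences between horizontally adjacent pixels rather than by MAF or NAPC. Eigenvalues near 1 indicate noise-dominated components, so counting eigenvalues well above 1 gives a rough dimensionality estimate; the threshold of 10, the synthetic cube, and the function name `mnf` are hypothetical.

```python
import numpy as np


def mnf(cube: np.ndarray):
    """Hypothetical helper. cube: (rows, cols, bands).
    Returns MNF components (pixels x bands) and their eigenvalues."""
    bands = cube.shape[2]
    X = cube.reshape(-1, bands)
    X = X - X.mean(axis=0)
    # Assumed noise estimate: horizontally adjacent pixels share most of their
    # signal, so their difference is mostly noise (with doubled variance).
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    sigma_n = np.cov(diff, rowvar=False) / 2.0
    sigma = np.cov(X, rowvar=False)
    # Noise-whiten: if sigma_n = L L^T, then W = L^-1 maps sigma_n to I.
    W = np.linalg.inv(np.linalg.cholesky(sigma_n))
    eigvals, eigvecs = np.linalg.eigh(W @ sigma @ W.T)
    order = np.argsort(eigvals)[::-1]        # largest SNR-like values first
    transform = W.T @ eigvecs[:, order]
    return X @ transform, eigvals[order]


# Synthetic 40 x 40 x 20 cube: two signals that vary down the rows, plus noise.
rng = np.random.default_rng(4)
r = np.repeat(np.arange(40.0)[:, None], 40, axis=1)
a, b = np.sin(r / 6.0), np.cos(r / 9.0)
e1, e2 = rng.uniform(0.2, 1.0, 20), rng.uniform(0.2, 1.0, 20)
cube = a[..., None] * e1 + b[..., None] * e2 + rng.normal(0.0, 0.01, (40, 40, 20))

components, eigvals = mnf(cube)
n_signal = int(np.sum(eigvals > 10.0))   # rough dimensionality estimate
```

The result is only as good as sigma_n: if neighboring pixels differ in signal as well as noise, the estimate absorbs signal and the dimensionality count becomes unreliable, which is the dependence on an accurate Σn noted above.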
In processing hyperspectral imagery of scenes with moderate to high complexity, either a large set of fundamental materials, termed endmembers, may exist throughout the scene, or some of the endmembers may have spectra that are similar to each other. Often these endmembers are mixed at the sub-pixel level, meaning that the signal measured at any specific point (pixel) in a scene is composed of spectra from more than one material. Because of the potentially large number of endmembers and the possible spectral similarity of these fundamental materials, using one large set of endmembers to perform spectral unmixing may produce unreliable estimates of material compositions at sites within the scene. A preferred embodiment of the present invention addresses this shortcoming.