This disclosure relates to analysis of hyperspectral image data, and as described in greater detail below, relates in particular to creating dimensionality reduced hyperspectral image data, which may be based on an optimized set of basis vectors. While compression reduces the size of a data set, it typically results in a loss of access to information content. On the other hand, dimensionality reduction techniques provide compression with the ability to extract information from the data set in its reduced size. Thus, while all dimensionality reduction techniques provide compression, not all compression techniques allow for dimensionality reduction.
Hyperspectral sensors can collect image data across a multitude of spectral bands through a combination of technology associated with spectroscopy and remote imaging. Thus, such sensors can capture sufficient information to derive an approximation of the spectrum for each pixel in an image. In addition to having a color value, each pixel in the image has a third dimension for a vector providing distinct information for the pixel over a large spectrum of wavelengths. This contiguous spectrum may be analyzed to separate and evaluate differing wavelengths, which may permit finer resolution and greater perception of information contained in the image. From such data, hyperspectral imaging systems may be able to characterize targets, materials, and changes to an image, providing a detection granularity which may exceed the actual resolution of pixels in the image and a change identification capability that does not require pixel level registration, which may provide benefits in a wide array of practical applications.
Because each pixel carries information over a wide spectrum of wavelengths, the data set recorded by a hyperspectral sensor may quickly become unwieldy in size. As an example, hyperspectral sensors are often located remotely on satellites or aircraft capable of imaging areas in excess of 500 km×500 km per hour, which may result in the hyperspectral sensors generating anywhere from three to fifteen gigabits of data per second. Where the hyperspectral data needs to be processed in near real time, the large size of the data may introduce latency problems. In some cases, it may be desirable to transmit the data to a remote location for processing or other analysis, which again would make a reduced data size desirable. Additionally, it may be appreciated that large quantities of data may be difficult to analyze.
While lossy and/or lossless compression techniques may increase the transmission and processing rate for hyperspectral images, these techniques also suffer from various drawbacks. For example, while lossy compression methods may be fine for casual photographs or other human viewable images, wherein the data that is removed may be beyond the eye's ability to resolve, applying such lossy compression methods to a hyperspectral data set may remove information that is valuable and desired for further computer or mathematical processing. Such removal of data may undermine the ability to characterize targets, materials, or changes to scenes that are captured in hyperspectral images. Lossless data compression would not remove such valuable information, since lossless algorithms produce a new data set that can subsequently be decompressed to extract the original data set. Although general purpose lossless compression algorithms can theoretically be used on any type of data, existing lossless compression algorithms typically cannot achieve significant compression on a type of data different from that which the algorithms were designed to compress. Thus, existing lossless compression algorithms do not provide a suitable guaranteed compression factor for hyperspectral images, and in certain cases, the compressed data set may even be larger than the original data set.
Dimensionality reduction techniques strike a balance between the loss of data resulting from lossy compression and the increased processing requirements of lossless techniques. For example, the dimensionality reduction techniques may identify information that is of particular importance, and segregate it such that it is not reduced, while reducing the remaining information that is of less value. Thus, the use of dimensionality reduction on hyperspectral data sets allows for transformation of the hyperspectral image into a more compact form, with little to no loss of the most relevant information. At the same time, it is advantageous for dimensionality reduction techniques to facilitate rapid processing of a reduced hyperspectral image data set. In the case of hyperspectral imaging data, this generally means that the dimensionality reduced data may be exploited for target detection, anomaly detection, material identification, classification mapping, and so on. Typically for dimensionality reduction of hyperspectral images, a family of functions or a set of vectors is found whose arithmetic combination can represent all of the data in a three-dimensional (3D) data set. Hyperspectral image data is generally discrete, so at each X/Y location in a hyperspectral image the spectral data may form elements of a vector. Depending on the nature of these vectors, they may be characterized as either endmembers or basis vectors. While basis vectors span the data obtained from the image, and form a mathematical basis for the data, endmembers are pixels from an imaged scene (or extrapolations of pixels in the scene) that represent the spectra of a pure material found in the scene. In some cases, endmembers are derived such that they enclose or bound the data set (as in a hypervolume or a simplex).
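As a non-limiting illustration of representing a pixel spectrum as an arithmetic combination of basis vectors, the following minimal sketch reduces a single pixel to one coefficient per basis vector and measures what the reduced form fails to capture. It assumes, purely for simplicity, that the basis vectors are orthonormal, so that each coefficient is a dot product; the function names, the four-band pixel, and the two basis vectors are hypothetical and are not taken from this disclosure.

```python
import math


def dot(u, v):
    """Dot product of two equal-length spectra."""
    return sum(a * b for a, b in zip(u, v))


def reduce_pixel(spectrum, basis):
    """Return one coefficient per basis vector.

    Assumption: the basis vectors are orthonormal, so each coefficient
    is simply the dot product of the spectrum with that basis vector.
    """
    return [dot(spectrum, b) for b in basis]


def reconstruct(coeffs, basis):
    """Rebuild an approximate spectrum from the reduced coefficients."""
    n_bands = len(basis[0])
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(n_bands)]


def residual_norm(spectrum, basis):
    """Length of the part of the spectrum not captured by the basis."""
    approx = reconstruct(reduce_pixel(spectrum, basis), basis)
    return math.sqrt(sum((s - a) ** 2 for s, a in zip(spectrum, approx)))


# Hypothetical 4-band pixel and two orthonormal basis vectors.
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]
pixel = [3.0, 3.0, 1.0, 1.0]

coeffs = reduce_pixel(pixel, basis)      # 4 band values become 2 coefficients
error = residual_norm(pixel, basis)      # zero when the basis spans the pixel
```

In this sketch the reduced pixel stores two coefficients instead of four band values, and a residual near zero indicates that the basis vectors span that pixel. A pixel with a large residual is one that the chosen basis does not represent well.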
It may be appreciated that some dimensionality reduction techniques, such as those disclosed in the related applications incorporated by reference above, may compute geometric basis vectors. Dimensionality reduction may alternatively be achieved through other hyperspectral image processing mechanisms, including but not limited to Principal Components Analysis, which computes “statistically derived” basis vectors that span a scene in an optimal mean-square sense.
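As a non-limiting illustration of a statistically derived basis vector, the following minimal sketch estimates only the first principal component of a small set of pixel spectra by forming the band covariance matrix and applying power iteration. Power iteration is chosen here only to keep the sketch self-contained; it is an assumption of this example and is not described in this disclosure, and a numerical library would ordinarily be used for a full eigendecomposition. All function names and the three-band data are hypothetical.

```python
import math


def mean_spectrum(pixels):
    """Per-band mean over all pixels."""
    n = len(pixels)
    return [sum(p[k] for p in pixels) / n for k in range(len(pixels[0]))]


def covariance(pixels):
    """Sample covariance matrix between spectral bands."""
    mu = mean_spectrum(pixels)
    bands = len(mu)
    centered = [[p[k] - mu[k] for k in range(bands)] for p in pixels]
    n = len(pixels)
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(bands)]
        for i in range(bands)
    ]


def leading_component(cov, iterations=200):
    """Estimate the unit eigenvector with the largest eigenvalue.

    Caveat: power iteration fails if the starting vector happens to be
    orthogonal to the leading eigenvector; this sketch does not guard
    against that case beyond stopping on a zero vector.
    """
    bands = len(cov)
    start = [float(k + 1) for k in range(bands)]
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(bands)) for i in range(bands)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(pixel, mean, component):
    """Score of a pixel along one component after removing the mean."""
    return sum((p - m) * c for p, m, c in zip(pixel, mean, component))


# Hypothetical 3-band pixels that vary along a single spectral direction.
pixels = [[10.0 + t, 20.0 + 2.0 * t, 30.0 + 2.0 * t] for t in range(-2, 3)]

pc1 = leading_component(covariance(pixels))
score = project(pixels[-1], mean_spectrum(pixels), pc1)
```

Because the example pixels vary along one direction only, a single component captures all of the variation, and each three-band pixel can be summarized by one score plus the scene mean.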
This disclosure additionally relates to clustering of pixels of hyperspectral image data. Clustering is a process which finds pixels that are more similar to each other than to other groups of pixels. In particular, such clustering may be utilized to determine pixels containing like materials and outliers. It may be appreciated that when a scene is being imaged, different materials and different contrast within the scene may form a generally heterogeneous data set. Accordingly, clustering may break up the scene into more homogeneous portions. Having identified homogeneous regions of similar spectral properties, it may be easier to reject clutter. Additionally, such clusters may be utilized to create classification maps, which may be useful in characterizing the hyperspectral image. As one non-limiting example, a variety of clusters may be determined from the hyperspectral image data, including a cluster of pixels that represent grass, a cluster of pixels that represent water, a cluster of pixels that represent metal, and so on. It may be appreciated that clustering may also detect anomalies, such as determining those pixels that are outliers from the established clusters (or are identified as among the smallest clusters). Such clusters may further be useful in identifying pixels with similar spectral properties, which may be exploited in further analysis.
In a conventional implementation, clustering may include selecting an initial number of clusters followed by an initial assignment of every pixel to a cluster. In some cases, this assignment may be designated by a user. Tentative center coordinates for each cluster may be formed from the data or selected for each cluster. For example, the centers may be spaced uniformly from one another, or may be randomly distributed in the scene. Pixels may be assigned to the cluster whose center is nearest (e.g., the smallest distance to the center coordinates). The average coordinates of each cluster, including the added pixels, may then be computed to form the cluster center for the next iteration. In the next iteration, pixels would then be reassigned to the various clusters in an iterative process, with the iterations stopping when certain conditions are met. For example, in some implementations, a user-selected number of iterations may be performed. Alternatively, a stability condition may ultimately be reached, which signifies the end of the iterative process. For example, the cluster centers may stop moving, or may move less than a certain distance. As another example, fewer than a certain percentage of pixels may change clusters from the past iteration. In still another example, each of the clusters may settle into a predetermined size or density range (e.g., the iterations end when clusters are not too small, too large, or insufficiently dense). Constraints may be placed on the configuration of an allowable cluster, such as by eliminating clusters that are too small (redistributing the pixels from the small clusters into other appropriate clusters), splitting clusters that are too large into smaller clusters, and so on.
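As a non-limiting illustration of the conventional iterative process described above, the following minimal sketch assigns each pixel to the nearest cluster center, recomputes each center as the average of its assigned pixels, and stops either after a maximum number of iterations or when no center moves more than a tolerance. Euclidean distance, the parameter names `max_iter` and `move_tol`, the handling of empty clusters, and the two-band example data are all assumptions made for this sketch. The splitting and elimination constraints mentioned above are omitted.

```python
import math


def nearest(pixel, centers):
    """Index of the center with the smallest Euclidean distance to pixel."""
    return min(range(len(centers)), key=lambda c: math.dist(pixel, centers[c]))


def cluster_pixels(pixels, initial_centers, max_iter=100, move_tol=1e-6):
    """Iteratively assign pixels to the nearest center and update centers.

    Stops after max_iter iterations, or earlier when the largest center
    movement in an iteration falls below move_tol.
    """
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each pixel joins the cluster with the nearest center.
        labels = [nearest(p, centers) for p in pixels]

        # Update step: each center becomes the average of its member pixels.
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center

        # Stability check: stop when no center moved more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < move_tol:
            break
    return labels, centers


# Hypothetical 2-band pixels forming two well-separated groups.
pixels = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
    [5.0, 5.0], [5.2, 4.8], [4.8, 5.2],
]
labels, centers = cluster_pixels(pixels, initial_centers=[[0.0, 0.0], [6.0, 6.0]])
```

Each iteration of this sketch computes a distance from every pixel to every center over every band, so the per-iteration cost grows with the number of pixels, the number of clusters, and the number of spectral dimensions.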
It may therefore be appreciated that conventional clustering, typically performed as a discrete process on full-dimensioned hyperspectral image data, is generally a highly iterative and slow process. For example, many iterations of the computations, such as distance comparisons between each pixel and each cluster center, may be required to establish the clusters of pixels. Accordingly, among other things, it is advantageous to increase the speed at which stable clusters are identified. Speed may be increased by reducing the number of iterations and by reducing the computations associated with each iteration. However, it is important that pixels are assigned to the correct clusters at the end of the clustering process.