The present invention relates generally to spectroscopy, and more specifically the invention pertains to an algorithm for rapidly estimating basis spectra (‘endmembers’) for use in analysis of hyperspectral data.
Hyperspectral data consists of hundreds of digital images, each spatially coincident image measured at a different wavelength. Each pixel in the image, then, has a measured value at hundreds of wavelengths, and a spectrum of measured values vs. wavelength can be plotted for each pixel. This spectrum can also be thought of as a vector with magnitude and direction in some multi-dimensional space, with the perpendicular coordinate axes spanning this space being the wavelengths at which measurements were made. The measured spectra are then lists of coordinates for a point in this space.
When one makes a scatterplot in multiple dimensions with every pixel in the image plotted as such a point, the entire data set can be viewed as a ‘data cloud’—the scatterplot with thousands of points plotted from the data resembles a cloud. Points inside the data cloud can often be usefully modeled as a linear combination of points near the ‘hull’ of the data cloud. The physical interpretation of this model is that points on the hull of the data cloud may correspond to pixels that have uniform and unique spectral composition (uniform surface properties), and points inside the data cloud correspond to pixels with inhomogeneous composition describable as mixtures of the supposedly pure pixels of the hull.
An example of a system that acquires hyperspectral data is in U.S. Pat. No. 5,379,065, Jan. 3, 1995, Programmable hyperspectral image mapper with on-array processing, the disclosure of which is incorporated herein by reference.
An ideal case for hyperspectral data analysis using the ‘convex hull’ approach would be data that, when scatterplotted, fell within (and even outlined) an obvious simplex. For two wavelength measurements, a two-dimensional simplex is a triangle. For three wavelength measurements, the corresponding simplex is a tetrahedron. Imagine, for the moment, the two-dimensional case. If the data cloud is of a triangle shape, then one may imagine trying to find a ‘best fit’ triangle to the data cloud. If this is done, the vertices of the best-fit triangle could then be taken as points (representing spectra) that describe the rest of the data. That is, every point inside the triangle can be described as a linear combination of the triangle vertices, where the multipliers on the vertex vectors sum to one and are all greater than or equal to zero. Imagine that the three vertices represent the spectra of a tree, a road, and grass. One would interpret the triangle shape of the data cloud, then, to mean that the measured scene was composed uniquely of these three things. The points near the triangle vertices can be taken to be purely tree, purely road, or purely grass. Points inside the triangle are assumed to be a mixture of these three ‘endmembers’. Based on the position of data points inside the triangle one can calculate the exact nature of this mixture for each data point.
This calculation of multipliers for the endmembers is called ‘spectral unmixing’. For example, a point dead center in the triangle would be described as ⅓ tree, ⅓ road, and ⅓ grass (note the sum of the multipliers is one). A point along the ‘hull’ (the edge of the triangle) midway between the road and grass endmembers would be ½ grass, ½ road, and 0 tree. The results of this unmixing are useful for an analyst trying to assign physical characteristics to each pixel in a hyperspectral image. For higher dimensions (often there are hundreds of wavelengths in a hyperspectral data set), one must imagine a data cloud inside a multi-dimensional simplex.
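The two-dimensional unmixing described above can be illustrated with a minimal sketch. It is not taken from the invention itself; it simply solves for the barycentric coordinates of a point with respect to a triangle. The function name `unmix_2d` and the endmember coordinates are hypothetical values chosen only for illustration.

```python
def unmix_2d(point, vertices):
    """Return the three mixing fractions of `point` relative to a triangle.

    `vertices` holds three (x, y) endmember points. The fractions are the
    barycentric coordinates of `point`; they always sum to one, and all are
    non-negative exactly when the point lies inside or on the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    px, py = point
    # Determinant of the 2x2 linear system; zero means a degenerate triangle.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("endmembers are collinear; triangle is degenerate")
    a = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    b = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    c = 1.0 - a - b
    return a, b, c


# Hypothetical endmembers measured at two wavelengths (illustrative values).
tree, road, grass = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
endmembers = [tree, road, grass]

# A point dead center in the triangle: the mean of the three vertices.
center = (sum(v[0] for v in endmembers) / 3, sum(v[1] for v in endmembers) / 3)
center_fractions = unmix_2d(center, endmembers)

# A point on the hull, midway between the road and grass endmembers.
edge_mid = ((road[0] + grass[0]) / 2, (road[1] + grass[1]) / 2)
edge_fractions = unmix_2d(edge_mid, endmembers)
```

Under these assumptions, `center_fractions` comes out as one third for each endmember, and `edge_fractions` has a zero tree fraction with road and grass at one half each. A point outside the triangle would yield at least one negative fraction, which is one way to see that it is not a physical mixture of the chosen endmembers.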
Often, the measured data may intrinsically be of lower dimensionality than the number of wavelengths. Imagine data measured at three wavelengths that fall along a plane when scatterplotted in three dimensions. The physical interpretation of this is that the measured scene contained only three unique spectral signatures and mixtures of these endmembers. It would then be possible to transform the data into a two-dimensional space and do the endmember identification and spectral unmixing in two dimensions (a simpler problem). In this manner, data measured in 200 wavelengths often is only intrinsically 10-15 dimensional.
Data sets may exist where, physically, no pixels in the image were of a uniform composition. In this case, one may try to fit a simplex around the data cloud and use the extrapolated vertices as endmembers. This guarantees a neat mathematical solution to the unmixing problem, but leaves room for error in having found physically relevant endmembers.
Fitting a simplex around a multi-dimensional data cloud in a physically meaningful way is a very difficult mathematical and computational problem. I am unaware of any existing software tools for doing this. Instead, a classic approach to finding endmembers is as follows. Firstly, one performs a principal component transformation on the data set to reduce the data to its intrinsic dimensionality. One then scatterplots the data along various axes, looking for obvious outliers that might be useful as endmembers. Having located several endmembers, one then does the linear spectral unmixing. This process is far from automated and can require hours to complete.
One attempt at automating part of the interactive hunt for endmembers is called ‘Pixel Purity Index’. This is included in the commercial software package called ‘ENVI’. This method iteratively creates random direction vectors, projects the data cloud onto these vectors, and flags pixels that lie at the extremes of the resulting distribution. After doing this thousands of times, it then selects the pixels that were most often flagged in this manner as possible endmembers. One then can interactively select pixels from this reduced set of pixels. This method is quite effective, but can require hours or even days of computing.
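The random-projection idea just described can be sketched in a few lines. This is a simplified illustration and not the ENVI implementation: it flags only the single lowest and single highest pixel on each projection, whereas a production tool may flag every pixel beyond some threshold. The function name, parameters, and test data are hypothetical.

```python
import random


def pixel_purity_counts(pixels, n_projections=1000, seed=0):
    """Count how often each pixel lands at an extreme of a random projection.

    `pixels` is a list of equal-length spectra. Returns one count per pixel.
    """
    rng = random.Random(seed)
    dim = len(pixels[0])
    counts = [0] * len(pixels)
    for _ in range(n_projections):
        # Random direction; its length does not affect which pixels are extreme.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scores = [sum(p * d for p, d in zip(pixel, direction)) for pixel in pixels]
        counts[scores.index(min(scores))] += 1
        counts[scores.index(max(scores))] += 1
    return counts


# Hypothetical scene: three pure pixels followed by strictly interior mixtures.
pure = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
mixed = []
for i in range(1, 5):
    for j in range(1, 5):
        if i + j < 6:
            a, b = i / 6, j / 6
            c = 1.0 - a - b
            mixed.append(tuple(a * p + b * q + c * r for p, q, r in zip(*pure)))
scene = pure + mixed

n_projections = 200
counts = pixel_purity_counts(scene, n_projections=n_projections)

# Indices of the three most frequently flagged pixels.
candidates = sorted(range(len(scene)), key=lambda k: counts[k], reverse=True)[:3]
```

In this toy scene every flag falls on one of the three pure pixels, since the extreme of a linear projection over a triangle is always attained at a vertex. The cost is one full pass over the data per projection, which is consistent with the long run times noted above when thousands of projections are applied to a real image.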