Several full-spectrum imaging techniques have been introduced in recent years that promise to provide rapid and comprehensive chemical characterization of complex samples. These spectroscopic imaging techniques include Electron Probe Microanalysis (EPMA), Scanning Electron Microscopy (SEM) with attached Energy Dispersive X-Ray Spectrometer (EDX), X-Ray Fluorescence (XRF), Electron Energy Loss Spectroscopy (EELS), Particle Induced X-ray Emission (PIXE), Auger Electron Spectroscopy (AES), gamma-ray spectroscopy, Secondary Ion Mass Spectrometry (SIMS), X-Ray Photoelectron Spectroscopy (XPS), Infrared Spectroscopy (IR), Raman Spectroscopy, Magnetic Resonance Imaging (MRI) scans, Computerized Axial Tomography (CAT) scans, IR reflectometry, Mass Spectrometry (MS), multidimensional chromatographic/spectroscopic techniques, and hyperspectral remote imaging sensors. These new spectroscopic imaging systems enable the collection of a complete spectrum at each point in a 1-, 2- or 3-dimensional spatial array. Such spectral image data sets commonly comprise tens of thousands of individual spectra, or more.
One of the remaining obstacles to adopting these techniques for routine use is the difficulty of reducing the vast quantities of raw spectral data to meaningful chemical information. Multivariate factor analysis techniques have proven effective for extracting the essential chemical information from high dimensional spectral image data sets into a limited number of components that describe the spectral characteristics and spatial distributions of the chemical species comprising the sample. In mathematical terms, given an m-pixel×n-channel matrix of spectral data D, we wish to approximate D by the matrix factorization
$$D \cong C S^{T} \tag{1}$$
Here, C is an m-pixel×p-component matrix describing the distribution of the pure components at each spatial location and S is an n-channel×p-component matrix representation of the pure-component spectra. In the typical case that p<<m and p<<n, the factorization in Eq. (1) accomplishes a large reduction in the dimensionality of the data set with the goal of optimally separating factors embodying real chemical information from those describing only noise.
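The dimensionality reduction in Eq. (1) can be illustrated with a minimal NumPy sketch. The data here are synthetic, the sizes m, n and p are arbitrary illustrative choices, and a truncated singular value decomposition is used only as one convenient way to obtain a rank-p factorization; it is not presented as the factorization method of the source.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 400, 64, 3  # pixels, channels, components (illustrative sizes)

# Synthetic "true" factors and a noisy data matrix D (m x n).
C_true = rng.random((m, p))   # m x p component distributions
S_true = rng.random((n, p))   # n x p pure-component spectra
D = C_true @ S_true.T + 0.01 * rng.standard_normal((m, n))

# Rank-p factorization D ~= C S^T from a truncated SVD.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
C = U[:, :p] * s[:p]          # m x p
S = Vt[:p].T                  # n x p

# Relative misfit of the rank-p model and the fraction of numbers stored.
rel_error = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
stored_fraction = (m * p + n * p) / (m * n)
```

With p much smaller than m and n, the two factor matrices hold only a small fraction of the entries of D, while the residual is dominated by the added noise.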
It is well known, however, that factor-based methods suffer from a “rotational ambiguity.” Given any invertible p×p transformation matrix R, D can be equally well expressed as
$$D \cong C S^{T} = (C R)(R^{-1} S^{T}) = \tilde{C} \tilde{S}^{T} \tag{2}$$
That is, an infinite number of rotated factor pairs $\tilde{C}$ and $\tilde{S}$ will provide equally good fits to the data. The key to deriving relatively unique factors, then, is to select those factor solutions that satisfy additional optimization criteria. Thus, physically inspired constraints are often employed to derive relatively unique factor models that make the pure components more easily interpretable. In general, one would expect that the more closely these criteria or constraints reflect the physical reality of a given sample, the higher the fidelity and reliability of the derived components.
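The rotational ambiguity can be checked numerically. The following sketch uses synthetic factors and an arbitrary, hand-picked invertible matrix R (determinant 3); all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 50, 20, 3
C = rng.random((m, p))        # m x p
S = rng.random((n, p))        # n x p

# An arbitrary invertible p x p transformation (det = 3).
R = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

C_rot = C @ R                     # transformed distributions, C R
S_rot = S @ np.linalg.inv(R).T    # chosen so that S_rot^T = R^{-1} S^T

# Both factor pairs reproduce exactly the same product.
same_fit = np.allclose(C @ S.T, C_rot @ S_rot.T)
factors_differ = not np.allclose(C, C_rot)
```

Both flags evaluate to true: the fit to the data is identical even though the factors themselves have changed, which is why additional criteria are needed to single out one solution.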
Principal Component Analysis (PCA), used either by itself or to preprocess data, is the most widely used tool of factor analysis. The constraints imposed by PCA are that the spectral and concentration factors must contain orthogonal components and that the components serially maximize the variance in the data that each accounts for. Neither constraint has any basis in physical reality; thus, the factors obtained via PCA are abstract and not easily interpreted. Alternating Least Squares-based Multivariate Curve Resolution (MCR-ALS) is another common factorization method used for spectral image analysis. This technique can, for instance, constrain spectra and concentrations to be non-negative, yielding more physically realistic pure components. There are many cases, however, in which those constraints are not effective and where alternative approaches may provide new analytical insights.
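The contrast between the two sets of constraints can be sketched as follows. This is a simplified illustration on synthetic data, not a reference implementation: PCA is computed from a truncated SVD, and the non-negativity constraint in the ALS loop is approximated by clipping unconstrained least-squares solutions at zero, a shortcut assumed here in place of a true non-negative least-squares solver. The function names `pca` and `mcr_als` are hypothetical.

```python
import numpy as np

def pca(D, p):
    """Rank-p PCA via SVD: scores T (m x p) and orthonormal loadings P (n x p)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :p] * s[:p], Vt[:p].T

def mcr_als(D, S0, n_iter=50):
    """Alternating least squares for D ~= C S^T with clipped non-negativity."""
    S = np.clip(S0, 0.0, None)
    for _ in range(n_iter):
        # Solve S C^T ~= D^T for C, then clip negative concentrations.
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
        # Solve C S^T ~= D for S, then clip negative spectral intensities.
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)
    return C, S

rng = np.random.default_rng(2)
m, n, p = 200, 30, 3
D = rng.random((m, p)) @ rng.random((n, p)).T   # non-negative synthetic data

T, P = pca(D, p)                       # orthogonal, variance-ordered factors
C, S = mcr_als(D, rng.random((n, p)))  # non-negative factors
mcr_error = np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
```

The PCA loadings are mutually orthogonal and therefore necessarily contain negative values for strictly positive data, whereas the ALS factors are non-negative by construction.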
For many cases of practical importance, imaged samples are “simple” in the sense that they consist of relatively discrete chemical phases. That is, at any given location, only one or a few of the chemical species comprising the entire sample have non-zero concentrations. In the limiting case that each location has only a single chemical component having non-zero concentration, the sample is said to be “perfectly simple.” The methods of the present invention exploit this simplicity to make the resulting factor models more realistic. Therefore, more physically accurate and interpretable spectral and abundance components can be extracted from spectral images that have spatially simple structure.
In the present invention, the methods for spectral image analysis by exploiting spatial simplicity, as described in application Ser. No. 11/233,223, are adapted to provide a method for the fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectrometry (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by MCR-ALS refinement to yield highly interpretable results.
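The first stages of such a workflow might be sketched as below. Several details are assumptions made purely for illustration and are not taken from the text: saturation is detected by comparing each mass channel against a hypothetical detector ceiling `saturation_level`, varimax stands in for the unspecified orthogonal rotation criterion, the rotation is applied to the orthonormal elution-time factors, and the data are synthetic. The MCR-ALS refinement that would follow is omitted.

```python
import numpy as np

def drop_saturated_masses(D, saturation_level):
    """Drop mass channels (columns) whose maximum reaches the assumed ceiling."""
    keep = D.max(axis=0) < saturation_level
    return D[:, keep], keep

def varimax(A, n_iter=100, tol=1e-10):
    """Return an orthogonal p x p rotation R that increases the varimax
    simplicity of A @ R (standard SVD-based iteration)."""
    rows, p = A.shape
    R = np.eye(p)
    prev = 0.0
    for _ in range(n_iter):
        B = A @ R
        G = A.T @ (B**3 - B * (B**2).sum(axis=0) / rows)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt
        if s.sum() < prev * (1 + tol):
            break
        prev = s.sum()
    return R

# Synthetic GC/MS-like data: 200 time points x 40 masses, 3 eluting compounds.
rng = np.random.default_rng(0)
t = np.arange(200)[:, None]
C_true = np.exp(-0.5 * ((t - np.array([50, 100, 150])) / 6.0) ** 2)
S_true = rng.random((40, 3))
S_true[5, 0] = 20.0                     # one intense mass that will saturate
ceiling = 10.0                          # assumed detector ceiling
D = np.minimum(C_true @ S_true.T, ceiling)

# 1. Remove saturated masses so the remaining data follow a linear model.
D_clean, keep = drop_saturated_masses(D, ceiling)

# 2. PCA with orthonormal time-mode factors T and scaled mass-mode factors P.
p = 3
U, s, Vt = np.linalg.svd(D_clean, full_matrices=False)
T, P = U[:, :p], Vt[:p].T * s[:p]

# 3. Orthogonal rotation; R^{-1} = R^T, so both factor sets are multiplied by R.
R = varimax(T)
T_rot, P_rot = T @ R, P @ R
```

Because R is orthogonal, the rotated pair reproduces the same model of the cleaned data as the PCA factors; only the way the information is apportioned among the components changes. The rotated factors would then serve as the starting point for a constrained refinement.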