The present invention is directed to a method for improving performance of spectroscopic algorithms that are used to classify spectra, and more particularly to techniques to make spectroscopic algorithms more robust when analyzing data from unknown constituents.
Spectroscopy is a key technology for remote detection of biological or chemical constituents (such as biological and chemical warfare agents). The common thread in all spectroscopies is that each chemical and/or biological substance has a unique spectrum due to its unique structure. One of the goals of qualitative spectroscopy is to determine the component makeup of a substance given a library of the spectra of pure compounds. Quantitative analysis is not always necessary and, depending on the sensor's construction and operation, may not be possible. The use of spectroscopy requires algorithms that are capable of classification and de-convolution of spectra that arise from mixed substances. Regression methods are commonly used for qualitative data analysis. Multiple Linear Regression (MLR) methods are extremely useful for classification and de-convolution of mixed signals with a set of known library signals, called library spectra. Operationally, a library of spectra and a measured spectrum are input into the MLR model. The output is a vector called “mixing coefficients” that describes the proportions in which the library spectra must be linearly combined to generate a “best-fit” spectrum that is sufficiently close to the measured spectrum. Calculation of the mixing coefficients varies by model, and constraints may be employed. The advantages of MLR models for mixed signal identification include simplicity of implementation and operation, simultaneous determination of multiple compounds, speed of operation, and the ability to use “pure” library spectra (rather than a population of spectra to span the error space). In addition, most MLR models are based on rules that are consistent with the physics of spectroscopy in general. One particular advantage of many simple MLR models, including Classical Least Squares (CLS), is that no assumptions about the underlying probability densities of the signals need to be made or determined a priori.
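The CLS computation described above can be sketched as an ordinary least-squares fit of the measured spectrum against the library matrix. The following is a minimal illustrative example, not the claimed method; the library of two Gaussian peaks and all names are hypothetical, introduced only to show how mixing coefficients are obtained.

```python
import numpy as np

def cls_mixing_coefficients(library, measured):
    """Return mixing coefficients c minimizing ||library @ c - measured||."""
    coeffs, *_ = np.linalg.lstsq(library, measured, rcond=None)
    return coeffs

# Hypothetical library: two Gaussian "pure" spectra on a 100-point axis.
x = np.linspace(0.0, 10.0, 100)
def peak(center):
    return np.exp(-((x - center) ** 2) / 0.5)

library = np.column_stack([peak(3.0), peak(7.0)])

# Measured spectrum: a 2:1 linear mixture of the two library compounds.
measured = 2.0 * library[:, 0] + 1.0 * library[:, 1]

c = cls_mixing_coefficients(library, measured)
# c recovers approximately [2.0, 1.0], the mixture proportions
```

In practice, constrained variants (e.g., non-negative least squares) may be employed, since negative quantities of a compound are not physical.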
The importance of contemporary algorithms cannot be overstated, as these techniques are at the forefront of unmanned chemical and biological warfare detection.
These contemporary algorithms perform well against known compounds that are represented in the spectral library but are limited in their ability to handle unknown constituents that are not present in the library. Typically, such unknowns will cause false alarms, as the algorithms attempt to use the library to describe the spectral features introduced by the unknowns. Historically, unknown spectral constituents are the Achilles heel of spectroscopic analysis. When performing spectroscopy in an uncontrolled setting (e.g., remote spectroscopic sensing of the environment), the assumption that the library contains everything that might generate a spectroscopic response is violated. At the outset, this puts conventional algorithms at a disadvantage, due to their inability to compensate for unknowns. Furthermore, many unknowns may share spectral similarity with any number of chemicals in the library, which further exacerbates the false alarm problem. For example, the functional group phosphate is responsible for a characteristic Raman peak in many chemical warfare agents such as Sarin, Soman, and Tabun. Similar chemical structure, and therefore similar spectral features, may be found in many of the pesticides sold in retail gardening stores. Unknown signals are ubiquitous and frequently degrade the sensor's performance even on well-characterized signals. Thus, when unknowns are present, they tend to cause false positive detections. This introduces type I errors (falsely accepting the hypothesis that a library compound is present).
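The false-alarm mechanism described above can be illustrated with a small least-squares sketch: when an unknown spectrum overlaps a library entry, the fit assigns that entry a substantial nonzero coefficient even though the compound is absent. All spectra below are hypothetical Gaussian peaks chosen for illustration only.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 100)
def peak(center):
    return np.exp(-((x - center) ** 2) / 0.5)

# Hypothetical two-compound library and an unknown constituent whose
# peak partially overlaps the first library compound.
library = np.column_stack([peak(3.0), peak(7.0)])
unknown = peak(3.4)

coeffs, *_ = np.linalg.lstsq(library, unknown, rcond=None)
# coeffs[0] is substantially nonzero although compound 0 is not present;
# a naive threshold on the mixing coefficients would raise a false alarm.
```

Because the library cannot span the unknown, the fit distributes the unknown's spectral energy over whichever library entries resemble it, which is precisely the false-positive behavior at issue.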
Due to the almost infinite number of substances that may be encountered, it is impossible to include every possible constituent in the library spectra. This leaves the qualitative spectroscopist with three choices:
1. Ignore the unknowns and hope that they do not affect the analysis.
2. Control the sample rigorously—this may mean that samples are pre-treated to separate out anything besides the items of interest.
3. Build algorithms and routines that are robust against unknowns.
The first choice is the most common solution: make the a priori assumption that unknowns will not be present or, if they are present, that they will not cause significant problems. Although this greatly simplifies the problem of identification, for real-world applications these are dangerous assumptions to make. For these reasons, the second choice is often used in industrial settings, laboratory settings, and in environmental testing, where it is convenient to obtain a sample and perform wet chemistry or preparative separation on it prior to (and sometimes in conjunction with) spectroscopic analysis. Pre-treatment is not always the most desirable choice, especially if the samples being analyzed are dangerous, or if the samples are being sensed at a distance, at a frequency, or under other circumstances that make pre-treatment impossible. Thus, the better solution for performing real-time or in-the-field measurements of untreated samples is to make algorithms and routines robust to unknowns.
Attempts have been made to overcome these problems by either adding the unknown features into a calibration library or subtracting them from the sample. All of these techniques involve analysis of quantitative data and seek to correct both for unknowns and for disturbances in the spectrum due to disparate environmental effects. These methods require extensive knowledge of the system being measured, which is not available when performing remote analysis of environmental samples, in which the sensor may contain some variance and the samples analyzed are unconstrained with respect to chemical composition. Another disadvantage of these competing attempts is that they require expert knowledge, and frequently expert operation, which hinders the ability of the algorithm to work unassisted, as a remote, real-time system must.
What is needed is a technique for automatically correcting spectroscopic analysis for unknown components present in the measured mixture.