The present invention relates generally to the field of multivariate spectral analysis and, more particularly, to a method for augmenting a classical least squares calibration model to provide improved predictions of component values in unknown samples having unmodeled sources of spectral variation.
Over the past 20 years, quantitative multivariate spectral analysis has largely shifted from the explicit classical least squares (CLS) method to the implicit principal component regression (PCR) and partial least squares (PLS) methods. The principal motivation for this shift is that CLS is based on an explicit linear additive model, e.g., the Beer-Lambert law. As such, CLS has the significant limitation that it requires that the concentrations of all spectrally active components be known and included in the calibration model before an adequate prediction model can be developed. On the other hand, the PCR and PLS methods can achieve excellent predictions for multivariate spectral data sets in which not all of the spectrally active components have been determined. Consequently, CLS has been relegated to solving a small set of well-defined linear problems with known spectrally active components that adhere to the Beer-Lambert law, e.g., infrared spectra of gas-phase samples.
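The explicit linear additive model and the limitation described above can be illustrated with a minimal sketch in Python with NumPy. The synthetic Gaussian bands, the sample counts, and the function names `cls_calibrate`, `cls_predict`, and `rmse` are assumptions made for this illustration and are not taken from the disclosure. Spectra are modeled as A = CK, where C holds the component concentrations and K the pure-component spectra.

```python
import numpy as np

def cls_calibrate(A, C):
    """CLS calibration: least-squares estimate of K in A ~ C @ K."""
    return np.linalg.lstsq(C, A, rcond=None)[0]

def cls_predict(A, K_hat):
    """CLS prediction: least-squares estimate of C in A ~ C @ K_hat."""
    return np.linalg.lstsq(K_hat.T, A.T, rcond=None)[0].T

def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical noise-free data: three overlapping Gaussian bands.
x = np.linspace(0.0, 1.0, 60)
K_true = np.array([np.exp(-((x - mu) / 0.1) ** 2) for mu in (0.3, 0.5, 0.7)])
rng = np.random.default_rng(0)
C_cal = rng.uniform(0.0, 1.0, size=(20, 3))
C_new = rng.uniform(0.0, 1.0, size=(10, 3))
A_cal = C_cal @ K_true
A_new = C_new @ K_true

# Case 1: all three components are known during calibration.
K_full = cls_calibrate(A_cal, C_cal)
err_full = rmse(cls_predict(A_new, K_full), C_new)

# Case 2: the third component is spectrally active, but its
# concentrations are unknown and therefore left out of the model.
K_partial = cls_calibrate(A_cal, C_cal[:, :2])
err_partial = rmse(cls_predict(A_new, K_partial), C_new[:, :2])

print(f"RMSE, complete model:   {err_full:.2e}")
print(f"RMSE, incomplete model: {err_partial:.2e}")
```

Under these assumptions the complete model recovers the concentrations to within rounding error, whereas the incomplete model is biased because the unmodeled band is absorbed into the estimated pure-component spectra.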
Nevertheless, PCR and PLS do not have the qualitative capabilities of CLS since they do not generate explicit estimated pure-component spectra that can be readily interpreted. Also, they cannot readily take advantage of a newly developed prediction-augmented CLS (PACLS) technique as set forth in U.S. Pat. No. 6,415,233, which is incorporated by reference herein. The PACLS algorithm provides a basis for rapidly updating a CLS model during prediction of component values of the target unknown sample. PACLS adds spectral shapes (i.e., spectral intensity information) to the CLS estimate of the pure-component spectra during prediction to account for spectrally active components or other spectral effects present in the prediction samples that were not modeled during calibration. PACLS allows CLS models to be updated for the presence of spectrometer drift, changes in spectrometer parts or in whole spectrometers, and unmodeled chemical or non-chemical spectral components, as well as for more generalized changes such as changes in starting materials or the presence of nonlinearities, chromatic aberrations, or stray light.
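The idea of adding a spectral shape to the estimated pure-component spectra during prediction can be sketched as follows. This is a simplified illustration, not the procedure of the referenced patent: the function name `pacls_predict`, the linear-baseline drift shape, and the synthetic data are all assumptions, and the calibration is taken to have already produced exact pure-component spectra.

```python
import numpy as np

def cls_predict(A, K):
    """CLS prediction: least-squares estimate of C in A ~ C @ K."""
    return np.linalg.lstsq(K.T, A.T, rcond=None)[0].T

def pacls_predict(A, K_hat, extra_shapes):
    """Append extra spectral shapes as rows of K_hat, predict, then
    keep only the columns belonging to the calibrated components."""
    n_components = K_hat.shape[0]
    K_aug = np.vstack([K_hat, np.atleast_2d(extra_shapes)])
    return cls_predict(A, K_aug)[:, :n_components]

def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical calibrated pure-component spectra (two Gaussian bands).
x = np.linspace(0.0, 1.0, 60)
K_hat = np.array([np.exp(-((x - mu) / 0.08) ** 2) for mu in (0.3, 0.6)])

# Prediction samples contain a sloping baseline that was absent
# during calibration (an assumed stand-in for spectrometer drift).
drift_shape = x
rng = np.random.default_rng(1)
C_new = rng.uniform(0.0, 1.0, size=(10, 2))
drift_amp = rng.uniform(0.1, 0.5, size=10)
A_new = C_new @ K_hat + np.outer(drift_amp, drift_shape)

err_cls = rmse(cls_predict(A_new, K_hat), C_new)
err_pacls = rmse(pacls_predict(A_new, K_hat, drift_shape), C_new)

print(f"RMSE, CLS without the drift shape: {err_cls:.2e}")
print(f"RMSE, with the drift shape added:  {err_pacls:.2e}")
```

Only a spectral shape is supplied at prediction time; no new concentration data and no recalibration are involved in this sketch.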
However, the PACLS algorithm is limited by the fact that accurate predictions require that all interfering spectral components (including chemical and non-chemical sources of spectral variation) be explicitly included during calibration or prediction. If one or more spectral interferences are left out of the calibration, their spectral influence must be explicitly added during prediction to correct for their absence in the calibration model.
These limitations of the CLS model can be reduced and even eliminated by the development of a new generalized family of algorithms, hereinafter referred to as augmented classical least squares (ACLS). The ACLS model uses information derived from component values and spectral residuals during the CLS calibration to provide an improved calibration-augmented CLS model. When the new ACLS methods are combined with the PACLS prediction algorithm, a powerful set of new multivariate capabilities is realized such that analyses can be performed with incomplete knowledge of interferences in the calibration and the prediction data.
The present invention further provides a generalization of ACLS methods for analyzing multivariate spectral data. Specific embodiments of the generalized ACLS methods are the spectral-residual augmented classical least squares (SRACLS), scores augmented classical least squares (SACLS), and concentration-residual augmented classical least squares (CRACLS) methods, all of which allow one to overcome the above deficiencies. The SRACLS, SACLS, and CRACLS methods are based on CLS so that they retain the qualitative benefits of CLS, yet they have the flexibility of PLS and other hybrid techniques in that they can define a prediction model even with unmodeled sources of spectral variation that are not explicitly included in the calibration model. The unmodeled sources of spectral variation may be unknown constituents, constituents with unknown concentrations, nonlinear responses, non-uniform and correlated errors, or other sources of spectral variation (e.g., temperature or spectrometer drift) that are present in the calibration sample spectra.
Augmentation can also be applied to constrained alternating classical least squares methods (alternating between CLS calibration and CLS prediction) that are used when the reference variables, such as the pure-component spectra or component concentrations, are inadequately known for the standard CLS method. The ACLS methods of the present invention can improve the component identification with such inadequately known data sets.
Combining the present invention with the PACLS technique results in prediction models that are generally comparable to or better than standard PLS models in prediction ability. Also, since the various ACLS methods, unlike PLS, are based on CLS, they can incorporate the PACLS feature of updating the prediction model for new sources of spectral variation without the need for time-consuming recalibration. These updated prediction models require only spectral information, whereas PLS requires both spectral and concentration information during recalibration. The present invention is not restricted to using continuous spectral information, but can also use any set of discontinuous spectral intensities that are selected in the calibration for the least squares analysis. Finally, the present invention generates better qualitative information about the analytes by generating better estimates of their pure-component spectra.
A method of multivariate spectral analysis is provided that is able to generate accurate and precise prediction models from multivariate spectral data that include unmodeled spectrally active components present in the calibration samples. The method provides the improved qualitative information of CLS methods as well as the quantitative prediction ability of the implicit multivariate calibration methods by augmenting the calibration model with a measure of the residual resulting from unmodeled components.
The present method of multivariate spectral analysis is most useful when an underlying calibration model includes unmodeled sources of spectral variation. In particular, reference variables (e.g., sample spectra and component values) are obtained for a set of calibration samples. An estimate is obtained for at least one reference variable for the set of calibration samples. Thereafter, a residual is obtained between the at least one reference variable and its corresponding estimated variable. A measure of the residual between the estimated and reference variable is used to augment its corresponding reference variable. Using the augmented reference variable, a value of at least one component in a set of unknown samples is predicted. Augmentation can be repeated until all sources of spectral variation are accounted for in the calibration samples. The iterative process allows the development of a CLS-type calibration model comparable in its quantitative prediction ability to implicit multivariate calibration methods, even when unmodeled spectrally active components are present in the calibration sample spectra.
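The sequence of steps above (estimate a reference variable, form the residual, augment, and predict) can be sketched for the case in which the augmented reference variable is the concentration of a single analyte. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names `acls_calibrate` and `acls_predict`, the choice of one analyte, the number of augmentation steps, and the noise-free synthetic data are all hypothetical.

```python
import numpy as np

def lstsq(X, Y):
    """Least-squares solution B of X @ B ~ Y."""
    return np.linalg.lstsq(X, Y, rcond=None)[0]

def acls_calibrate(A, c, n_augment=1):
    """A: (samples, channels) spectra; c: (samples,) analyte values.
    Each pass appends the concentration residual as a new column."""
    C_aug = c.reshape(-1, 1)
    for _ in range(n_augment):
        K_hat = lstsq(C_aug, A)            # calibration: A ~ C_aug @ K_hat
        C_est = lstsq(K_hat.T, A.T).T      # estimate on calibration spectra
        residual = c - C_est[:, 0]         # reference minus estimate
        C_aug = np.column_stack([C_aug, residual])
    return lstsq(C_aug, A)                 # augmented spectral estimates

def acls_predict(A_new, K_hat):
    """Predict the analyte (first row of K_hat) for new spectra."""
    return lstsq(K_hat.T, A_new.T).T[:, 0]

def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical data: two overlapping bands, but only the first
# component's concentrations are available as reference values.
x = np.linspace(0.0, 1.0, 60)
K_true = np.array([np.exp(-((x - mu) / 0.1) ** 2) for mu in (0.35, 0.55)])
rng = np.random.default_rng(2)
C_cal = rng.uniform(0.0, 1.0, size=(20, 2))
C_new = rng.uniform(0.0, 1.0, size=(10, 2))
A_cal, A_new = C_cal @ K_true, C_new @ K_true

K_cls = lstsq(C_cal[:, :1], A_cal)                       # no augmentation
K_acls = acls_calibrate(A_cal, C_cal[:, 0], n_augment=1)

err_cls = rmse(acls_predict(A_new, K_cls), C_new[:, 0])
err_acls = rmse(acls_predict(A_new, K_acls), C_new[:, 0])

print(f"RMSE, unaugmented CLS:    {err_cls:.2e}")
print(f"RMSE, one augmentation:   {err_acls:.2e}")
```

In this noise-free example with a single unmodeled component, one augmentation step is sufficient because the residual column spans the missing concentration direction. With noisy data or several unmodeled sources, more steps may be needed, and the stopping point would have to be chosen by the analyst.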