1. Field of the Invention
The invention relates generally to an enhancement of estimation of an analyte property or concentration represented by a data matrix. In particular, the invention relates to a method and apparatus for enhanced estimation of an analyte property through multiple region transformation.
2. Discussion of the Prior Art
Preprocessing and multivariate analysis are well-established tools for extracting a spectroscopic signal, usually quite small, of a target analyte from a data matrix in the presence of noise, instrument variations, environmental effects, and interfering components. Various methods and devices have been described that employ preprocessing and multivariate analysis to determine an analyte signal.
R. Barnes, J. Brasch, D. Purdy, W. Lougheed, Non-invasive determination of analyte concentration in body of mammals, U.S. Pat. No. 5,379,764, (Jan. 10, 1995) describe a method in which a subject is irradiated with near-infrared (NIR) radiation, resulting absorbance spectra are preprocessed, and the resulting spectra are analyzed using multivariate techniques to obtain a value for analyte concentration.
J. Ivaldi, D. Tracy, R. Hoult, R. Spragg, Method and apparatus for comparing spectra, U.S. Pat. No. 5,308,982, (May 3, 1994) describe a method and apparatus in which a matrix model is derived from the measured spectrum of an analyte and interferents. A spectrum is generated for an unknown sample. The spectrum is treated with first and second derivatives. Multiple linear least squares regression is then used to fit the model to the sample spectrum and compute a concentration for the analyte in the sample spectrum.
L. Nygaard, T. Lapp, B. Arnvidarson, Method of determining urea in milk, U.S. Pat. No. 5,252,829, (Oct. 12, 1993) describe a method and apparatus for measuring the concentration of urea in a milk sample using attenuated total reflectance spectroscopy. Preprocessing techniques are not taught or used. Calibration techniques, such as partial least squares, principal component regression, multiple linear regression, and artificial neural networks are used to determine spectral contributions of known components and relate them back to the urea concentration in milk.
M. Robinson, K. Ward, R. Eaton, D. Haaland, Method of and apparatus for determining the similarity of a biological analyte from a model constructed from known biological fluids, U.S. Pat. No. 4,975,581 (Dec. 4, 1990) describe an attenuated total reflectance method and apparatus for determining analyte concentration in a biological sample. The determination is based on a comparison of infrared energy absorption between a set of samples with known analyte concentrations and the sample, where the comparison is performed using a model.
Calibration development with techniques such as multiple linear regression (MLR), principal component regression (PCR), partial least squares regression (PLS), and nonlinear calibration methods has some inherent disadvantages. One well-documented problem with multivariate analysis is that noise in the data creates error in the model. This is especially true when too many factors are employed in the development of the model. The modeling error results in subsequent prediction matrices with erroneously high error levels. See, for example, H. Martens, T. Naes, Multivariate Calibration, John Wiley & Sons, p. 352 (1989); or K. Beebe, B. Kowalski, An Introduction to Multivariate Calibration and Analysis, Anal. Chem. 59, 1007A-1017A (1987). Complicating this issue is the fact that the initial factors of a factor decomposition are dominated by regions of high variance, to the detriment of the analysis of regions with smaller variance.
For example, a few factors may model a region having:
A. a high degree of co-linearity;
B. a high signal to noise ratio;
C. minor or readily modeled instrument variations;
D. a relatively low contribution of environmental effects; or
E. a minimal number of readily modeled interfering signals.
Other regions require a higher number of factors in order to sufficiently model the analytical signal. This is the case when:
A. the data are not fully linear;
B. a low signal to noise region is analyzed;
C. instrument drift changes the spectral response over time; or
D. a large number of spectrally interfering components are present.
Finally, due to low signal to noise, a particular region may provide limited utility for model development.
In traditional chemometric analysis, a single preprocessing routine is applied over an entire axis of a data matrix. For example, an entire spectral region is selected for a single preprocessing routine. This means that a single preprocessing routine is selected despite region-to-region variation in spectral state, such as signal, noise, and resolution. Thus, selecting the preprocessing routine that adequately enhances the signal-to-noise ratio of one region within a spectrum forces all other spectral regions to use the same routine. In many cases, another spectral region is optimally enhanced with different preprocessing. That is, one preprocessing routine is not optimal for multiple regions of a spectrum where the underlying signal and noise structures differ between spectral regions. This is based on the fact that the signal is non-uniform with respect to spectral region and the noise is typically heteroscedastic with wavelength. Thus, a compromise in the single preprocessing routine for different spectral regions becomes necessary and results in a sub-optimal extraction of the signal. There exists, therefore, a need in the art for a preprocessing system with separate routines for each region of an axis of a data matrix, such as a wavelength or spectral region, to enhance independently the combination of spectral regions analyzed, and to enhance fully the signal to noise ratio of each wavelength region or spectral region.
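By way of illustration only, region-specific preprocessing can be sketched as follows. The routines shown (a moving-average smooth and a first difference), the region boundaries, and all function names are assumptions chosen for the example and are not taken from any cited reference.

```python
import numpy as np

def moving_average(block, width):
    # Smooth each spectrum (row) with a boxcar window of the given width.
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, block
    )

def first_difference(block):
    # Approximate a first derivative along the wavelength axis.
    return np.diff(block, axis=1)

def preprocess_by_region(X, plan):
    # Apply a separate routine to each column range of the data matrix X.
    # `plan` is a list of ((start, stop), routine) pairs (hypothetical layout).
    return [routine(X[:, start:stop]) for (start, stop), routine in plan]

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 50))           # 5 synthetic spectra, 50 wavelengths
plan = [
    ((0, 20), lambda b: moving_average(b, 5)),   # noisy region: smooth
    ((20, 50), first_difference),                # drifting region: difference
]
blocks = preprocess_by_region(X, plan)
```

Each block is returned separately because the routines may change the number of columns, as the first difference does here.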
Similarly, in traditional calibration development using multivariate techniques, such as PCR or PLS, a single number of factors is applied over an axis of a data matrix. For example, an entire spectral region is selected for a single number of factors. This means that selecting the number of factors that adequately models the signal in one region within the spectrum forces all other spectral regions to use the same number of factors. In many cases, another spectral region is optimally modeled with a number of factors that is different from the optimal number of factors used for the first spectral region. Thus, a compromise between wavelength selection and the number of factors to incorporate into the model becomes necessary. There exists, therefore, a need in the art for a routine that allows the number of factors for each region of an axis, such as a wavelength or spectral region, to be chosen independently of the number of factors used to model a different wavelength or spectral region.
K. Hazen, S. Thennadil, T. Ruchti, Combinative multivariate calibration that enhances prediction ability through removal of over-modeled regions, PCT patent application no. PCT/US01/21703, (Jul. 9, 2001) describe a calibration where different spectral regions are analyzed with a differing number of factors. Selection of an appropriate number of factors for each spectral region allows removal of noisy regions before inclusion into the calibration model.
An enhancement of estimation of an analyte property is presented that allows for optimization of the decomposition of a given spectral range independently of other separate or overlapping spectral ranges. An additional preprocessing step is optional and is performed prior to decomposition, after decomposition, or both. A combination step concatenates the scores matrices of the individual decompositions to arrive at a composite scores matrix for subsequent calibration development. The developed calibration is subsequently applied to a new matrix, such as a spectrum, to perform an estimation of a target analyte concentration or property.
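The sequence of independent decomposition, concatenation of scores, calibration, and estimation can be illustrated with a minimal sketch. It assumes principal component analysis computed by singular value decomposition as the decomposition and ordinary least squares as the calibration. The synthetic data, region boundaries, factor counts, and function names are all hypothetical, and other decompositions or calibration methods could be substituted.

```python
import numpy as np

def region_scores(X, regions, n_factors):
    # Decompose each column range of X independently and concatenate scores.
    scores, loadings, means = [], [], []
    for (start, stop), k in zip(regions, n_factors):
        block = X[:, start:stop]
        mean = block.mean(axis=0)
        _, _, Vt = np.linalg.svd(block - mean, full_matrices=False)
        P = Vt[:k].T                       # loadings for the first k factors
        scores.append((block - mean) @ P)
        loadings.append(P)
        means.append(mean)
    return np.hstack(scores), loadings, means

def project(X_new, regions, loadings, means):
    # Project new spectra onto the stored loadings of each region.
    return np.hstack([
        (X_new[:, a:b] - m) @ P
        for (a, b), P, m in zip(regions, loadings, means)
    ])

# Synthetic calibration set: one analyte plus random noise (assumed data).
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 40, 60
conc = rng.uniform(1.0, 10.0, n_samples)
pure = np.sin(np.linspace(0.0, 3.0, n_wavelengths))   # hypothetical spectrum
X = np.outer(conc, pure) + 0.01 * rng.standard_normal((n_samples, n_wavelengths))

regions = [(0, 30), (20, 60)]    # overlapping ranges are permitted
n_factors = [2, 4]               # factor count chosen per region
T, loadings, means = region_scores(X, regions, n_factors)

# Calibration on the composite scores matrix (intercept plus scores).
A = np.column_stack([np.ones(n_samples), T])
coef, *_ = np.linalg.lstsq(A, conc, rcond=None)

# Estimation for a new spectrum with a known concentration of 5.0.
x_new = (5.0 * pure)[None, :]
t_new = project(x_new, regions, loadings, means)
estimate = (np.column_stack([np.ones(1), t_new]) @ coef)[0]
```

The composite scores matrix `T` has one column per retained factor across all regions, so the number of factors for one region can be changed without affecting the decomposition of any other region.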