1. Field of the Invention
The invention relates to multivariate analysis of spectral signals. More particularly, the invention relates to a method of multivariate analysis of a spectral signal that allows each wavelength or spectral region to be modeled with just enough factors to fully model the analytical signal, without incorporating noise into the model through the use of excess factors.
2. Description of Related Art
Multivariate analysis is a well-established tool for extracting the spectroscopic signal of a target analyte, usually quite small, from a data matrix in the presence of noise, instrument variations, environmental effects and interfering components. Various methods and devices that employ multivariate analysis to determine an analyte signal have been described. For example, R. Barnes, J. Brasch, D. Purdy, W. Loughheed, Non-invasive determination of analyte concentration in body of mammals, U.S. Pat. No. 5,379,764 (Jan. 10, 1995) describe a method in which a subject is irradiated with near-infrared (NIR) radiation and the resulting absorption spectrum is analyzed using multivariate techniques to obtain a value for the analyte concentration.
J. Ivaldi, D. Tracy, R. Hoult, R. Spragg, Method and apparatus for comparing spectra, U.S. Pat. No. 5,308,982 (May 3, 1994) describe a method and apparatus in which a matrix model is derived from the measured spectra of an analyte and of interferents. A spectrum is then generated for an unknown sample, and multiple linear least squares regression is used to fit the model to the sample spectrum and compute the concentration of the analyte in the sample.
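The least-squares fitting step described above can be illustrated with a classical least squares fit, in which the sample spectrum is modeled as a linear combination of pure-component spectra. The following is a minimal sketch under assumed conditions, not the patented implementation: the Gaussian band shapes, the wavelength grid, the concentrations and the helper names `gaussian_band` and `fit_concentrations` are all hypothetical.

```python
import numpy as np

def gaussian_band(wavelengths, center, width):
    """Hypothetical pure-component spectrum: a single Gaussian absorption band."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

def fit_concentrations(model_spectra, sample_spectrum):
    """Least-squares fit of sample ~= model_spectra @ c; returns one coefficient per component."""
    coeffs, _residuals, _rank, _sv = np.linalg.lstsq(model_spectra, sample_spectrum, rcond=None)
    return coeffs

wavelengths = np.linspace(1100.0, 2500.0, 300)  # nm, illustrative grid only

# Matrix model: columns are the analyte and two interferents (all synthetic).
K = np.column_stack([
    gaussian_band(wavelengths, 1600.0, 40.0),  # analyte
    gaussian_band(wavelengths, 1700.0, 60.0),  # interferent 1
    gaussian_band(wavelengths, 2100.0, 80.0),  # interferent 2
])

# Simulated "unknown" sample: known mixture plus a little measurement noise.
true_c = np.array([0.3, 1.2, 0.8])
rng = np.random.default_rng(0)
sample = K @ true_c + rng.normal(scale=1e-3, size=wavelengths.size)

est_c = fit_concentrations(K, sample)
print("estimated concentrations:", est_c)
```

The first element of `est_c` is the analyte estimate; the remaining coefficients absorb the interferent contributions so that they do not bias it.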
L. Nygaard, T. Lapp, B. Arnvidarson, Method of determining urea in milk, U.S. Pat. No. 5,252,829 (Oct. 12, 1993) describe a method and apparatus for measuring the concentration of urea in a milk sample using an infrared attenuation measuring technique. Multivariate techniques are carried out to determine the spectral contributions of known components using partial least squares algorithms, principal component regression, multiple linear regression or artificial neural network learning.
M. Robinson, K. Ward, R. Eaton, D. Haaland, Method of and apparatus for determining the similarity of a biological analyte from a model constructed from known biological fluids, U.S. Pat. No. 4,975,581 (Dec. 4, 1990) describe a method and apparatus for determining analyte concentration in a biological sample based on a comparison of infrared energy absorption between biological fluids of known analyte concentration and the sample. The comparison is performed using partial least squares analysis or other multivariate techniques.
However, multivariate techniques such as principal component regression (PCR) and partial least squares regression (PLS) have some inherent disadvantages. One well-documented problem with multivariate analysis is that noise in the data may be incorporated into the model, especially when too many factors are used in developing the model. Modeling this noise leads to erroneously high error levels when the model is subsequently applied to prediction data. See, for example, H. Martens, T. Naes, Multivariate Calibration, John Wiley & Sons, 1989; or K. Beebe, B. Kowalski, An Introduction to Multivariate Calibration and Analysis, Anal. Chem. 59, 1007A-1017A (1987). Complicating this issue is the fact that a few factors may fully model one spectral region, while additional factors may be required to model another set of wavelengths.
For example, a few factors may suffice to model a region having:
A. a high degree of collinearity;
B. a high signal-to-noise ratio;
C. minor or readily modeled instrument variations;
D. a low contribution from environmental effects; or
E. a minimal number of readily modeled interfering signals.
Other regions may require a much larger number of factors to model the analytical signal sufficiently. This may be the case when:
A. the data are not fully linear;
B. the signal-to-noise ratio is low;
C. instrument drift changes the spectral response over time;
D. a large number of spectrally interfering components are present; or
E. the interfering signals are not readily modeled.
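The incorporation of noise through excess factors can be illustrated with principal component regression on synthetic spectra. In this hypothetical sketch (NumPy; the band shapes, noise level, sample counts and function names are assumptions), the calibration error can only fall as factors are added, reaching essentially zero once the factors span every calibration sample, whereas the error on independent validation spectra typically stops improving and then grows as the extra factors fit noise rather than signal.

```python
import numpy as np

def pcr_fit(X, y, n_factors):
    """Mean-centred principal component regression using the first n_factors components."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Right singular vectors of the centred spectra serve as the loadings.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_factors].T               # (n_wavelengths, n_factors)
    scores = Xc @ loadings                    # (n_samples, n_factors)
    q, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    coef = loadings @ q                       # regression vector over wavelengths
    return coef, x_mean, y_mean

def pcr_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic data: one analyte band and one interferent band, plus spectral noise.
rng = np.random.default_rng(1)
wl = np.linspace(0.0, 1.0, 100)
analyte = np.exp(-0.5 * ((wl - 0.4) / 0.05) ** 2)
interferent = np.exp(-0.5 * ((wl - 0.6) / 0.10) ** 2)

def simulate(n):
    conc = rng.uniform(0.0, 1.0, size=(n, 2))  # columns: analyte, interferent
    X = conc @ np.vstack([analyte, interferent]) + rng.normal(scale=0.02, size=(n, wl.size))
    return X, conc[:, 0]

X_cal, y_cal = simulate(20)   # small calibration set
X_val, y_val = simulate(200)  # independent validation set

cal_err, val_err = [], []
for k in range(1, 20):  # at most n_samples - 1 factors after mean-centring
    model = pcr_fit(X_cal, y_cal, k)
    cal_err.append(rmse(pcr_predict(model, X_cal), y_cal))
    val_err.append(rmse(pcr_predict(model, X_val), y_val))

for k, (cal, val) in enumerate(zip(cal_err, val_err), start=1):
    print(f"{k:2d} factors: calibration RMSE {cal:.4f}, validation RMSE {val:.4f}")
```

Only the validation column reveals the point at which additional factors begin to model noise; the calibration column keeps improving regardless.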
In traditional applications of multivariate techniques such as PCR or PLS, a single number of factors is applied over the entire spectrum. Consequently, choosing the number of factors that adequately models the signal in one region of the spectrum forces every other spectral region to use the same number of factors. In many cases, another spectral region would be optimally modeled with a different number of factors than the first. Thus, a compromise between wavelength selection and the number of factors incorporated into the model becomes necessary. There exists, therefore, a need in the art for an algorithm that allows the number of factors for each wavelength or spectral region to be chosen independently of the number of factors used to model any other wavelength or spectral region.
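To make the stated need concrete, the following hypothetical sketch chooses a factor count separately for each wavelength region by cross-validating a region-local PCR model. It is not the method of the invention; the two-region split, the contiguous five-fold cross-validation, the synthetic spectra (one analyte with a band in each region, three interferents confined to the upper region) and all function names are illustrative assumptions.

```python
import numpy as np

def pcr_predictor(X, y, n_factors):
    """Fit mean-centred PCR with n_factors components; return a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, vt = np.linalg.svd(X - x_mean, full_matrices=False)
    loadings = vt[:n_factors].T
    q, *_ = np.linalg.lstsq((X - x_mean) @ loadings, y - y_mean, rcond=None)
    coef = loadings @ q
    return lambda X_new: (X_new - x_mean) @ coef + y_mean

def cv_rmse(X, y, n_factors, n_folds=5):
    """Root-mean-square error of PCR under contiguous n_folds-fold cross-validation."""
    indices = np.arange(len(y))
    squared_errors = []
    for test in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, test)
        predict = pcr_predictor(X[train], y[train], n_factors)
        squared_errors.extend((predict(X[test]) - y[test]) ** 2)
    return float(np.sqrt(np.mean(squared_errors)))

def factors_per_region(X, y, regions, max_factors):
    """Choose, independently for each wavelength region, the factor count with the lowest CV error."""
    selected = {}
    for name, cols in regions.items():
        X_region = X[:, cols]
        errors = [cv_rmse(X_region, y, k) for k in range(1, max_factors + 1)]
        selected[name] = int(np.argmin(errors)) + 1
    return selected

# Synthetic calibration set (all values hypothetical).
rng = np.random.default_rng(2)
wl = np.linspace(0.0, 1.0, 120)

def band(centre, width):
    return np.exp(-0.5 * ((wl - centre) / width) ** 2)

pure = np.vstack([
    band(0.25, 0.06) + band(0.75, 0.06),  # analyte: one band in each region
    band(0.70, 0.04),                     # interferents: upper region only
    band(0.80, 0.05),
    band(0.90, 0.04),
])
conc = rng.uniform(0.0, 1.0, size=(40, pure.shape[0]))
X = conc @ pure + rng.normal(scale=0.01, size=(40, wl.size))
y = conc[:, 0]  # analyte concentration is the calibration target

regions = {"low": np.arange(0, 60), "high": np.arange(60, 120)}
selected = factors_per_region(X, y, regions, max_factors=8)
print(selected)
```

On data like this, the interferent-rich "high" region would usually be assigned more factors than the "low" region, although the exact counts depend on the noise draw. How the per-region models are then combined into a single prediction is outside the scope of this sketch.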