1. Field of Invention
The present invention pertains to a method for concentration or property calibration of substances or matter, and to an arrangement for calibration of spectroscopic input data from samples, whereby the concentration or property calibration determines a model for further samples of the same type.
2. Description of Related Art
Multiple measurement vectors and arrays are increasingly being used for the characterization of solid, semi-solid, fluid and vapor samples. Examples of methods giving such multiple measurements are Near Infrared Spectroscopy (NIR) and Nuclear Magnetic Resonance (NMR) spectroscopy. Frequently the objective of this characterization is to determine the value of one or several concentrations in the samples. Multivariate calibration is then used to develop a quantitative relation between the digitized spectra, a matrix X, and the concentrations, in a matrix Y, as reviewed by H. Martens and T. Naes, Multivariate Calibration. Wiley, N.Y., 1989. NIR and other spectroscopies are also increasingly used to infer sample properties Y other than concentrations, e.g., the strength and viscosity of polymers, the thickness of a tablet coating, and the octane number of gasoline.
The first step of a multivariate calibration is often to pre-process the input data. The reason is that spectra, as well as other multiple measurement arrays, often contain systematic variation that is unrelated to the response y or the responses Y. For solid samples this systematic variation is due to, among other things, light scattering and differences in spectroscopic path length, and may often constitute the major part of the variation of the sample spectra.
Another reason for systematic but unwanted variation in the sample spectra may be that the analyte of interest absorbs only in small parts of the spectral region. The variation in X that is unrelated to Y may disturb the multivariate modeling and cause imprecise predictions for new samples and also affect the robustness of the model over time.
For the removal of undesirable systematic variation in the data, two types of pre-processing methods are commonly reported in the analytical chemistry literature: differentiation and signal correction. Popular approaches to signal correction include Savitzky-Golay smoothing, by A. Savitzky and M. J. E. Golay, Anal. Chem. 65, 3279-3289 (1993); multiplicative signal correction (MSC), H. Martens and T. Naes, Multivariate Calibration. Wiley, N.Y., 1989 and P. Geladi, D. MacDougall, and H. Martens, Linearization and Scatter-Correction for Near-Infrared Reflectance Spectra of Meat, Applied Spectroscopy, 3 (1985), 491-50; Fourier transformation, by P. C. Williams and K. Norris, Near-Infrared Technology in Agricultural and Food Industries, American Cereal Association, St. Paul, Minn. (1987); principal components analysis (PCA), by J. Sun, Statistical Analysis of NIR data: Data pretreatment. J. Chemom. 11 (1997) 525-532; variable selection, H. Martens and T. Naes, Multivariate Calibration. Wiley, N.Y., 1989 and M. Baroni, S. Clementi, G. Cruciani, G. Constantino, and D. Riganelli. Predictive ability of regression models, Part 2: Selection of the best predictive PLS model. J. Chemom. 6 (1992) 347-56; and base line correction, H. Martens and T. Naes, Multivariate Calibration. Wiley, N.Y., 1989 and R. J. Barnes, M. S. Dhanoa, and S. J. Lister. Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra. Appl. Spectrosc. 43 (1989) 772-777.
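By way of illustration only, two of the cited corrections can be sketched in a few lines of Python. The sketch assumes that each spectrum is a plain list of absorbance values on a common wavelength axis. The function names snv and msc are hypothetical, and the use of the sample standard deviation and of the mean spectrum as the default MSC reference are assumptions of this sketch rather than details taken from the cited works.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit spread."""
    m = mean(spectrum)
    s = stdev(spectrum)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in spectrum]


def msc(spectra, reference=None):
    """Multiplicative signal correction against a reference spectrum.

    Each spectrum x is modelled as x = offset + slope * reference + residual
    by ordinary least squares and then corrected as (x - offset) / slope.
    """
    if reference is None:
        # Assumption: use the mean spectrum of the set as the reference.
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        sxy = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x))
        slope = sxy / sxx
        offset = x_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in x])
    return corrected
```

In this sketch, two spectra that differ only by an additive offset and a positive multiplicative factor are mapped onto the same corrected spectrum, which is the kind of scatter-related variation these filters are intended to remove.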
These signal corrections are different cases of filtering, where a signal (e.g., a NIR spectrum) is made to have “better properties” by passing it through a filter. The objectives of filtering are often rather vague; it is not always easy to specify what is meant by “better properties”. Even in the case of calibration, where it is possible to specify this objective in terms of lowered prediction errors or simpler calibration models, it is difficult to construct general filters that indeed improve these properties of the data.
Projections to latent structures by means of partial least squares (PLS) is one of the main generalized regression methods for analyzing multivariate data where a quantitative relationship between a descriptor matrix X and a quality matrix Y is wanted. Multivariate calibration, classification, discriminant analysis and pattern recognition are a few of the areas where PLS has been shown to be a useful tool. The main reasons for its success are that it can cope with collinearity among variables, noise in both X and Y, and moderate amounts of missing data in both X and Y, and that it can handle multiple Y simultaneously. These types of complicated data are now common due to the advent of analytical instruments such as HPLC, LC-UV, LC-MS, and spectroscopy instruments.
Improved and modified PLS methods using the so-called NIPALS method, H. Wold, Nonlinear estimation by iterative least squares procedures in F David (Editor), Research Papers in Statistics, Wiley, N.Y., 1966 pp 411-444, have been suggested since the birth of PLS in 1977. A modification of the PLS method is presented. It aims at improving the interpretation of PLS models, reducing model complexity, and improving predictions and robustness.
Spectroscopic methods represent a fairly cheap, quick and easy way of retrieving information about samples. In the characterization of organic substances and their properties, such as wood, pulp, pharmaceutical tablets, ethanol content, etc., near infrared (NIR), NMR, and other instruments have proven useful.
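For orientation, a conventional single-response PLS regression (often called PLS1) can be sketched as follows. This is a minimal illustration of the commonly described textbook algorithm for complete data, written with plain Python lists. It is not a statement of any particular prior-art implementation, and it does not handle missing data or multiple Y. The function names pls1_fit and pls1_predict and the dictionary layout of the returned model are hypothetical.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; X is a list of rows, y a list of values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    # Work on mean-centered copies; E and f are deflated component by component.
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with the residual response
        w = [v / norm for v in w]
        # Scores, X-loadings and the inner regression coefficient q.
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[e - ti * lj for e, lj in zip(row, load)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, load, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict the response for one new sample x using the fitted components."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, load, q in model["components"]:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat
```

Because every component is computed from the covariance between X and the response, any systematic variation in X that is unrelated to the response still has to be accommodated by the loadings, which is one way to see why such variation complicates the resulting model.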