The present invention pertains to a method for concentration or property calibration of substances or matter and to an arrangement for calibration of spectroscopic input data from samples, whereby the concentration or property calibration determines a model for further samples of the same type.
Near-infrared (NIR) spectroscopy is being increasingly used for the characterization of solid, semi-solid, fluid and vapor samples. Frequently, the objective of this characterization is to determine the value of one or several concentrations in the samples. Multivariate calibration is then used to develop a quantitative relation between the digitized spectra, a matrix X, and the concentrations, in a matrix Y, as reviewed by H. Martens and T. Naes, Multivariate Calibration. Wiley, N.Y., 1989. NIR spectroscopy is also increasingly used to infer properties Y of samples other than concentrations, e.g., the strength and viscosity of polymers, the thickness of a tablet coating, and the octane number of gasoline.
The first step of a multivariate calibration based on NIR spectra is often to pre-process the data. The reason is that NIR spectra often contain systematic variation that is unrelated to the responses Y. For solid samples this systematic variation is due to, among others, light scattering and differences in spectroscopic path length, and may often constitute the major part of the variation of the sample spectra. Another reason for systematic but unwanted variation in the sample spectra may be that the analyte of interest absorbs only in small parts of the spectral region. The variation in X that is unrelated to Y may disturb the multivariate modelling and cause imprecise predictions for new samples.
For removal of undesirable systematic variation in the data, two types of pre-processing are commonly reported in the analytical chemistry literature: differentiation and signal correction. Popular approaches to signal correction include Savitzky-Golay smoothing (A. Savitzky and M. J. E. Golay, Anal. Chem. 65, 3279-3289 (1993)); multiplicative signal correction (MSC) (H. Martens and T. Naes, Multivariate Calibration, Wiley, N.Y., 1989; P. Geladi, D. MacDougall, and H. Martens, Linearization and Scatter-Correction for Near-Infrared Reflectance Spectra of Meat, Applied Spectroscopy, 3 (1985), 491-50); Fourier transformation (P. C. Williams and K. Norris, Near-Infrared Technology in Agricultural and Food Industries, American Cereal Association, St. Paul, Minn. (1987)); principal components analysis (PCA) (J. Sun, Statistical Analysis of NIR data: Data pretreatment, J. Chemom. 11 (1997) 525-532); variable selection (Martens and Naes, 1989; M. Baroni, S. Clementi, G. Cruciani, G. Constantino, and D. Riganelli, Predictive ability of regression models, Part 2: Selection of the best predictive PLS model, J. Chemom. 6 (1992) 347-56); and baseline correction (Martens and Naes, 1989; R. J. Barnes, M. S. Dhanoa, and S. J. Lister, Standard Normal Variate Transformation and De-trending of Near-Infrared Diffuse Reflectance Spectra, Appl. Spectrosc. 43 (1989) 772-777).
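As an illustration of a conventional signal correction, the following is a minimal sketch of a basic form of MSC in Python with NumPy, assuming the mean spectrum as reference; the function name `msc` and the synthetic data are illustrative only. Each spectrum is regressed on the reference, and the fitted offset and slope are removed. Note that the correction uses only X and never the responses Y.

```python
import numpy as np


def msc(X, reference=None):
    """Multiplicative signal correction of the rows (spectra) of X."""
    X = np.asarray(X, dtype=float)
    r = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        b, a = np.polyfit(r, x, 1)   # least-squares fit x ~ a + b * r (slope first)
        corrected[i] = (x - a) / b   # remove additive offset and multiplicative scaling
    return corrected


# One underlying spectrum distorted by different offsets and scale factors
wavelengths = np.linspace(0.0, 1.0, 100)
base = np.exp(-((wavelengths - 0.5) / 0.1) ** 2)
offsets = np.array([0.0, 0.2, -0.1])
scales = np.array([1.0, 1.5, 0.8])
X = offsets[:, None] + scales[:, None] * base
X_msc = msc(X)  # all three corrected rows coincide with the mean spectrum
```

Because the distortions in this example are purely additive and multiplicative, MSC removes them completely; real scatter effects are only approximately of this form.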
These signal corrections are different cases of filtering, where a signal (e.g., a NIR spectrum) is made to have "better properties" by passing it through a filter. The objectives of filtering are often rather vague; it is not always easy to specify what is meant by "better properties". Even in the case of calibration, where it is possible to specify this objective in terms of lowered prediction errors or simpler calibration models, it is difficult to construct general filters that indeed improve these properties of the data.
The present invention relates to a method and an arrangement that remove irrelevant parts from an input data set X of samples of substances or matter. Such input data include infrared and near-infrared spectroscopic data, other spectroscopic data, input data for predictions and, generally, any input data that can be arranged in sets or matrices according to the present invention. This is achieved by ensuring that the removed part is orthogonal to Y, or as close to orthogonal as possible. The approach according to the present invention has been named OSC (Orthogonal Signal Correction).
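For the exactly orthogonal case, the correction can be written in matrix notation as below, with T and P (labels chosen here for illustration) denoting the score set and loading set of the steps that follow. The last line shows why the filtering leaves the relation between X and Y intact: the removed part has zero cross-product with Y.

```latex
\begin{aligned}
&\min_{T,\,P}\ \lVert X - T P^{\mathsf T} \rVert_F^2
  \quad \text{subject to} \quad T^{\mathsf T} Y = 0, \\
&P^{\mathsf T} = (T^{\mathsf T} T)^{-1} T^{\mathsf T} X
  \quad \text{(least-squares loadings for a given } T\text{)}, \\
&X_{\mathrm{OSC}} = X - T P^{\mathsf T}, \qquad
 Y^{\mathsf T} X_{\mathrm{OSC}} = Y^{\mathsf T} X - (Y^{\mathsf T} T)\, P^{\mathsf T} = Y^{\mathsf T} X .
\end{aligned}
```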
Hence, the present invention provides a method for pretreatment (filtering) of input data from samples of substances or matter, collected for the purpose of concentration or property calibration, in which the calibration determines a filter model for further samples of the same type. The method comprises the steps of:
optionally transforming, centering and scaling the input data to provide two start sets;
arranging said input data in an input set;
determining a concentration or property set;
determining a score set and a loading set and their product, said product resembling the input set as much as possible under the constraint that the score set is orthogonal to the concentration or property set;
filtering said input data by subtracting said product from the input set in order to remove variations relating to properties other than present calibration properties; whereby
said filter model determines the filtering, so that further samples of the same type can be filtered with the filter model.
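The steps above can be sketched in Python with NumPy. This is a minimal sketch under our own assumptions, not the method as claimed: the only pretreatment is mean-centering, and each score vector is taken as the leading singular direction of the data after the variable-space directions X'Y have been projected out. That choice makes every score exactly orthogonal to Y and keeps it a linear function of X, so further samples can be filtered, but it is not guaranteed to be the least-squares optimum under the orthogonality constraint; the text does not prescribe how the score set is computed. The names `osc_fit` and `osc_apply` are hypothetical.

```python
import numpy as np


def osc_fit(X, Y, n_components=1):
    """Fit an OSC filter model; returns (model, filtered mean-centered X)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]                   # a property vector becomes an n x 1 set
    x_mean = X.mean(axis=0)
    E = X - x_mean                       # centered input set
    Yc = Y - Y.mean(axis=0)              # centered concentration/property set
    Z = E.T @ Yc                         # p x m; Yc.T @ (E @ w) == Z.T @ w
    Q, _ = np.linalg.qr(Z)               # orthonormal basis for span(Z)
    proj = np.eye(X.shape[1]) - Q @ Q.T  # projector onto the complement of span(Z)
    W, P = [], []
    for _ in range(n_components):
        # Leading right singular vector of E with the Y-related directions removed
        _, _, vt = np.linalg.svd(E @ proj, full_matrices=False)
        w = proj @ vt[0]                 # weight vector, orthogonal to span(Z)
        t = E @ w                        # score vector, orthogonal to every column of Yc
        p = E.T @ t / (t @ t)            # least-squares loading vector
        E = E - np.outer(t, p)           # subtract the product t p' (E.T @ Yc is unchanged)
        W.append(w)
        P.append(p)
    model = {"x_mean": x_mean, "W": np.column_stack(W), "P": np.column_stack(P)}
    return model, E


def osc_apply(model, X_new):
    """Filter further samples of the same type with a fitted OSC model."""
    E = np.asarray(X_new, dtype=float) - model["x_mean"]
    for w, p in zip(model["W"].T, model["P"].T):
        E = E - np.outer(E @ w, p)       # remove each orthogonal component in turn
    return E


# Synthetic example: y depends on the first five variables only
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=20)
model, X_filt = osc_fit(X, y, n_components=2)
X_check = osc_apply(model, X)            # same result as X_filt for the training samples
```

For the training samples, `osc_apply(model, X)` reproduces `X_filt`, and because every removed score is orthogonal to y, the filtered data keep exactly the same cross-product with the centered y as the centered input did.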
In one embodiment, the sets are arranged as two matrices, or as one matrix and one concentration or property vector.
In another embodiment, the bilinear filtering is combined with another linear filtering method. In a preferred embodiment, this other filtering method is wavelet filtering or Fourier filtering.
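A Fourier filter of the kind mentioned can be sketched in Python with NumPy as a low-pass filter along the wavelength axis. One way to combine it with the bilinear filtering, assumed here, is to smooth the spectra first and then fit the orthogonal correction to the smoothed data; the order, the cut-off `n_keep` and the name `fourier_lowpass` are illustrative choices, not taken from the text. Because the discrete Fourier transform treats each spectrum as periodic, real spectra may need detrending or padding before this step.

```python
import numpy as np


def fourier_lowpass(X, n_keep):
    """Keep only the n_keep lowest-frequency Fourier coefficients of each spectrum."""
    X = np.asarray(X, dtype=float)
    coeffs = np.fft.rfft(X, axis=1)      # real FFT along the wavelength axis
    coeffs[:, n_keep:] = 0.0             # discard higher-frequency components
    return np.fft.irfft(coeffs, n=X.shape[1], axis=1)


grid = np.arange(64) / 64
slow = np.cos(2 * np.pi * 2 * grid)          # 2 cycles across the spectrum
fast = 0.3 * np.cos(2 * np.pi * 20 * grid)   # 20 cycles, treated here as noise
X = np.vstack([slow + fast, 2 * slow - fast])
X_smooth = fourier_lowpass(X, n_keep=5)      # the 20-cycle component is removed
```

The filter is linear in X, so it can be applied to further samples with the same cut-off before the fitted orthogonal correction is applied.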
In yet another embodiment, the filter model is improved by applying multiple sets of input data as training sets and repeating said steps with better determined concentrations or properties, thus tuning the filter model.
The present invention further provides an arrangement for concentration or property calibration of samples from spectroscopic input data, in which the concentration or property calibration determines a filter means for further samples of the same type. The arrangement comprises:
transforming means, centering means and scaling means for optionally operating on the input data in order to provide two start sets;
arrangement means for arranging said input data in an input set;
determining means for determining a concentration or property calibration set, a score set and a loading set;
multiplication means for determining the product between the score set and the loading set, said product resembling the input set as much as possible under the constraint that the score set is orthogonal to the concentration or property set;
filter means for filtering said input data by subtracting said product from the input set in order to remove variations relating to properties other than present calibration properties; whereby
said filter model determines the filtering, so that further samples of the same type can be filtered with the filter model.
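For further samples to be filtered with the same model, they must receive exactly the same transformation, centering and scaling as the training samples, so the parameters of these means have to be stored. A minimal sketch in Python with NumPy, under our own assumptions: the optional transform is the common NIR conversion of reflectance R to apparent absorbance log10(1/R), scaling is to unit standard deviation, and the names `fit_pretreatment` and `apply_pretreatment` are hypothetical.

```python
import numpy as np


def fit_pretreatment(data, log_transform=False, scale=True):
    """Learn transform/centering/scaling parameters from a training set."""
    A = np.asarray(data, dtype=float)
    A = A[:, None] if A.ndim == 1 else A
    if log_transform:
        A = np.log10(1.0 / A)            # e.g. reflectance R -> apparent absorbance log10(1/R)
    mean = A.mean(axis=0)
    std = A.std(axis=0, ddof=1) if scale else np.ones(A.shape[1])
    std = np.where(std > 0, std, 1.0)    # leave constant columns unscaled
    return {"log": log_transform, "mean": mean, "std": std}


def apply_pretreatment(params, data):
    """Apply stored parameters so further samples get identical pretreatment."""
    A = np.asarray(data, dtype=float)
    A = A[:, None] if A.ndim == 1 else A
    if params["log"]:
        A = np.log10(1.0 / A)
    return (A - params["mean"]) / params["std"]


rng = np.random.default_rng(1)
R_train = rng.uniform(0.2, 0.9, size=(15, 30))   # simulated reflectance spectra
y_train = rng.normal(size=15)                    # simulated property values
x_params = fit_pretreatment(R_train, log_transform=True)
y_params = fit_pretreatment(y_train)
X0 = apply_pretreatment(x_params, R_train)       # first start set
Y0 = apply_pretreatment(y_params, y_train)       # second start set, n x 1
```

A property vector is handled as a one-column set, matching the embodiment in which the sets are one matrix and one concentration or property vector.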
In one embodiment of said arrangement, the sets are arranged as two matrices, or as one matrix and one concentration or property vector.
In another embodiment, said bilinear filtering is combined with filtering through another linear filtering means, for example one that provides wavelet filtering or Fourier filtering.
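Wavelet filtering can be illustrated with the simplest wavelet, the Haar wavelet: each spectrum is decomposed over a chosen number of levels, all detail coefficients are set to zero, and the signal is reconstructed, which amounts to averaging blocks of 2**levels neighbouring variables. This is a sketch in Python with NumPy; the choice of wavelet, the number of levels and the name `haar_smooth` are illustrative assumptions, and practical wavelet filters often threshold the detail coefficients rather than discard them entirely.

```python
import numpy as np


def haar_smooth(X, levels=1):
    """Haar wavelet filter: keep approximation coefficients, zero all detail coefficients."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] % (2 ** levels):
        raise ValueError("number of variables must be divisible by 2**levels")
    A = X
    for _ in range(levels):
        A = (A[:, 0::2] + A[:, 1::2]) / np.sqrt(2)  # approximation coefficients, one level down
    for _ in range(levels):
        A = np.repeat(A, 2, axis=1) / np.sqrt(2)     # inverse step with zero detail coefficients
    return A


X = np.array([[1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 0.0, 4.0]])
X_smooth = haar_smooth(X, levels=1)   # -> [[2, 2, 2, 2, 6, 6, 2, 2]]
```

Like the Fourier filter, this is a fixed linear operation on each spectrum and needs no training data.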
In yet another embodiment, said filter model is improved by applying multiple sets of input data as training sets to said means, thus tuning the filter model and providing better determined properties.