This invention relates generally to a method for determining physical or chemical properties of materials using infrared analysis, and more specifically to a method for improving the estimation of properties of interest in samples of materials based on nonlinear correlations to their infrared spectra.
A particular use of the method is to obtain improved estimates of the octane number of gasolines by infrared analysis.
Physical or chemical properties such as octane number, cetane number, and aromatics content can be usefully correlated to infrared spectra for appropriate sample sets. Linear techniques such as PLS, PCR, and extensions such as CPSA (Constrained Principal Spectra Analysis, J. M. Brown, U.S. Pat. No. 5,121,337) and the method of DiForggio, U.S. Pat. No. 5,397,899, provide workable correlations in many circumstances. The purpose of these correlations is to calibrate the infrared analyzer so that it can be used to estimate the physical or chemical properties of future unknown samples from their infrared spectra. An important consideration in implementing these analyzers is their ability to detect outlier samples statistically, i.e., samples whose analysis would require extrapolating the predictive model.
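As a rough illustration of how such a linear calibration is used, the following Python sketch fits a principal components regression (PCR) model and, for an unknown spectrum, returns the property estimate together with two common outlier statistics: the Mahalanobis distance of the sample's principal-component scores and the spectral residual. It is a generic PCR sketch assuming NumPy and absorbance spectra stored as rows of a matrix; it is not CPSA or the method of any cited patent, and the function names fit_pcr and predict_pcr are hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Fit a generic principal components regression (PCR) calibration.

    X: (n_samples, n_wavelengths) calibration spectra (absorbances).
    y: (n_samples,) reference property values, e.g. octane numbers.
    Hypothetical helper for illustration only.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Principal spectra (loadings) from the SVD of the mean-centred spectra.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T           # (n_wavelengths, k), orthonormal columns
    scores = Xc @ loadings                   # (n_samples, k), zero mean
    # Ordinary least squares of the centred property on the scores.
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Inverse score covariance, used for the Mahalanobis-distance outlier test.
    score_cov_inv = np.linalg.inv(scores.T @ scores / (len(y) - 1))
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings,
            "coef": coef, "score_cov_inv": score_cov_inv}


def predict_pcr(model, x):
    """Estimate the property of one spectrum and return outlier statistics."""
    xc = x - model["x_mean"]
    t = xc @ model["loadings"]
    estimate = model["y_mean"] + t @ model["coef"]
    # Squared Mahalanobis distance of the scores: large values suggest extrapolation.
    mahalanobis_sq = t @ model["score_cov_inv"] @ t
    # Spectral residual: the part of the spectrum the principal spectra cannot reproduce.
    residual = xc - model["loadings"] @ t
    return float(estimate), float(mahalanobis_sq), float(residual @ residual)
```

In use, an unknown sample whose Mahalanobis distance or residual greatly exceeds the values seen for the calibration samples would be flagged as an outlier rather than reported; how such thresholds are chosen is not addressed by this sketch.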
For some applications, linear correlation techniques such as PLS, PCR and CPSA do not provide calibrations that predict physical or chemical properties with sufficient accuracy. Inaccurate calibrations can indicate that the property being estimated depends in a nonlinear manner on sample composition. Various techniques have been suggested for addressing this problem, including localized linear regression, MARS, and neural networks, but such techniques generally require large numbers of coefficients to be fit and generally do not provide the statistical guidance available from linear techniques.
Calibration methods currently employed to correlate property or compositional data to spectral data are almost exclusively linear. Such methods assume a linear dependence of the property or component concentration on the spectral signal. They are inadequate when the property depends in a nonlinear fashion on the chemical components, or when interactions among components cause nonlinear spectral responses. While some nonlinear modeling methods have been explored in the literature, they generally attempt to define a nonlinear relationship between the spectral data and the property or component concentration. Such nonlinear methods generally require large numbers of coefficients to be determined. The large number of coefficients requires very large calibration sample sets and makes the resulting models prone to overfitting the data. Also, most nonlinear methods fail to provide a statistical means of determining when a new sample being analyzed is outside the range of the calibration, i.e., outlier detection. A simpler nonlinear method that is less prone to overfitting and that retains outlier detection was needed.
A variety of linear calibrations are in use for estimating properties and component concentrations. For example, Hieftje, Honigs and Hirschfeld (U.S. Pat. No. 4,800,279) discussed linear methods for evaluating physical properties of hydrocarbons. Lambert and Martens (EP 0 285 251) described a linear method for estimating octane numbers. Maggard discussed linear methods for estimating octane numbers (U.S. Pat. No. 4,963,745) and for estimating aromatics in hydrocarbons (U.S. Pat. No. 5,145,785). Brown (U.S. Pat. No. 5,121,337) discussed linear methods based on Constrained Principal Spectra Analysis (CPSA) and gave various examples.
Espinosa, et al. (EP 0 305 090 B1 and EP 0 304 232 A2) describe methods for direct determination of physical properties of hydrocarbons. Espinosa, et al. include linear terms (absorption at selected frequencies), quadratic terms (products between absorptions at different frequencies) and homographic terms (quotients between absorptions at different frequencies) in their equations. While the equations presented in their examples generally contain only a few nonlinear terms, these quadratic and homographic terms were chosen either arbitrarily or statistically from among a large number of possible nonlinear terms. For instance, for the 16 recommended frequencies in EP 0 305 090 B1, there are 18² (324) possible quadratic terms and another 18×17 (306) possible homographic terms that could have been used. For 16 frequencies, there are 646 coefficients that must be determined or set to zero in deriving the correlation equations. Even for simpler examples where only 6 frequencies were considered, 216 linear, quadratic, and homographic terms are possible, and 216 coefficients must be determined or set to zero in deriving the correlation equations.
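To make the size of that candidate pool concrete, the Python sketch below generates every linear, quadratic and homographic term from a vector of absorbances, counting ordered frequency pairs in the way the 18² and 18×17 figures imply. It assumes NumPy, and the function name expand_terms is hypothetical, not taken from the Espinosa patents. It only enumerates candidates; choosing which terms to keep, which is where the fitting burden lies, is not shown.

```python
import numpy as np


def expand_terms(absorbances):
    """Enumerate candidate terms from absorbances at n frequencies (hypothetical helper).

    Counting convention assumed from the text: every ordered pair (i, j),
    including i == j, gives a quadratic product (n**2 terms), and every
    ordered pair with i != j gives a homographic quotient (n * (n - 1) terms).
    """
    a = np.asarray(absorbances, dtype=float)
    n = a.size
    linear = a.copy()
    quadratic = np.outer(a, a).ravel()              # a_i * a_j for all i, j
    ratios = np.divide.outer(a, a)                  # a_i / a_j for all i, j
    homographic = ratios[~np.eye(n, dtype=bool)]    # keep only i != j
    return linear, quadratic, homographic
```

For 18 absorbances it returns 18 linear, 324 quadratic and 306 homographic terms, and every term retained in a linear-in-the-terms correlation equation carries one coefficient to be fitted.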
Crawford, et al. (Process Control and Quality, 4 (1992) 13-20) predicted research octane number from near-infrared absorbance data with neural networks. Absorbances at 231 wavelengths were used as inputs to a neural network containing 24 nodes in one hidden layer. Including the node biases, a total of 24×231 + 24 (5568) coefficients (weights and biases) were determined in training the network.
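The coefficient count quoted above can be checked with the short Python function below, a hypothetical helper not taken from Crawford, et al. By default it counts, as the text does, the input-to-hidden weights and the hidden-node biases; passing n_outputs also counts the hidden-to-output weights and one bias per output node, which for a single-output network would add 25 further coefficients.

```python
def hidden_layer_parameter_count(n_inputs, n_hidden, n_outputs=0):
    """Count weights and biases in a fully connected one-hidden-layer network.

    With n_outputs=0 only input-to-hidden weights and hidden-node biases are
    counted (the 24*231 + 24 tally). A positive n_outputs adds the
    hidden-to-output weights and one bias per output node.
    """
    count = n_inputs * n_hidden + n_hidden
    if n_outputs:
        count += n_hidden * n_outputs + n_outputs
    return count
```

For example, hidden_layer_parameter_count(231, 24) gives 5568, and hidden_layer_parameter_count(231, 24, n_outputs=1) gives 5593.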
Nonlinear multivariate calibration methods have been reviewed by Sekulic, et al. (Analytical Chemistry, 65 (1993) 835A-845A). Locally Weighted Regression (LWR), Projection Pursuit Regression (PPR), Alternating Conditional Expectations (ACE), Multivariate Adaptive Regression Splines (MARS), Neural Networks, nonlinear Principal Components Regression (NLPCR) and nonlinear Partial Least Squares (NLPLS) are discussed. All of these techniques are computationally far more demanding than the nonlinear postprocessing method of the current invention.