Sensors are used to measure physical phenomena and to convert the measured values into data values. The magnitudes of these data values are presented as data signals. The measurement process itself will generally introduce errors and unwanted variations in the data values, and further errors and noise may be introduced in the conversion and transmission of the data signals. In most cases, it is desired to extract from the data signals the data values associated with one or more "analytes", that is, the magnitude of a specific physical phenomenon. When a sequence of measurements is made, an additional dimension is added to the data signal structure; that is, an additional data point index is created relating the data values to the measurement sequence. For example, a single measurement, such as temperature, made at a sequence of times or on a series of specimens yields a stream of data values with a one-dimensional data point index. Two-dimensional data point index structures result when each measurement in a sequence produces a "spectrum" consisting of multiple interrelated data values, such as an optical spectrum or a chromatogram.
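The distinction between one- and two-dimensional data point index structures can be sketched as array shapes. The following is an illustrative NumPy sketch; the array sizes and random values are assumptions for illustration only, not taken from any particular instrument:

```python
import numpy as np

rng = np.random.default_rng(0)

# A one-dimensional data point index: a single quantity (e.g. temperature)
# measured at a sequence of 50 times or on 50 specimens.
temperatures = rng.normal(loc=20.0, scale=0.5, size=50)

# A two-dimensional data point index: a sequence of 5 measurements, each
# producing a "spectrum" of 100 interrelated data values indexed by wavelength.
spectra = rng.normal(size=(5, 100))  # (sequence index, spectral index)

print(temperatures.shape)  # one-dimensional data point index
print(spectra.shape)       # two-dimensional data point index
```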
Noise and variation in such data structures are often assumed to be random and unrelated among the data values. With such random data, the most direct approach to reducing variation and noise is a weighted averaging of the data values. This is typically done by averaging the spectral data values of several spectra from the sequence of spectra, for instance, by combining measurements having the same spectral data point index, or "wavelength". Another typical approach is a weighted averaging of the data values of several adjacent data points within each spectrum. In many instances, both types of averaging are combined to produce more precise measurements and to further reduce the effects of variation or random noise in the data. Averaging assumes that no interrelationships exist among the data values. For interrelated data values, a large number of multivariate methods of processing data signals have been developed to reduce noise and unwanted variations. In general, these methods are based on each data value containing several parts or components, each related to a different physical phenomenon, and, therefore, on each spectrum consisting of several "spectral components" characteristic of the associated physical phenomena. Each spectral component consists of a set of data values and their associated data point indices, and may be represented by a data signal. These techniques have included, for example, least mean square methods of curve fitting various spectra. With this technique, each spectrum from the data set is approximated as a linear combination of the spectra of known constituents or components within the data, the approximation satisfying a least mean square criterion. The coefficients of these linear combinations are then linearly related to, typically, analyte concentrations in the specimens.
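Both the adjacent-point averaging and the least mean square curve fitting described above can be illustrated with a short NumPy sketch. The two Gaussian "component" spectra, their coefficients, and the noise level are hypothetical assumptions chosen only to make the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)
wavelengths = np.linspace(0.0, 1.0, 200)

# Hypothetical known constituent ("component") spectra: two Gaussian bands.
comp_a = np.exp(-((wavelengths - 0.3) ** 2) / 0.005)
comp_b = np.exp(-((wavelengths - 0.7) ** 2) / 0.010)
components = np.stack([comp_a, comp_b], axis=1)  # shape (200, 2)

# A measured spectrum: a linear combination of the components plus noise.
true_coeffs = np.array([2.0, 0.5])
measured = components @ true_coeffs + rng.normal(scale=0.01, size=200)

# Weighted averaging of adjacent data points within the spectrum
# (here a uniform 5-point moving average).
kernel = np.ones(5) / 5.0
smoothed = np.convolve(measured, kernel, mode="same")

# Least mean square curve fit: approximate the spectrum as a linear
# combination of the component spectra.
coeffs, *_ = np.linalg.lstsq(components, measured, rcond=None)
print(coeffs)  # close to the true coefficients [2.0, 0.5]
```

The fitted coefficients are the quantities that would then be related linearly to analyte concentrations.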
Curve fitting methods have been extended in various respects. First, multilinear correlation of several of the derived curve-fit coefficients with analyte values is used to reduce the errors. Second, measured spectra from specimens of known composition, rather than pure constituent spectra, are used as components. These techniques reduce errors to levels comparable to those obtained by multilinear regression on unmodified or derivative-type data at selected wavelengths. Yet such methods assume prior knowledge of the reference spectra, and they are not applicable to situations in which the interfering spectra have variable characteristics.
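The multilinear correlation step mentioned above amounts to regressing the known analyte values of calibration specimens on their derived curve-fit coefficients. A minimal sketch, in which the number of specimens, the coefficients, and the linear relation are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical: curve-fit coefficients derived for 10 calibration specimens
# (one row per specimen, 3 coefficients each) and the corresponding known
# analyte values, related linearly with an offset.
coeffs = rng.normal(size=(10, 3))
true_weights = np.array([0.8, -0.2, 1.5])
analyte = coeffs @ true_weights + 0.3

# Multilinear regression: fit the analyte values as a linear function of
# the coefficients, with an intercept column appended to the design matrix.
design = np.column_stack([coeffs, np.ones(10)])
weights, *_ = np.linalg.lstsq(design, analyte, rcond=None)
print(weights)  # recovers [0.8, -0.2, 1.5] and the offset 0.3
```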
Other methods have included techniques such as spectral subtraction, in which interferences are reduced by subtracting previously known or estimated reference spectra based on prior information about such spectra. For instance, where optical absorption spectra are used, it is known that such spectra can never be negative. Accordingly, once an absorption difference spectrum is estimated to be approximately zero at one or more data points, it is no longer desirable to subtract a greater interference magnitude from this spectrum. At that point, the combined interference spectrum is set, and the estimate of the analyte spectrum is obtained.
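One way to realize the non-negativity constraint described above is to limit the subtracted interference magnitude by the smallest ratio of the measured spectrum to the reference spectrum, so that the difference spectrum just reaches zero at its most constrained data point. A sketch with hypothetical spectra:

```python
import numpy as np

# Hypothetical measured absorption spectrum and interference reference spectrum.
measured = np.array([0.50, 0.80, 0.30, 0.60])
reference = np.array([0.25, 0.20, 0.30, 0.10])

# Subtract as much of the reference as possible without driving any data
# point negative: the limiting scale factor is min(measured / reference).
scale = np.min(measured / reference)
analyte_estimate = measured - scale * reference
print(scale, analyte_estimate)  # the difference spectrum touches zero
```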
Other higher level techniques have included such methods as latent variable analysis or bilinear modeling. In these methods, underlying sets of data values, or latent spectra, are extracted statistically from a data set. These methods include, among others, factor analysis, principal component analysis (PCA) and partial least squares (PLS) methods. In these systems, a priori knowledge of the previously derived latent reference spectra is used as the spectral components throughout later analyses. In other words, once a latent reference spectral estimate is made, these higher level techniques inflexibly fix the latent reference spectra.
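The statistical extraction of latent spectra can be illustrated with principal component analysis via the singular value decomposition. In this hypothetical sketch a data set of 20 spectra is built from two assumed latent spectra plus noise, and PCA recovers a two-component description:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data set: 20 spectra of 50 points each, generated as
# mixtures of 2 underlying latent spectra plus small random noise.
latent = rng.normal(size=(2, 50))
scores = rng.normal(size=(20, 2))
data = scores @ latent + rng.normal(scale=0.01, size=(20, 50))

# PCA via the singular value decomposition of the mean-centered data:
# the rows of vt are the statistically extracted "latent spectra".
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# The first two components account for nearly all of the variance,
# reflecting the two latent spectra used to build the data set.
explained = (s ** 2) / np.sum(s ** 2)
print(explained[:3])
```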
All these previous techniques become inaccurate if the measuring instrument or conditions change significantly. In addition, these previous techniques are incapable of differentiating the analyte spectral component from interference spectral components that mimic it. For instance, where interference spectral components of the data signal bear a correlation to the analyte spectral component, many of the previous methods may confuse the interference spectral component with the analyte spectrum, producing serious errors in the approximation. Various methods have been proposed to correct these errors, but only after they have occurred.