The present invention relates to chromatography and, more particularly, to a system, program and method for estimating component spectra of a chromatogram with overlapping component peaks.
Analytic chemistry has provided scientists the ability to break down chemical systems into their constituents, the properties of which can then be investigated individually. Chromatography contributes to this ability by permitting scientists to determine the identity and relative concentrations of compounds in a mixture. The mixture itself can be the result of the breakdown of a highly complex molecular structure, such as a protein, so that chromatography can be used as a sub-procedure in the study of complex molecules.
Chromatography involves the flow of a mobile phase over a stationary phase. Each component of a mixture is distributed between these two phases according to a characteristic ratio. As the mobile phase moves past the stationary phase, repeated adsorption and desorption of a component occurs at a rate determined chiefly by its ratio of distribution between the two phases. To the extent that their distribution ratios are different, the components of the mixture move at different rates.
Where the distribution ratios are sufficiently different, the components of the mixture can be resolved into a series of bands. Spectral distributions can then be determined for the individual bands. A spectral distribution can be one of several types, generally corresponding to the specific chromatographic technique applied. For example, in a liquid chromatography system, in which a mobile liquid phase is passed through a stationary solid or liquid phase, a diode array detector can be used to determine the visible light or ultra-violet absorption spectra of the eluting components. Alternatively, gas chromatography systems, in which a mobile gas phase passes a stationary solid or liquid phase, can use Fourier transform infrared spectroscopy or mass spectroscopy to obtain chromatograms.
Where the mixture components are sufficiently resolved, the spectra measured as a chromatographic peak is eluting are those characteristic of a single component. However, with complex mixtures, there is some overlap of pure component spectra.
The spectra constituting an overlap can sometimes be deconvolved, i.e., mathematically estimated. Where deconvolution is possible, it is often more effective and efficient than successive chromatographic runs which might also be used, in effect, to resolve overlapping peaks.
The method of mathematical deconvolution derives from the work of Lawton and Sylvestre, Technometrics, Vol. 13, pp. 617-633 (1971). They showed that the spectra of mixtures of two compounds, when the spectra were normalized so that the sum of the elements in each spectrum is unity, can be represented by points in an abstract two-dimensional space that lie in a straight line. The type of normalization employed, herein referred to as "area normalization", has the effect of normalizing the spectra to unit area.
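The collinearity property described above can be checked numerically. The following is a minimal sketch, not the procedure of Lawton and Sylvestre: the two pure spectra, the concentrations, and the helper names `area_normalize` and `line_fraction` are all hypothetical, and the check is done directly in spectral space rather than in a principal component space.

```python
def area_normalize(spectrum):
    """Scale a spectrum so that its elements sum to one."""
    total = sum(spectrum)
    if total == 0:
        raise ValueError("area normalization is undefined for a zero sum")
    return [x / total for x in spectrum]


def line_fraction(point, start, end):
    """Fractional position of `point` along the segment from `start` to `end`."""
    direction = [e - s for s, e in zip(start, end)]
    offset = [p - s for s, p in zip(start, point)]
    return sum(o * d for o, d in zip(offset, direction)) / sum(d * d for d in direction)


# Hypothetical pure spectra of two compounds at four wavelengths.
pure_a = [4.0, 2.0, 1.0, 0.0]
pure_b = [0.0, 1.0, 3.0, 2.0]
norm_a = area_normalize(pure_a)
norm_b = area_normalize(pure_b)

# A mixture with arbitrary (hypothetical) concentrations of A and B.
conc_a, conc_b = 0.3, 1.2
mixture = area_normalize([conc_a * a + conc_b * b for a, b in zip(pure_a, pure_b)])

# The normalized mixture sits on the line joining the normalized pure spectra.
f = line_fraction(mixture, norm_a, norm_b)
on_line = [(1 - f) * a + f * b for a, b in zip(norm_a, norm_b)]
max_deviation = max(abs(m - p) for m, p in zip(mixture, on_line))
```

Here `f` is the fraction of the distance from the normalized spectrum of A to that of B, and `max_deviation` is zero to within rounding error for any non-negative concentrations.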
The preferred way to determine the coordinate vectors of this space was by principal component analysis of the spectra of the mixtures. Lawton and Sylvestre pointed out that estimates of the spectra of the pure compounds in the mixtures could be found from this line in the following way: if each real spectrum corresponded to a point on the line, and each point on the line corresponded to a spectrum, one could extend the line in each direction until points were reached where one or more elements of the corresponding spectra were just less than zero. The points corresponding to the spectra of the pure compounds would lie somewhere between these end points and the nearest points corresponding to the measured spectrum of a mixture.
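The line extension just described can be sketched as follows. This is an illustration under stated assumptions, not the published procedure: the two observed mixture spectra are hypothetical, already area normalized, and noise free, and the helper names `nonnegativity_limits` and `point_on_line` are invented for this example. The line is parametrized as p + t(q - p), and the code finds the range of t over which no spectral element is negative.

```python
def nonnegativity_limits(p, q):
    """Range (t_min, t_max) over which p + t*(q - p) has no negative element."""
    t_min, t_max = float("-inf"), float("inf")
    for p_i, q_i in zip(p, q):
        slope = q_i - p_i
        if slope > 0:
            # This element grows with t, so it limits how far t can decrease.
            t_min = max(t_min, -p_i / slope)
        elif slope < 0:
            # This element shrinks with t, so it limits how far t can increase.
            t_max = min(t_max, -p_i / slope)
    return t_min, t_max


def point_on_line(p, q, t):
    """Spectrum corresponding to parameter t on the line through p and q."""
    return [p_i + t * (q_i - p_i) for p_i, q_i in zip(p, q)]


# Hypothetical area-normalized pure spectra (unknown in a real analysis).
norm_a = [4 / 7, 2 / 7, 1 / 7, 0.0]
norm_b = [0.0, 1 / 6, 3 / 6, 2 / 6]

# Two observed mixtures lying on the line between them.
observed_1 = [0.75 * a + 0.25 * b for a, b in zip(norm_a, norm_b)]
observed_2 = [0.25 * a + 0.75 * b for a, b in zip(norm_a, norm_b)]

t_min, t_max = nonnegativity_limits(observed_1, observed_2)
bound_a = point_on_line(observed_1, observed_2, t_min)
bound_b = point_on_line(observed_1, observed_2, t_max)
```

In this example each pure spectrum has one element equal to zero, so the end points coincide with the pure spectra. In general the pure spectra are only known to lie between these end points and the outermost observed mixtures.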
These methods were applied to chromatography to obtain the spectra of the components and to mathematically resolve unresolved chromatographic peaks where only two components were present. See: Donald Macnaughton, Jr., L. B. Rogers, and Grant Wernimont, Analytical Chemistry, Vol. 44, pp. 1421-1427 (1972); Muhammed Abdallah Sharif and Bruce R. Kowalski, Analytical Chemistry, Vol. 53, pp. 518-522 (1981); Muhammed Abdallah Sharif and Bruce R. Kowalski, Analytical Chemistry, Vol. 54, pp. 1291-1296 (1982); and David W. Oston and Bruce R. Kowalski, Analytical Chemistry, Vol. 56, pp. 991-995 (1984). Borgen and Kowalski extended the method to three overlapped peaks. See Odd S. Borgen and Bruce R. Kowalski, Analytica Chimica Acta, Vol. 174, pp. 1-26 (1985). In this case the points corresponding to the spectra of the eluting mixtures lie on a plane in the space defined by the three principal components of the spectra of the mixtures. Here, the emphasis is on setting bounds within which the points that represent the spectra of the pure components must lie. No attempt is made to use the results of computing the concentrations to improve the estimates of the spectra.
Similar methods have been developed in which the normalization of the spectra is such that the sum of the squares of the elements of each spectrum is unity. See: Jie-Hsung Chen and Lian-Pin Hwang, Analytica Chimica Acta, Vol. 133, pp. 271-281 (1981); and Bernard Vandeginste, Raymond Essers, Theo Bormon, Joost Reijnen, and Gerrit Kateman, Analytical Chemistry, Vol. 57, pp. 971-985 (1985). In methods using this form of normalization, referred to as "Euclidean normalization" herein, the points corresponding to spectra in the space of three principal components lie on the surface of a sphere. The polar and azimuthal angles of the points are then used as polar coordinates in a plane, to map the points from the spherical surface to a plane, with the polar angle used as the radius vector and the azimuthal angle as the vectorial angle.
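The mapping from the sphere to the plane can be sketched as follows. This is an illustration only: the function names are hypothetical, and the choice of the first coordinate as the polar axis is an assumption made here, not a detail taken from the cited references.

```python
import math


def euclidean_normalize(vector):
    """Scale a vector so that the sum of the squares of its elements is one."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def sphere_to_polar(scores):
    """Map three principal component scores to (radius, angle) in a plane.

    Assumption: the first coordinate is taken as the polar axis. The polar
    angle becomes the radius and the azimuthal angle becomes the plane angle.
    """
    x, y, z = euclidean_normalize(scores)
    polar_angle = math.acos(max(-1.0, min(1.0, x)))  # clamp against rounding
    azimuthal_angle = math.atan2(z, y)
    return polar_angle, azimuthal_angle


def polar_to_cartesian(radius, angle):
    """Cartesian plotting coordinates of a point given in polar form."""
    return radius * math.cos(angle), radius * math.sin(angle)


# Hypothetical score vectors for three eluting spectra.
points = [sphere_to_polar(s) for s in ([1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0])]
```

A point on the polar axis maps to the origin of the plane, and points at right angles to it map to a circle of radius pi/2.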
In this polar representation, it is not possible to extrapolate linearly the loci of points representing mixtures of two compounds as is done in the method developed by Lawton et al. (op. cit.). Instead, various constraints are used to define the locations of the points on the plane that correspond to the best estimates of the spectra of the pure compounds partially separated chromatographically. One of the constraints used is that no spectral element be negative, and at least one be zero.
Vandeginste et al. (op. cit.) used the spectral estimates to compute elution profiles for the individual compounds. They were able to improve the spectral estimates by adjusting the spectra so that the amplitudes of any two components are zero at the peak of the elution profile of the third, where the peaks are found using the first estimates. Of course, the assumption that two components vanish at the peak of the third may not be appropriate.
One can also regard the data array from the chromatographic detector not as a series of spectra, but as an array of elution profiles, each one measuring the response vs. time of the signal at a specific wavelength for LC, or a specific mass number for MS. The principal components of the elution profiles can then be found.
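The two views of the detector data differ only in which axis of the array is traversed. A minimal sketch, with a hypothetical four-time-point, three-channel array:

```python
# Hypothetical detector output: one row per time point, one column per
# wavelength (LC) or mass number (MS).
data = [
    [0.1, 0.2, 0.0],
    [0.5, 0.9, 0.1],
    [0.3, 0.6, 0.4],
    [0.0, 0.1, 0.2],
]

# View 1: a series of spectra, one per time point (the rows).
spectra = [list(row) for row in data]

# View 2: an array of elution profiles, one per channel (the columns).
elution_profiles = [list(column) for column in zip(*data)]
```

Principal component analysis can be applied to either view. The first yields principal components of the spectra, the second principal components of the elution profiles.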
Vandeginste et al. (op. cit.) used this expansion in a way completely analogous to the method using principal components of the spectra. The application of their constraints to obtain estimates of the elution profiles of the compounds works better in some cases than in the analogous case of estimating their spectra. Spectral estimates can then be obtained from the data array of spectra vs. time, using the estimated elution profiles.
A method called iterative target transform factor analysis has been developed, in which an estimate of the elution profile of each compound is expanded in terms of the elution profile principal components. See: Bernard G. M. Vandeginste, Wilbert Derks, and Gerrit Kateman, Analytica Chimica Acta, Vol. 173, pp. 253-264 (1985); and Paul J. Gemperline, J. Chem. Inf. Comput. Sci., Vol. 24, pp. 206-212 (1984). The elution profile so expanded may show negative amplitudes or secondary maxima. A new estimate of the elution profile is then made by modifying the expansion to eliminate the presumably erroneous features, and the new estimate is expanded as before. When the estimate and its expansion are essentially identical, the iteration is terminated, and spectra corresponding to the elution profiles are computed by multicomponent analysis.
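The expand-and-modify loop can be sketched as follows. This is a simplified illustration and not the published algorithm. It makes three assumptions: an orthonormal basis obtained by Gram-Schmidt from noise-free elution profiles stands in for the principal components, only negative amplitudes are removed (secondary maxima are left alone), and the starting estimate is a single spike at a hypothetical retention time. All data and function names are invented for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt basis; a stand-in here for the principal components."""
    basis = []
    for v in vectors:
        w = list(v)
        for u in basis:
            c = dot(w, u)
            w = [w_i - c * u_i for w_i, u_i in zip(w, u)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already spanned by the basis
            basis.append([w_i / norm for w_i in w])
    return basis


def expand(estimate, basis):
    """Expansion of an estimate in terms of the basis vectors."""
    out = [0.0] * len(estimate)
    for u in basis:
        c = dot(estimate, u)
        out = [o + c * u_i for o, u_i in zip(out, u)]
    return out


def refine_profile(start, basis, tol=1e-9, max_iter=500):
    """Alternate between expanding the estimate and removing negative values."""
    estimate = list(start)
    for _ in range(max_iter):
        expansion = expand(estimate, basis)
        if max(abs(e - x) for e, x in zip(estimate, expansion)) < tol:
            break  # estimate and expansion are essentially identical
        estimate = [max(0.0, x) for x in expansion]
    return estimate


# Hypothetical two-component data: data[t][w] = sum over components.
profile_1 = [0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0]
profile_2 = [0.0, 0.0, 0.1, 0.5, 1.0, 0.5, 0.1]
spectrum_1 = [1.0, 0.5, 0.1]
spectrum_2 = [0.2, 0.6, 1.0]
data = [[c1 * s1 + c2 * s2 for s1, s2 in zip(spectrum_1, spectrum_2)]
        for c1, c2 in zip(profile_1, profile_2)]

basis = orthonormal_basis([list(col) for col in zip(*data)])
spike = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # assumed retention time of one compound
estimate = refine_profile(spike, basis)
```

The returned estimate is non-negative by construction. Whether it resembles the true profile of a single compound depends on the data and on the starting estimate, which this sketch does not attempt to guarantee.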
Harris and coworkers have developed yet another method for attacking the problem of mathematically deconvolving overlapped chromatographic bands as disclosed in U.S. Pat. No. 4,353,242. The peak shapes of the chromatographic bands are assumed to be known, and parameters of the bands such as mean position and peak width are computed from the array of data by a least squares fitting procedure. The spectra can then be computed from the data array and the elution profiles.
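The final step, computing spectra from the data array once the elution profiles are known, is a linear least-squares problem. The following is a minimal sketch for two components, not the procedure of the cited patent: the function name and test data are hypothetical, and the fitting of peak position and width that precedes this step is not shown.

```python
def spectra_from_profiles(data, profile_1, profile_2):
    """Least-squares spectra of two components given their elution profiles.

    data[t][w] is the detector signal at time index t and channel w.
    Solves the 2x2 normal equations separately for each channel.
    """
    a11 = sum(c * c for c in profile_1)
    a22 = sum(c * c for c in profile_2)
    a12 = sum(c1 * c2 for c1, c2 in zip(profile_1, profile_2))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("elution profiles are linearly dependent")
    spectrum_1, spectrum_2 = [], []
    for w in range(len(data[0])):
        b1 = sum(c * row[w] for c, row in zip(profile_1, data))
        b2 = sum(c * row[w] for c, row in zip(profile_2, data))
        spectrum_1.append((a22 * b1 - a12 * b2) / det)
        spectrum_2.append((a11 * b2 - a12 * b1) / det)
    return spectrum_1, spectrum_2


# Hypothetical noise-free test data built from known profiles and spectra.
profile_1 = [0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0]
profile_2 = [0.0, 0.0, 0.1, 0.5, 1.0, 0.5, 0.1]
true_1 = [1.0, 0.5, 0.1]
true_2 = [0.2, 0.6, 1.0]
data = [[c1 * s1 + c2 * s2 for s1, s2 in zip(true_1, true_2)]
        for c1, c2 in zip(profile_1, profile_2)]

estimated_1, estimated_2 = spectra_from_profiles(data, profile_1, profile_2)
```

With noise-free data and exact profiles the spectra are recovered to within rounding error. With real data the quality of the result depends on how well the assumed peak shapes match the true elution profiles.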
The foregoing and other references can be roughly summarized as follows. Deconvolution can be applied straightforwardly under either of the following conditions: (1) the spectral distributions of the components are known, for example, where only the relative concentrations of the components are unknown; or (2) at most two peaks overlap. More complex overlapping can be handled in a relatively straightforward manner by imposing certain assumptions on more complex spectral distributions. For example, one can assume that the unknown spectral distributions have a predetermined shape. Such an assumption can facilitate component estimation when correct. On the other hand, the imposition of strong assumptions can decrease the likelihood that the estimates obtained are valid, thus limiting confidence in the results of such methods.
Methods powerful enough to deconvolve up to three overlapping component peaks while imposing at most weak assumptions on peak shape can be roughly categorized according to the type of normalization applied, i.e., there are area normalization methods and Euclidean normalization methods.
Area normalization methods permit a representation of chromatographic data in a plane in which binary mixtures of varying relative concentrations lie along a line segment whose endpoints represent the spectra of the pure compounds in the mixture. This permits straightforward linear extrapolation of pure component spectra as follows. Given a chromatographic elution sequence of compound A alone, A mixed with compound B, B mixed with A and compound C, B mixed with C, and C alone, the pure component spectrum for B corresponds to the intersection of the straight line segments defined during the binary mixture elutions of A with B and of B with C. This reliable method for extrapolating pure component spectra can serve as the basis for determining the concentration profiles for each eluting compound represented in a chromatogram.
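The intersection step can be sketched as follows. This is an illustration with hypothetical plane coordinates, not data from any cited reference: two points observed during the elution of A with B define one line, two points observed during the elution of B with C define another, and their intersection gives the estimated location of pure B.

```python
def line_intersection(p1, p2, q1, q2):
    """Intersection of the line through p1, p2 with the line through q1, q2."""
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = q1, q2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel or coincident")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / denom
    y = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return x, y


# Hypothetical coordinates in the plane of the area-normalized spectra.
# Two points measured while A and B co-elute:
ab_1, ab_2 = (0.5, 0.25), (1.0, 0.5)
# Two points measured while B and C co-elute:
bc_1, bc_2 = (2.4, 0.2), (2.8, -0.6)

pure_b_estimate = line_intersection(ab_1, ab_2, bc_1, bc_2)
```

In this constructed example the two lines meet at (2, 1). With noisy data each line would first be fitted to many points, for instance by least squares, before the intersection is taken.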
There are two fundamental disadvantages to area normalization. The first is that it amplifies low signal regions of a chromatogram relative to large signal regions, thus amplifying noise relative to signal. The second is that area normalization permits division by zero and near-zero amounts when both positive and negative spectral components are involved, skewing any possible interpretation of results with large values of little validity.
While the second problem is not likely to be significant using raw chromatographic data, in which case all spectral data can be assumed positive, there are many situations in which negative values can be expected to occur. It is often advantageous to modify the spectra so that the sum of elements is zero, or very small. For example, one might correct for a constant offset of unknown amplitude of every element of a spectrum by requiring the average of the elements to be zero. Other commonly used modifications of spectra involve the use of the first or higher derivatives of the spectra instead of the spectra themselves. Since the sum of the elements of these modified spectra may be very small, these modifications are inconsistent with area normalization, which forces the sum to be unity.
In Euclidean methods, it is the sum of the squares of the elements of each spectrum, rather than the sum of the elements, that is set to unity. In comparison to area normalization, strong signal areas are emphasized over the relatively noisy weak signal areas. The terms of a sum of squares do not offset each other, so there is no special difficulty dealing with data which assume both positive and negative values. Hence, Euclidean normalization is compatible with methods applying baseline correction or using derivatives of the spectra.
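The contrast between the two normalizations on a derivative spectrum can be illustrated numerically. The spectrum below is hypothetical. Its first difference has elements that sum to the difference between the last and first points of the spectrum, which is close to zero for a band that returns to baseline.

```python
import math

# Hypothetical absorption band that returns almost to baseline.
spectrum = [0.0, 1.0, 4.0, 6.0, 3.0, 1.0, 0.001]

# First-difference (derivative) spectrum; positive and negative terms cancel.
derivative = [b - a for a, b in zip(spectrum, spectrum[1:])]

area_scale = sum(derivative)                               # near zero
euclid_scale = math.sqrt(sum(d * d for d in derivative))   # well away from zero

area_normalized = [d / area_scale for d in derivative]
euclid_normalized = [d / euclid_scale for d in derivative]

largest_area_value = max(abs(x) for x in area_normalized)
largest_euclid_value = max(abs(x) for x in euclid_normalized)
```

Area normalization divides by a sum of about 0.001 and produces elements in the thousands, while no element of the Euclidean-normalized derivative exceeds one in magnitude.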
However, the polar coordinate reference frame generated using Euclidean normalization does not permit linear extrapolation of pure component spectra. Accordingly, the foregoing methods in the Euclidean category have had one or more of the following disadvantages: (1) a requirement for assumptions as to peak shape; (2) a difficulty in determining a confidence level or error bound for the result obtained; (3) inaccuracy of the results; and (4) severe computational requirements.
In addition, some Euclidean approaches impose assumptions of non-negativity on the chromatographic spectra. Such approaches are not compatible with certain baseline correction techniques that may be applied to the chromatogram before analysis, such as those yielding negative absorption cusps.
As indicated, available methods for deconvolving chromatograms in which up to three unknown spectral components overlap are limited in several ways, including accuracy and computational efficiency. Also, in most cases, no measure is provided for the errors in the estimates obtained.
Accordingly, it is an objective of the present invention to provide an improved system, program and method for deconvolving chromatograms with up to three unknown overlapping spectral components. The improvement subsists partly in an improved combination of computational efficiency and accuracy. In addition, an error bound on the estimates is provided.