Gas and liquid chromatography are widely used analytic techniques for separation and quantitation of mixtures of chemical compounds. In chromatographic analyses, a small sample of a mixture is introduced initially at the top of a column coated or packed with an adsorbent. "Top" as used herein is a relative term and means the end or region of a chromatographic column where the sample is initially introduced to the column. The adsorbent reversibly adsorbs components of the mixture. Thus, initially the sample is bound to the adsorbent at the top of the column. A carrier gas or liquid, referred to as an eluent, is passed through the column. As the eluent passes through the column, the components of the sample are displaced from the adsorbent by the eluent and then the components are adsorbed at another point on the column.
The various components of the mixture migrate through the column at different rates. The rate of migration of each component depends upon the affinity of the adsorbent for the component and the ability of the eluent to displace the component from the adsorbent as well as other factors known to those skilled in the art. Accordingly, different components of the sample migrate down the column at different speeds. Thus, as the carrier gas or liquid emerges from the column, the components in the mixture are swept out of the column with the carrier gas or liquid at various time intervals, i.e. retention times, after the introduction of the sample at the top of the column.
To measure the retention times of the various components in the sample, a detector is placed at about the exit of the column so that the eluate emerging from the column passes through the detector. Typically in liquid chromatography, the detector employs ultraviolet absorbance, refractive index or fluorescence as the measurement means. In gas chromatography, flame ionization or thermal conductivity is frequently employed as the detection principle.
Independent of the detector used for the chromatographic measurement, the detector generates an electrical signal which changes as a function of time in response to the concentration of the components of the sample passing through the detector. FIG. 1A illustrates the features of a typical chromatogram. The vertical line at the left hand side of FIG. 1A represents the introduction of the sample at the top of the chromatographic column and the initiation of eluent flow through the column. A first resolved peak 10 reaches a maximum at a time T1 while a second resolved peak 20 reaches a maximum at a time T2. Time T1 is the retention time for peak 10 while time T2 is the retention time for peak 20. Peaks 30, 40 are fused peaks.
A baseline signal level, which is usually defined as the signal level associated with only eluent flow through a chromatographic column, is represented by the signal level 50.sub.1 prior to peak 10, the signal level 50.sub.2 between peak 10 and peak 20, and so forth. The signal level for peak 10 consists of a signal level associated with a component of the sample passing through the detector and a baseline signal level associated with eluent flow through the detector. As described more completely below, a central problem in interpretation of a chromatogram is determining the contribution of the baseline signal level to the measured signal level for a peak. As shown in FIG. 1A, the baseline signal level is relatively constant, but in many chromatograms, the baseline signal level is not constant and in fact changes with time. The term "baseline" is often used to refer to the baseline signal level of a chromatogram.
Analysis of peaks 10, 20, 30, 40 (FIG. 1A) provides information about the identity and quantity of specific components of the sample, as well as performance indicators on the operation of the chromatographic column. Information typically generated for each peak, as described more completely below, includes peak characteristics such as retention time, peak area, peak height, peak width and skewness.
To ascertain the characteristics of resolved chromatographic peaks 10, 20 as illustrated in FIG. 1A, the detector signal is analyzed by measuring attributes of the signal such as the level, slope, or curvature of the detector signal as a function of time. The level of the detector signal means the actual signal from the detector. The slope of the detector signal means the first derivative of the detector signal with respect to time, and the curvature means the second derivative of the detector signal with respect to time.
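By way of illustration only, the slope and curvature of a sampled detector signal may be estimated by central finite differences. The following is a minimal sketch under assumptions not stated above: the signal is sampled at a uniform interval `dt`, and the function name `slope_and_curvature` is hypothetical.

```python
def slope_and_curvature(signal, dt):
    """Estimate slope (first derivative) and curvature (second derivative).

    Assumes uniform sampling at interval dt. End points have no central
    difference and are returned as None.
    """
    n = len(signal)
    slope = [None] * n
    curvature = [None] * n
    for i in range(1, n - 1):
        slope[i] = (signal[i + 1] - signal[i - 1]) / (2.0 * dt)
        curvature[i] = (signal[i + 1] - 2.0 * signal[i] + signal[i - 1]) / (dt * dt)
    return slope, curvature


# Illustrative data: y = t^2 sampled every 0.5 time units.
dt = 0.5
signal = [(i * dt) ** 2 for i in range(6)]
slope, curvature = slope_and_curvature(signal, dt)
```

For real detector data, some smoothing would ordinarily precede differentiation because finite differences amplify noise; that step is omitted here.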
To characterize a resolved peak 10, 20 (FIG. 1A), when one or more of these parameters (the level, slope or curvature of the detection signal) exceeds a threshold level, a first baseline reference point, as described below, is usually established at a predetermined time prior to the time when the threshold level was exceeded. The threshold level is typically determined by examination of a chromatogram. (The threshold level is generally selected as the level corresponding to the baseline signal level, e.g., signal level 50.sub.1 (FIG. 1A).) After establishment of the first baseline reference point, the detector signal is continuously integrated until the parameter being monitored falls below the threshold level for a selected period of time. When the parameter falls below the threshold level for the selected period of time, a second baseline reference point is established. A straight line is generally interpolated between the first baseline reference point and the second baseline reference point.
FIG. 1B illustrates a common application of this method. In FIG. 1B, a single resolved peak 10 is shown with baseline signal levels 50.sub.1, 50.sub.2. A first threshold level is represented by dashed line 50.sub.5. At point a, which occurs at time t.sub.a, the detector signal level exceeds threshold level 50.sub.5. A first baseline reference point b is generally established at time t.sub.b, which is a predetermined time prior to time t.sub.a. The detector signal level is continuously integrated until time t.sub.e. Time t.sub.e is a selected period after the detector signal level falls below a second threshold level 50.sub.6 at point c. A second baseline reference point e is defined at time t.sub.e. In this example, the chromatogram of FIG. 1A was analyzed to establish the two different threshold levels used in the baseline correction.
Thus, the integrated signal, i.e., the total area under curve 10 between times t.sub.b and t.sub.e, includes the area corresponding to the baseline signal. To obtain only the peak area, the area corresponding to the baseline signal must be subtracted from the total area. The problem is to accurately estimate the area corresponding to the baseline signal. In one case, the peak area is found by subtracting the area of the trapezoid [the shaded area 10.sub.1 in FIG. 1B] defined by (t.sub.b, 0), (t.sub.b, l.sub.b), (t.sub.e, l.sub.e), (t.sub.e, 0) where l.sub.b and l.sub.e are the detection signals at time t.sub.b and t.sub.e respectively. Here the terms within parentheses are x, y coordinates with the x coordinate being time and the y coordinate being detector output signal. Peak 10 after subtraction of area 10.sub.1 is illustrated in FIG. 2 as peak 10'. The peak characteristics, i.e., peak height, width, skewness and the time of maximal/minimal signal, i.e., the retention time, are determined using the baseline corrected data of FIG. 2 as described below.
Thus, peak integration is generally done by (i) detecting a point of departure from baseline, i.e., detecting the time at which one or more of the detector signal level, the slope or the curvature exceeds a threshold level; (ii) establishing a first baseline reference point at some predefined time prior to the point of departure from baseline; and (iii) continuously integrating the chromatographic signal from the first baseline reference point until the signal again drops below the threshold level for a predetermined time, i.e., until the second baseline reference point.
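The three steps above, together with the trapezoid baseline subtraction, may be sketched as follows. This is an illustrative sketch only, not the procedure of any particular integrator. It assumes that the monitored parameter is the signal level, that a single threshold is used, and that the "predetermined times" are expressed as sample counts; the names `integrate_peak`, `lead_points` and `hold_points` are hypothetical.

```python
def integrate_peak(times, signal, threshold, lead_points, hold_points):
    """Return (peak_area, start_index, end_index) for the first peak found."""
    # (i) Point of departure: first sample whose level exceeds the threshold.
    depart = next((i for i, y in enumerate(signal) if y > threshold), None)
    if depart is None:
        raise ValueError("signal never exceeds the threshold")

    # (ii) First baseline reference point, a fixed number of samples earlier.
    start = max(depart - lead_points, 0)

    # (iii) Second baseline reference point: the signal has stayed at or
    # below the threshold for hold_points consecutive samples.
    below = 0
    end = len(signal) - 1
    for i in range(depart, len(signal)):
        below = below + 1 if signal[i] <= threshold else 0
        if below >= hold_points:
            end = i
            break

    # Total area under the curve between the reference points (trapezoidal rule).
    total = sum(
        0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
        for i in range(start, end)
    )
    # Area of the trapezoid under the straight line joining the reference points.
    baseline = 0.5 * (signal[start] + signal[end]) * (times[end] - times[start])
    return total - baseline, start, end


# Illustrative data: one peak on a constant baseline of 1.0.
times = [float(i) for i in range(11)]
signal = [1.0, 1.0, 1.0, 2.0, 4.0, 6.0, 4.0, 2.0, 1.0, 1.0, 1.0]
area, start, end = integrate_peak(times, signal, threshold=1.5,
                                  lead_points=2, hold_points=2)
```

Changing `threshold`, `lead_points` or `hold_points` moves the two reference points and therefore changes the subtracted trapezoid, which is the sensitivity to control settings discussed in the text.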
Conventional chromatographic peak quantitation apparatus require that parameters be set which control analysis of the detector signal, e.g., the baseline-threshold level, and sometimes small changes in the control settings or detector input signals can cause very large changes in the peak quantitation determination. Thus, present methods for analysis of chromatographic data are unstable.
An alternative to the above method for correcting the measured data for the baseline signal level is to obtain a blank chromatogram (a chromatographic run with no sample injection). The blank chromatogram is subtracted from the chromatogram of interest before peak integration so as to obtain a baseline corrected chromatogram. In either approach, the trapezoid subtraction or the blank chromatogram subtraction, the determination of peak parameters is dependent upon the accuracy of the baseline correction.
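As a simple illustration of the blank subtraction approach, the sketch below subtracts a blank run from a sample run point by point. It assumes that both runs were sampled at identical time points; the function name `subtract_blank` and the data are hypothetical.

```python
def subtract_blank(sample_run, blank_run):
    """Return the baseline corrected chromatogram (sample minus blank)."""
    if len(sample_run) != len(blank_run):
        raise ValueError("runs must have the same number of samples")
    return [s - b for s, b in zip(sample_run, blank_run)]


# Illustrative data: a drifting baseline with one peak at the third sample.
blank = [1.0, 1.5, 2.0, 2.5, 3.0]
sample = [1.0, 1.5, 6.0, 2.5, 3.0]
corrected = subtract_blank(sample, blank)
```

The correction is only as good as the reproducibility of the baseline between the blank run and the sample run.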
As previously described, peak 10' of FIG. 2 is a baseline corrected representation of peak 10 of FIGS. 1A and 1B. First and second baseline reference points b, e were determined and baseline corrected peak 10' was defined as the peak above the straight line between first and second baseline reference points b, e. The parameters characterizing peak 10 are determined using baseline corrected peak 10'.
The peak width is determined by measuring leading peak half width A (FIG. 2) and trailing peak half width B and adding the two half widths. A measure of peak skewness is a ratio of the two half widths. The vertical height h of the peak is represented by the vertical line from the peak maximum to the baseline, i.e., the x axis in FIG. 2, of peak 10. In one measurement of asymmetry, i.e., peak skewness, the asymmetry is determined by drawing a line horizontal to the x-axis at 10% of vertical height h. The distance A from vertical line h to the left edge of the peak is the leading peak half width and distance B from vertical line h to the right edge of the peak is the trailing peak half width. Other measures of asymmetry measure the peak half widths at different fractions of vertical height h.
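The half-width measurement described above may be sketched as follows for a single baseline corrected positive peak. The sketch assumes sampled data and locates the crossings of the chosen height fraction by linear interpolation between samples; the function name `half_widths` and the data are hypothetical.

```python
def half_widths(times, signal, fraction=0.10):
    """Return (A, B, height, t_max) for one baseline corrected positive peak.

    A is the leading half width and B the trailing half width, both measured
    at `fraction` of the peak height.
    """
    apex = max(range(len(signal)), key=lambda i: signal[i])
    height = signal[apex]
    level = fraction * height

    def crossing(i_inside, i_outside):
        # Linear interpolation of the time at which the signal equals `level`.
        y_in, y_out = signal[i_inside], signal[i_outside]
        t_in, t_out = times[i_inside], times[i_outside]
        return t_in + (y_in - level) * (t_out - t_in) / (y_in - y_out)

    i = apex
    while i > 0 and signal[i - 1] > level:
        i -= 1
    j = apex
    while j < len(signal) - 1 and signal[j + 1] > level:
        j += 1
    if i == 0 or j == len(signal) - 1:
        raise ValueError("peak does not fall below the measurement level")

    t_max = times[apex]
    return t_max - crossing(i, i - 1), crossing(j, j + 1) - t_max, height, t_max


# Illustrative data: a tailed peak.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
signal = [0.0, 4.0, 10.0, 6.0, 2.0, 0.0]
A, B, h, t_max = half_widths(times, signal)
width = A + B      # peak width at 10% of height
skewness = B / A   # one measure of asymmetry
```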
If a peak is gaussian in shape, distance A, the leading peak half width, equals distance B, the trailing peak half width. If a peak is gaussian, the column efficiency in terms of the number of theoretical plates N is easily determined. The general definition of column efficiency in units of theoretical plates is: N=T.sub.r.sup.2 /.sigma..sup.2 (1) where T.sub.r is the retention time for the peak and .sigma..sup.2 is the variance or the second central moment of the peak measured in time units. For a gaussian curve, .sigma..sup.2 =A.sup.2 /(2.pi.h.sup.2) (2) where A is the area of the gaussian peak and h is the peak height. Substituting Equation 2 into Equation 1 gives N=2.pi.(T.sub.r h/A).sup.2 (3) Therefore, column efficiency as measured by theoretical plates can be determined from the vertical peak height h, the retention time and the area of a gaussian curve.
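As a numerical check of the relationship just described, the sketch below computes the plate number two ways: from the variance directly, and from the height, retention time and area of a gaussian peak. It assumes the standard gaussian relation in which the area equals the height times the standard deviation times the square root of 2.pi.; the function names and numerical values are hypothetical.

```python
import math


def plates_from_variance(t_r, sigma):
    """Plate number from retention time and standard deviation (time units)."""
    return (t_r / sigma) ** 2


def plates_gaussian(t_r, height, area):
    """Plate number from retention time, height and area of a gaussian peak."""
    return 2.0 * math.pi * (t_r * height / area) ** 2


# Illustrative gaussian peak.
t_r, sigma, height = 10.0, 0.2, 5.0
area = height * sigma * math.sqrt(2.0 * math.pi)  # gaussian area (assumption)
n_direct = plates_from_variance(t_r, sigma)
n_from_area = plates_gaussian(t_r, height, area)
```

The two values agree only because the peak is gaussian; for a tailed peak the height-and-area form no longer reflects the true variance.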
Non-Gaussian peaks (nonsymmetric peaks) in which distance A (FIG. 2) is not equal to distance B are typically encountered in chromatographic measurements, as discussed more completely below. However, the use of an equation such as equation (3), which is based upon a symmetric peak, to determine theoretical plate numbers for nonsymmetric peaks can result in serious errors.
Several researchers have used an exponentially modified gaussian (EMG) model for quantitation of chromatographic peaks. The development, characterization and theoretical and experimental justification of the EMG model have been discussed in several different references. See, for example, R. E. Pauls and L. B. Rogers, Anal. Chem. 49, 628, 1977, or E. Grushka et al., Anal. Chem. 41, 889-892, 1969.
The exponentially modified gaussian function G(t) is a convolution of a gaussian function and an exponential decay function and is expressed as: ##EQU4## In Equation 4, A is an amplitude which corresponds to the peak height, t.sub.g is the time of maximum amplitude of the gaussian function, .sigma. is a standard deviation of the gaussian function and .tau. is a time constant of the exponential decay function. The ratio .tau./.sigma. is a measure of peak asymmetry. As .tau./.sigma. increases, the chromatographic peak becomes more tailed. Conversely, as .tau./.sigma. approaches zero, the chromatographic peak approaches a gaussian peak. Thus, an EMG function can be used to describe both gaussian peaks and non-gaussian peaks. FIG. 3A illustrates an EMG peak. FIG. 3B illustrates the slope, i.e., the first derivative with respect to time, of the EMG peak and FIG. 3C illustrates the curvature, i.e., the second derivative with respect to time, of the EMG peak. The EMG peak in FIG. 3A is a positive resolved peak because the peak has a positive maximum. However, negative resolved peaks, i.e., peaks having a negative minimum, are also encountered in chromatograms.
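The behavior of an EMG peak as .tau./.sigma. varies may be illustrated by evaluating the convolution numerically. The sketch below is an assumption-laden illustration and is not a transcription of Equation 4: it convolves a gaussian of unit height with an exponential decay normalized to unit area, so the scaling by `amplitude` may differ from the normalization used in Equation 4. The function name `emg` and its parameters are hypothetical.

```python
import math


def emg(t, amplitude, t_g, sigma, tau, steps=1000):
    """Gaussian convolved with a unit-area exponential decay (trapezoidal rule).

    The decay variable u is integrated from 0 to 10*tau, beyond which the
    exponential term is negligible.
    """
    u_max = 10.0 * tau
    du = u_max / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * du
        weight = 0.5 if k in (0, steps) else 1.0
        gauss = math.exp(-((t - u - t_g) ** 2) / (2.0 * sigma ** 2))
        decay = math.exp(-u / tau) / tau
        total += weight * gauss * decay * du
    return amplitude * total


# A tailed peak (tau/sigma = 2): the maximum occurs later than t_g.
grid = [0.1 * i for i in range(201)]
tailed = [emg(t, 1.0, 5.0, 1.0, 2.0) for t in grid]
t_peak = grid[max(range(len(grid)), key=lambda i: tailed[i])]

# A nearly gaussian peak (tau/sigma = 0.01): the value at t_g is close to 1.
near_gaussian_apex = emg(5.0, 1.0, 5.0, 1.0, 0.01)
```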
Chromatographic peaks are often fused as shown in FIG. 1A. The analysis of fused peaks is much less straightforward and the results are dependent upon the means defined to separate the peaks as well as the baseline correction. In one method, (see, for example, Spectra Physics, "SP4270 Computing Integrator Operator's Manual, Section Seven--Principles of Integration," 1982, Spectra Physics Part No. A 0099-110), the areas are allocated by dropping a perpendicular line from the valley separating peaks to the interpolated baseline. In an alternative approach, the peaks are "skimmed" by taking baseline references at one or more valleys. The vertical drop method and skimming method are very inaccurate in their allocation of area between fused peaks. Errors in excess of 30% typically occur in area allocations for fused peaks using either of these methods. In these methods, the slope and/or the curvature of the detector signal have been used to identify the peak maximums and the valley between the maximum of two fused peaks.
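The perpendicular drop allocation described above may be sketched as follows. The sketch assumes that the fused pair has already been baseline corrected, so that the interpolated baseline is the zero level, and that the sample indices of the two peak maxima are supplied by the caller; the function name `perpendicular_drop` and the data are hypothetical.

```python
def perpendicular_drop(times, signal, apex1, apex2):
    """Split the area of two fused peaks at the valley between their maxima.

    Returns (area_first, area_second, valley_index).
    """
    # The valley is the lowest sample between the two peak maxima.
    valley = min(range(apex1, apex2 + 1), key=lambda i: signal[i])

    def area(lo, hi):
        # Trapezoidal rule over samples lo..hi.
        return sum(
            0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
            for i in range(lo, hi)
        )

    return area(0, valley), area(valley, len(signal) - 1), valley


# Illustrative data: two fused peaks with maxima at samples 2 and 6.
times = [float(i) for i in range(9)]
signal = [0.0, 2.0, 4.0, 2.0, 1.0, 3.0, 6.0, 3.0, 0.0]
first, second, valley = perpendicular_drop(times, signal, apex1=2, apex2=6)
```

Because the tail of the first peak lying beyond the valley is credited to the second peak, and vice versa, the allocation is biased whenever the peaks are asymmetric or of unequal size.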
Foley in "Systematic Errors in the Measurement of Peak Areas and Peak Height for Overlapping Peaks," J. of Chromatography, 384, 301-313 (1987) suggested methods of estimating EMG line shape parameters for fused peaks using retention time, front and rear half-widths, and amplitudes of baseline corrected peaks. The parameters used by these investigators, as illustrated in FIG. 4, are apparent peak heights h.sub.p,1 and h.sub.p,2, valley height, h.sub.v, and peak widths a,b. These parameters are defined after the peaks have been baseline corrected. Therefore, as described below, while the method of Foley is better than the vertical drop and skimming methods described above, the method is limited by the accuracy of the baseline determination.
Foley suggests that a logical approach for quantitation of fused peaks is to develop a quantitative method based upon measurements in regions of minimum distortion. He further suggests that for two fused peaks there is much less distortion for the first peak than the second, as evidenced by the generally insignificant errors in peak height for the first peak. Foley derived the following empirical equation for the area of a peak: A=1.64 h.sub.p W.sub.0.75 (b/a).sup.0.717 (5)
where A is the peak area, h.sub.p is the peak height and W.sub.0.75 and (b/a) are the peak width and asymmetry measured at 75% of the peak height respectively.
Foley defines the relative valley as the ratio, expressed as a percentage, of the valley height h.sub.v to the apparent height h.sub.p of the peak in question. The investigator reported that the bias of Equation (5) is less than .+-.1.1% for well-resolved tailed (EMG) peaks having .tau./.sigma. in the range of 0.-4.2 and for well resolved symmetric (gaussian) peaks. For overlapping EMG peaks, or for an EMG peak overlapped by a gaussian peak with area ratios between 1:4 and 4:1, empirical equation (5) is reported to be accurate to .+-.4% for the first peak provided that the relative valley between peaks is less than 45%. For a symmetric peak overlapped by an EMG peak, empirical equation (5) is accurate to within .+-.2% for the second peak if the relative valley is less than 50%. For overlapping peaks with ratios outside the 1:4 and 4:1 range, equation (5) is described as being somewhat more accurate but only for the larger peak of the overlapped pair. Hence, not only is this method limited by the accuracy of the baseline correction, but also the method is limited to EMG peaks having specified relationships.
Equation (5) is used to quantitate only one peak of an overlapped pair. But if an integrator is used to measure the total area of the two overlapping peaks, the area of the remaining peak is determined by subtraction of the calculated area from the total measured area.
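A direct evaluation of empirical equation (5), followed by the subtraction step just described, may be sketched as follows. The function names and numerical values are hypothetical; the measurements of height, width and asymmetry at 75% of peak height are assumed to have been made on baseline corrected data.

```python
def foley_area(h_p, w_075, b_over_a):
    """Peak area from empirical equation (5).

    h_p is the peak height; w_075 and b_over_a are the peak width and the
    asymmetry ratio measured at 75% of the peak height.
    """
    return 1.64 * h_p * w_075 * b_over_a ** 0.717


def remaining_peak_area(total_area, calculated_area):
    """Area of the other peak of an overlapped pair, by subtraction."""
    return total_area - calculated_area


# Illustrative values only.
first_area = foley_area(h_p=10.0, w_075=2.0, b_over_a=1.0)
second_area = remaining_peak_area(total_area=50.0, calculated_area=first_area)
```

Any error in the integrated total area, including baseline error, passes directly into the area obtained by subtraction.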
All of the prior art methods described above, including the work of Foley and Dorsey, are dependent upon an accurate determination of the baseline. Inaccuracies in the baseline determination can cause significant errors in the derived properties for a peak. Since in practical chromatography a wandering baseline and large overlapping peaks atop a wandering and indeterminate baseline are frequently encountered, quantitation of either resolved or fused peaks using the prior art methods described above is often problematic. Baseline drift during a fused peak sequence causes sizeable errors and uncertainty in the determination of the prior position of the interpolated baseline. Moreover, a negative peak (a frequent occurrence in conductivity or refractive index detection) causes serious error in baseline determination, particularly when coupled with a drifting baseline.
Hence, the methods described above are not suitable for analysis of chromatographic data without baseline resolution. In general terms, identification of individual chromatographic peaks in a chromatogram is a pattern recognition problem because the chromatogram typically consists of one or more peaks of a known shape superimposed on a baseline. Thus, a first problem is identifying each of the peaks from the pattern of resolved and fused peaks in the chromatogram. A second problem is determining a set of characterizing parameters for each identified peak. In other areas of science, methods have been developed for pattern recognition. For example, a neural net, sometimes called a neural network, has been taught to complete numerical sequences based upon training the net with other numerical sequences.
A neural net includes input units, internal units and output units. The input units are a first layer of the neural net. The internal units may be configured in one or more layers and the output units are the final layer in the net. Each of the input units supplies a signal to each of the internal units in the layer of the net adjacent to the input units. If the neural net has more than one layer of internal units, each unit in the first layer of internal units, i.e., the internal units receiving signals from the input units, generates an output signal that is provided to each internal unit in the second layer of internal units. Other layers of internal units are connected to adjacent layers of internal units in a similar manner. Each internal unit in the layer of internal units adjacent to the layer of output units provides a signal to each output unit. Each output unit provides an output signal.
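The layer-to-layer signal flow described above may be sketched as follows. The description does not specify how a unit combines its input signals; the sketch assumes that each unit forms a weighted sum of its inputs plus a bias and applies a logistic function. The weights shown were chosen by hand, not obtained by training, so that a net with two input units, two internal units and one output unit behaves as an exclusive OR gate. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """Each unit receives every input signal and emits one output signal."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def net_forward(inputs, layers):
    """Pass the signals from the input units through each successive layer."""
    signals = inputs
    for weights, biases in layers:
        signals = layer_forward(signals, weights, biases)
    return signals


# Hand-chosen weights (an assumption, not trained values).
XOR_LAYERS = [
    ([[20.0, 20.0], [20.0, 20.0]], [-10.0, -30.0]),  # internal units: OR-like, AND-like
    ([[20.0, -20.0]], [-10.0]),                      # output unit: OR and not AND
]

outputs = {
    (a, b): round(net_forward([a, b], XOR_LAYERS)[0])
    for a in (0.0, 1.0)
    for b in (0.0, 1.0)
}
```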
For each neural net, the problem is to develop the internal units of the net so that for a given input pattern the net generates the appropriate output pattern. This requires that the net be trained to recognize the input patterns and generate the corresponding output patterns. For a specific example of configuring a neural net as an exclusive OR gate, see D. E. Rumelhart et al., "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1, D. E. Rumelhart and J. L. McClelland (eds.), Cambridge, MA, MIT Press, pp. 318-362.
While pattern recognition methods, such as neural nets, are known, to the best of my knowledge such methods have not been used for either identification of chromatographic peaks in a chromatogram or characterization of identified chromatographic peaks. Accordingly, the prior art methods, as described above, for identification and characterization are limited by the requirement for a prior baseline determination, which can bias the data. Moreover, the prior determination of a baseline for a fused peak sequence or a negative peak when the baseline is drifting is problematic. In addition to a prior baseline determination, the previously described prior art methods for analysis of fused peak sequences require specific relationships between the peaks and an empirical relationship for evaluation of the peak area. Thus, a method and apparatus for analyzing a chromatogram without a prior baseline determination is needed to overcome the prior art limitations.