Often it is desirable to perform an analysis of a sample containing multiple spectrally resolvable species wherein the relative concentrations of the component species are to be determined. Such simultaneous detection of multiple species in a single sample mixture has a number of advantages over serial analysis of multiple sample mixtures each containing only a single species. First, because only a single sample mixture is analyzed, fewer steps are required for sample processing and only a single measurement is required, both features resulting in a higher sample throughput and improved convenience to the user. Moreover, by combining multiple species into a single mixture, internal calibration is facilitated. An important example of a process utilizing such simultaneous multispecies spectral detection is multicolor DNA sequencing where four spectrally resolvable fluorescent dyes are simultaneously detected (Smith; Connell; Hunkapiller).
Because it is difficult to find a collection of reporters whose spectral responses do not at least partially overlap, a problem common to all such simultaneous measurements is the determination of the concentration of each of the individual species given data that contain spectral contributions from multiple species. That is, to determine the individual species concentrations, the measured spectral data must be deconvolved. For example, FIG. 1 shows the emission spectra of four dyes used in four-color DNA sequencing. It is clear from these spectra that it is impossible to identify a set of detection wavelengths that will result in both a spectrally pure signal and sufficient emission intensity.
Linear multicomponent analysis is a powerful deconvolution method useful for the determination of the concentration of individual species given spectral data that contains contributions from multiple spectrally overlapping species (Frans; Kalivas; Thomas). In linear multicomponent analysis, a series of linear equations of the form
$$\mathbf{d} = \mathbf{K}\mathbf{c} + \mathbf{r}$$
are solved, where d is a vector each of whose elements corresponds to the spectral response measured at a particular wavelength, K is a calibration matrix whose elements correspond to pure component linear response constants for each species at each channel, c is a concentration estimate vector each of whose elements corresponds to an estimate of the concentration of a particular species in a mixture, and r is a residual vector of the concentration estimate vector c. (Note that throughout this disclosure, matrices are designated by boldface capital letters and vectors are designated by boldface lowercase letters.) Thus, given measured values d and calibration matrix K, estimated values for the individual concentration of each species c can be determined. The above equation is written in a form which assumes that vectors d and r are column vectors. If vectors d and r are expressed as row vectors, the preceding equation becomes
$$\mathbf{d} = \mathbf{c}^{T}\mathbf{K}^{T} + \mathbf{r}$$
Throughout this disclosure, it will be assumed that vectors d and r are column vectors.
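As an illustration of the column-vector form d = Kc + r, the following minimal sketch estimates c by ordinary least squares, solving the normal equations (K^T K)c = K^T d. The choice of least squares, the function names, and the numerical values of K and d are assumptions made for the example only; they are not taken from this disclosure or from the cited references.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def estimate_concentrations(K, d):
    """Least-squares estimate of c in d = K c + r; returns (c, r)."""
    Kt = transpose(K)
    KtK = matmul(Kt, K)
    Ktd = [sum(k * v for k, v in zip(row, d)) for row in Kt]
    c = solve(KtK, Ktd)
    # Residual r = d - K c, one element per detection channel.
    r = [di - sum(kij * cj for kij, cj in zip(row, c))
         for row, di in zip(K, d)]
    return c, r


# Hypothetical calibration matrix: 3 detection channels (rows) by
# 2 species (columns), with overlapping responses.
K = [[1.0, 0.2],
     [0.5, 0.6],
     [0.1, 1.0]]
# Hypothetical noise-free measurement generated from c = [2.0, 3.0].
d = [2.6, 2.8, 3.2]

c, r = estimate_concentrations(K, d)
print(c, r)
```

Because the example measurement is noise-free, the estimate reproduces the concentrations used to generate d and the residual is zero to within rounding.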
In addition to obtaining an estimated value for the concentration of each species in a mixture, it is desirable to obtain a quantitative figure of merit for the quality of the estimate. For example, in the case of multicolor DNA sequencing methods, it is useful to have a measure of the quality of a particular base call in a sequence. This is particularly true for the case of sequences including heterozygote positions where two bases may be present at a given position in the sequence. The most common such figure of merit used in linear multicomponent analysis is the condition number of the calibration matrix, cond(K), where
$$\operatorname{cond}(\mathbf{K}) = \lVert \mathbf{K} \rVert \cdot \lVert \mathbf{K}^{-1} \rVert$$
where the double vertical bars indicate the norm of the matrix (Otto).
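The condition number defined above can be computed directly for a small square calibration matrix, as in the following minimal sketch. The disclosure does not say which matrix norm is intended; the infinity norm (maximum absolute row sum) is assumed here only because it is simple to compute, and the function names and the 2-by-2 matrix are hypothetical.

```python
def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def norm_inf(m):
    """Infinity norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


def cond_inf(m):
    """cond(K) = ||K|| * ||K^-1||, using the infinity norm (an assumption)."""
    return norm_inf(m) * norm_inf(inverse(m))


# Hypothetical 2-species, 2-channel calibration matrix with 50% cross-talk.
K = [[1.0, 0.5],
     [0.5, 1.0]]
# Identity matrix: no spectral overlap at all.
I2 = [[1.0, 0.0],
      [0.0, 1.0]]

print(cond_inf(K), cond_inf(I2))
```

For this example the overlapping matrix gives a condition number of about 3, while the matrix with no overlap gives 1, the smallest value possible.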
However, the condition number is not an optimal figure of merit for indicating the quality of a multicomponent concentration estimate. The condition number is a measure of the quality of a calibration matrix K rather than a measure of the quality of a particular multicomponent measurement. Thus, it is possible for a system to have a favorable condition number and yet, because of certain experimental factors, to yield a particularly uncertain measurement. For example, the condition number provides no guidance as to how a noisy signal impacts the quality of a particular multicomponent concentration estimate.
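This limitation can be seen in the following minimal sketch, in which two measurements are analyzed with the same calibration matrix, and therefore the same condition number, yet give concentration estimates of very different quality. The matrix, the concentrations, and the perturbation added to the second measurement are hypothetical values chosen for illustration, and the 2-by-2 system is solved in closed form by Cramer's rule.

```python
def solve_2x2(K, d):
    """Solve the 2-by-2 system K c = d by Cramer's rule."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    c0 = (d[0] * K[1][1] - K[0][1] * d[1]) / det
    c1 = (K[0][0] * d[1] - K[1][0] * d[0]) / det
    return [c0, c1]


def max_abs_error(estimate, truth):
    """Largest absolute deviation of an estimate from the true values."""
    return max(abs(e - t) for e, t in zip(estimate, truth))


# Hypothetical calibration matrix shared by both measurements.
K = [[1.0, 0.5],
     [0.5, 1.0]]
c_true = [2.0, 3.0]

# Measurement 1: noise-free response, d = K c_true.
d_clean = [3.5, 4.0]
# Measurement 2: same mixture with a hypothetical perturbation of
# +0.3 and -0.3 on the two channels.
d_noisy = [3.5 + 0.3, 4.0 - 0.3]

err_clean = max_abs_error(solve_2x2(K, d_clean), c_true)
err_noisy = max_abs_error(solve_2x2(K, d_noisy), c_true)
print(err_clean, err_noisy)
```

The calibration matrix is identical in both cases, so any figure of merit computed from K alone assigns the same quality to both estimates, even though the second is in error by about 0.6 concentration units.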
Thus, there is a need for a statistically meaningful figure of merit for determining the quality of a concentration estimate based on the spectral response of a mixture containing a plurality of spectrally distinguishable species using multicomponent analysis methods.