The polymerase chain reaction (PCR) is an in vitro method for enzymatically synthesizing or amplifying defined nucleic acid sequences. The reaction typically uses two oligonucleotide primers that hybridize to opposite strands of a DNA molecule and flank a template or target DNA sequence that is to be amplified. Elongation of the primers is catalyzed by a heat-stable DNA polymerase. A repetitive series of cycles involving template denaturation, primer annealing, and extension of the annealed primers by the polymerase results in an exponential accumulation of a specific DNA fragment. Fluorescent probes or markers are typically used in real-time PCR, or kinetic PCR, to facilitate detection and quantification of the amplification process.
Referring to FIG. 1, the steps of a cycle in the real-time polymerase chain reaction (PCR) are described. There are three steps in a PCR cycle. In step 1 of the cycle, the temperature is raised to about 95 degrees, and then the target DNA that is to be amplified gets denatured into two separate strands at that point. In step 2 of the cycle, the temperature is lowered to about 50 degrees at which point primers are annealed to the single strands. The primers are short oligomers that specifically attach to each of the two denatured DNA strands. In step 3 of the cycle, the temperature is raised to 72 degrees and a polymerase enzyme extends the separated strands so that, after each cycle, where for each double-stranded DNA, two copies of the double-stranded DNA are produced. These steps are repeated in each cycle in order to amplify target nucleic acid. For instance, the aforementioned steps of the PCR cycle may be repeated forty or more times.
Referring to FIG. 2, during real-time PCR, a fluorescent signal is measured during the cycling. This fluorescent signal is a measure of the number of double-stranded DNA that has been produced. Curve 202 is the measurement of the fluorescent signal after each cycle on an absolute scale and curve 204 is the measurement of the fluorescent signal after each cycle on a logarithmic scale. From curve 202, is can be seen that there are three distinct phases during real-time PCR. The first phase is the exponential phases. In the particular case illustrated in FIG. 2, the exponential phase roughly consists of cycles 1-26. The second phase is the linear phase. During the linear phase, the fluorescent signal grows linearly as a function of PCR cycle then the signal starts leveling off. In the particular case illustrated in FIG. 2, the linear phase consists of PCR cycles 27-31. The cycles where the fluorescent signal begins leveling off is referred to as the plateau phase. In the particular case illustrated in FIG. 2, the plateau phase consists of PCR cycles 32-40.
Part of the exponential phase is not visible because the fluorescent signal is less than the background. This can be seen in FIG. 2, where the fluorescent signal is below background noise for PCR cycles 1 through 20. So, for the first PCR cycles, no detectable fluorescent signal is read. The fluorescent signal is not observed until it becomes strong enough to measure it. In the particular case illustrated in FIG. 1, the fluorescent signal is strong enough to measure after the 21st PCR cycle. Often the fluorescent signal cannot be detected for the first several PCR cycles because there are typically as few as 20 to 100 copies of the original target double stranded nucleic acid that is to be amplified. Thus, there is essentially nothing to measure during the initial stages of the real time PCR because the strength of fluorescent signal is proportional to the number of copies of double stranded nucleic acid that are in the reaction mixture.
Assuming one hundred percent efficiency, each PCR cycle will double the number of double stranded target nucleic acids in the reaction mixture. Therefore, assuming that the PCR is one hundred percent efficient, the total number of templates (double stranded target nucleic acid) in each cycle doubles so that, after n-cycles, the number templates Nn in the reaction mixture will be No, where No is the number of copies of the template in the initial reaction mixture before real-time PCR amplification was initiated, times two to the power of n:Nn=No*2n  (1)If the efficiency is other than one hundred percent, then the number of templates Nn in the reaction mixture after n-cycles will be:Nn=No*(1+E)n  (2)where E is the efficiency. In practice, real-time PCR is not one hundred percent efficient and thus E is a real value less than one. Taking the log of equation (2) yields:Log(Nn)=Log(No)+n Log(1+E)n  (3)Where Log(Nn), the logarithm of the number of copies of the template in the reaction mixture in the nth cycle, is the summation of logarithm of the initial number of templates No and n times the log of the efficiency plus 1 to the nth power. Referring to FIG. 3, the advantage of expressing Nn in the logarithmic form of Equation 3 is that a threshold line T can be established. The threshold line T is set at a value above background noise. Placement of the threshold line T can be varied, but it is placed somewhere in the exponential phase of the curve that is above background noise. Once the threshold line T has been placed, the goal is to accurately determine the number CT of real-time PCR cycles that are required to achieve a fluorescent signal that is equal to or greater than T. From Equation (3), CT will be:CT=−Log(No)/Log(1+E)+Log(T)/Log(1+E)  (4)From FIG. 3, it can be seen that the logarithm of the fluorescent signal has a negative value for the initial number of copies. The threshold value T can be varied, but from Equation 4, it can be seen that CT should depend as minus log No.
Referring to FIG. 4, in quantitative PCR, several amplification curves are computed. This can be done in the linear scale (402) or the log-scale (404). In FIG. 4, the amplification curves are for the target nucleic acid PBEF1. Each of these amplification curves represents a serial dilution of an initial sample of PBEF1. Furthermore, each of these serial dilutions may be performed in triplicate. In other words, the initial sample is divided into three samples, where the number of the target nucleic acid is the same in each of the samples. Real-time PCR is performed on each of the three samples. Each of the three samples is then serial diluted and real-time PCR is performed on the serially diluted samples. The goal of all this work and the computation of all the amplification curves are to find out how many copies of the template nucleic acid (No) were initially in the sample.
The premise behind serial dilutions is that, when No is the number of target nucleic acid in the original sample, and the sample is repeatedly diluted by a factor of two, No should likewise by reduced by a factor of two. Thus, according to Equation (4), when CT is calculated for each of the serial dilutions, CT should be linear function of Log(NO) with the slope of −1/Log(1+E). Computation of CT as a function of Log(No) is called a standard curve. Such a standard curve is illustrated for the amplification curves of PBEF1 in FIG. 5. The value Log(No) is taken on the illustrated y-axis and CT is taken on the X-axis. Furthermore, as illustrated in FIG. 5, the standard curve is linear.
Because the standard curve is linear, it is possible to compute the coefficient of determination, R2, or an adjusted R2 and the efficiency E. An adjusted R2 is defined as:
                    1        -                              (                          1              -                              R                2                                      )                    ⁢                                    n              -              1                                      n              -              p              -              1                                                          (        5        )            where p is the total number of regressors in the model and n is sample size. For the data in FIG. 4 and plotted as a standard curve in FIG. 5, the adjusted R2 is 0.999011. Thus, FIG. 5 shows that with smaller N0s, it takes fewer cycles to achieve CT and as N0 is increased a logarithm, linearly CT becomes greater. Thus, from the standard curve, the value CT for the initial undiluted sample N0 can be computed from the slope. From Equation 4,CT=−Log(N0)/Log(1+E)+Log(T)/Log(1+E)  (6)=−3.4 Log(N0)+26.081  (7)For the data in FIG. 4 and plotted as a standard curve in FIG. 5, the efficiency E is 96.8%, which is nearly a perfect 100%.
Equation 4, and the computations set forth in Equations 6 and 7 which are examples of computations of Equation 4, have a number of underlying assumptions. One assumption is that, for all the cycles for which the fluorescent signal is larger than background, the fluorescent signal will be proportional to the number of copies of the target nucleic acid (and its amplified copies). This assumption is not problematic. Furthermore, Equation 4 assumes that each of the cycles leading up to CT is in the exponential phase. T cannot be set in the linear or plateau phase because the assumptions underlying Equation 4 do not work in those phases. Again, this assumption is not problematic because T can be set in the exponential phase. Furthermore, automated software in many PCR machines can find a reasonable value for T on an automated basis, or the user can set T somewhere in the exponential phase.
One problematic assumption behind Equation 4 is that it assumes that efficiency E in each of the cycles leading up to CT are the same, including cycles where the fluorescent signal simply is not measurable because it is still less than the background. However, it is not possible to verify this assumption when no measurable signal is obtain from the first several PCR cycles. Moreover, another problematic assumption is that Equation 4 assumes that the efficiency will be the same for all the response curves (e.g., the response curves illustrated in FIG. 4), independent of the initial template concentration and individual reactions conditions. So, Equation 4 assumes that, for the entire set of response curve experiments, E is the same. The assumption that E is the same for each cycle before CT in a given PCR experiment and that E is constant for each of the PCR experiments done in a serial dilution presents a significant problem because there is no guarantee that these assumptions are, in fact valid. For example, consider the case in which there are two different samples each with a different target nucleic and the goal is to quantifiably compare the initial concentration of the target nucleic acids in the two samples. From the above analysis, a comparison of the initial concentration of the two target nucleic acids may be problematic using conventional techniques because there is no guarantee that E will be the same in the PCR experiments used to generate the amplification curves for the two samples. This point can also be seen below, in conjunction with FIG. 7 where it is shown that the efficiency of the PCR reactions for the IFNAR1 amplification curves is substantially below the efficiency of the efficiency of the PCR reactions for the PBEF1 amplification curves of FIG. 4 (E=83.5% for IFNAR1, as compared to E=96.8% for PBEF1).
Referring to FIG. 6, the efficiency of a given PCR experiment can be checked. By definition, the efficiency of any given cycle n in a PCR experiment En (termed local efficiency, as opposed to the assumed constant efficiency of the overall PCR experiment throughout the plurality of cycles of the PCR experiment) is (i) the number of copies of the target nucleic acid in the current cycle Nn divided by (ii) the number of copies of the target nucleic acid in the previous cycle Nn-1 minus 1:En=Nn/Nn-1−1  (8)FIG. 6, which plots local efficiency En for each of the cycles of a real-time PCR experiment in the PBEF1 amplification curves, shows that En varies all over the place. There is no requirement that En be set to one hundred percent for calculations based upon Equation 4, but such calculations do assume that En is constant. As can be seen in FIG. 6, the value of En is not known in the initial cycles, because there is no measurable fluorescent signal. And, for those cycles in the exponential phase where a measurable fluorescent signal is seen, En falls off. As seen in FIG. 6, when a measurable fluorescent signal is detected near cycle 20, the efficiency is somewhere around 1. Then it falls off, all the way down to zero. Furthermore, there is no guarantee that the efficiency for cycles less than 20 are constant either. From what is observable, the efficiency is not constant.
FIG. 6 illustrates that the efficiency of the PCR reaction is not anywhere near 97%. But, the standard curves for the same sample, as illustrated in FIGS. 4 and 5, indicate that the efficiency of the PCR reactions is, in fact 97%. Thus, there is a discrepancy between the efficiency E computed by the standard curve of FIG. 5 and the local efficiency computed in FIG. 6.
Moreover, as indicated above, the standard curve illustrated in FIG. 5 has no mechanism for compensating for variations in efficiency that will arise when different reagents or different reaction conditions are used in the various PCR reactions used to generate the data for a standard curve.
Referring to FIG. 7, amplification curves for the target nucleic acid IFNAR1 are illustrated as well as the standard curve from these amplification curves. From the standard curve:CT=−Log(N0)/Log(1+E)+Log(T)/Log(1+E)  (9)=−3.795 Log(N0)+30.955  (10)Furthermore, E for the IFNAR1 data is 83.5 percent. Referring to FIG. 8, which plots local efficiency En for each of the cycles of a real-time PCR experiment in the IFNAR1 amplification curves, shows that En varies all over the place.
Referring to FIG. 9 the determination of how adjusted R2 and efficiency vary for the PBEF1 and IFNAR1 datasets discussed above as a function of threshold T placement. Referring to FIG. 8, the data for FIG. 9 is, for example, computed by setting threshold T line 802 to one value, calculating the CT, moving T somewhere else, and recalculating CT and so forth. If the threshold T (line 802) is moved too high, then the data no longer falls completely in the exponential region, particularly on the log scale. In the log scale this exponential phase should be linear. If line 802 (T) is moved above where it starts falling off, then you're already not in the exponential phase of the Equation 4. So it is expected that if the threshold is moved high enough, than the assumptions underlying Equation 4 will no longer be valid and efficiency will go down. As illustrated in FIG. 9, the higher threshold T is set (in terms of fluorescent signal strength) the more efficiency starts falling off. Thus, not only is efficiency dependent on starting reagent (PBEF1 versus IFNAR1), it is also dependent on the choice for the value of T.
Ideally, there should be a low enough T, such that calculated efficiency is 98% or a value close to 98%. And, when the threshold is increased, the efficiency should go down because regions of the PCR experiment where the efficiency is going down are being incorporated (see for example the local efficiencies computed for PBEF1 and IFNAR1 in FIGS. 6 and 8). But as illustrated in FIG. 9, this is not always the case. As illustrated in FIG. 9, IFNAR1, does what it's supposed to do. The efficiency begins on the order of 0.85. Then, at the threshold T is set higher, the efficiency falls down. But in the case of PBEF1, the efficiency begins at 0.98 and, when T is increased thereby including less efficient cycles into the computation, the efficiency does not go down, but, as illustrated in FIG. 9, actually increases.
The above analysis indicates that the concept of the efficiency is problematic. Thus, methods that rely on efficiency in order to compare the starting concentration of target nucleic acids in one sample to the starting concentration of a target nucleic acid in another sample are problematic.
The standard curve can be used to calculate, to predict, the starting value for N0. That is the goal of quantitative PCR: calculation of N0 from the CTs, based upon the standard curve. So in principle, the standard curve is computed thereby giving the relation between CT and N0. This relation can be used in further experiments to predict the specific value for N0. So, from the same data used to calculate the standard curve, one can try to predict the specific value for N0 to check how well the method works for the same set of data. Referring to FIG. 10, absolute error AE in the prediction of N0 using the standard curves as set forth above can be computed when the actual N0 of the sample is known. The absolute error is defined as:
                    AE        =                                                      N              0              actual                        -                          N              0              predicted                                                                    (        11        )            In other words, AE is the absolute value of the difference between the known initial N0 and the predicted N0. Because AE is becomes less sensitive when the initial N0 increases, absolute relative error, which divides absolute error by the actual N0 to provide an error value that doesn't vary significantly as a function of the value of N0. From FIG. 10, it can be seen that for the PBEF1 data, the mean ARE (over all the serial dilutions) is 3.9% whereas for IFNAR1, the mean ARE (over all the serial dilutions) is 6.5%. This shows that the ARE varies from gene to gene and as a function of N0, but the values obtained for PBEF1 and IFNAR1 are typical for a fully-quantitative PCR.
In gene expression measurements, there is an additional step, because the desired quantity is mRNA concentration, not the measured cDNA. So the additional step is determining the efficiency of the reverse transcription from the desired quantity, initial mRNA concentration, the measured quantity, cDNA. The reverse transcriptase reaction contributes most of the variation to the measurement of the mRNA quantity. While it is possible to determine the efficiency of the reverse transcription reaction and therefore the desired value, initial mRNA concentration, in practice this is a difficult process. Thus, to circumvent the need for determining the efficiency of the reverse transcriptase reaction, in practice what is done is to compare the measured abundance value of the gene of interest to that of a reference gene. As shown below, this circumvents the need to know the efficiency of the reverse transcriptase reaction for the gene of interest. In the method, two different genes are measured at the same time: the one of interest and the one that is assumed will not vary (e.g. is not regulated by the biological condition under study). The relative expression for two genes, A and B is given by:
                                          N            A                                N            B                          =                              κ            RS                    ⁢                                                                      η                  A                                ⁡                                  (                                      1                    +                                          E                      B                                                        )                                                                              C                  TB                                -                1                                                                                      η                  B                                ⁡                                  (                                      1                    +                                          E                      A                                                        )                                                                              C                  TA                                -                1                                                                        (        12        )            where
κRS is the relative sensitivity of the detection chemistries for genes A and B,
ηA is the cDNA reverse transcriptase yield for gene A,
ηB is the cDNA reverse transcriptase yield for gene B,
EA is the efficiency of the PCR reaction for gene A,
EB is the efficiency of the PCR reaction for gene B,
NA is the mRNA abundance of gene A,
NB is the mRNA abundance of gene B,
CTB is CT for gene A, and
CTB is CT for gene B.
The value κRS will depend on many different reaction conditions. In order to avoid the problem of determining the unknown parameters κRS, ηA and ηB, the “comparative quantification” method (or ΔΔCT method) is used. Parameters κRS, ηA, and ηB cancel out when a ratio of the ratios NA/NB for different samples is considered, assuming that the parameters' values do not vary from sample to sample.
Typically, the gene of interest (gene A) is a gene of interest. For example, the abundance of the mRNA of gene A is being studied because it is believed that the abundance of the mRNA of that gene various as a function of the state of a disease under study in members of a population. In such instances, a gene B is chosen as a reference gene for computations in accordance with Equation 12 that is not believed to vary as a function of the state of the disease under study in the members of the population and, moreover, where it is believed that the expression level of the mRNA for gene B does not change in the members of the population. So, two quantitative PCR reactions are done at the same time, one for the gene of interest (e.g., gene A) in and one for the reference gene (e.g. gene B). From these experiments, NA and NB are calculated. If the parameters κRS, ηA and ηB, do not change, than it is sufficient to simply compare the ratios:
                              ρ          ≡                                                    (                                  1                  +                                      E                    B                                                  )                                                              C                  TB                                -                1                                                                    (                                  1                  +                                      E                    A                                                  )                                                              C                  TA                                -                1                                                    =                  ψ          ⁢                                    N              A                                      N              B                                                          (        13        )            for different samples, with ψ=ηA/ηBκRS being a constant. This is because the values for ηA, ηB and κRS are not important if they do not vary from sample to sample. Thus, if the goal is to compare a disease state with a healthy state, then the ratio of the quantity (1+EB)CTB−1 and the quantity (1+EA)CTA−1 is all that is needed if ηA, ηB and κRS do not vary. Additionally, if the assumption is made that the efficiency for both genes A and B (the gene of interest and the reference gene) is the same, and that efficiency is E (EA=EB≡E), than Equation 13 will reduce to Equation 14:
                              ψ          ⁢                                    N              A                                      N              B                                      =                              (                          1              +              E                        )                                              C              TB                        -                          C              TA                                                          (        14        )            With Equation 14, different samples can be quantitatively compared by just comparing ΔCT≡CTB−CTA. If the efficiencies EA and EB are known than ρ from Equation 14 can be calculated.
In conventional quantitative PCR, ΔCT is the metric that is used to compare different samples. However, as discussed above, ΔCT assumes that EA and EB are the same. Exemplary data indicates that the assumption that EA and EB are the same is problematic because the calculated efficiency of various genes that have been studied ranges. For example, E for the gene PBEF1 is E=0.9684±0.0094, E for the gene ADM is E=0.91411±0.0164, E for IL1R2 is E=0.7744±0.0156, E for IRAK3 is 1.0118±0.0130, and E for JAK3 is 0.9777±0.0102. Even more problematically, the E for some genes varies from test to test. For example, the E for the reference gene (gene 18S in the considered example) have been variously computed as 0.88, 0.93, 0.97, and 1.1.
Given the variation in EA and EB when Equation 14 assumes no such variation, of interest is how much error the variation in EA and EB introduces into the ratio NA/NB when just ΔCT is used to compare different samples. Applying standard error propagation law:
                    z        =                  f          ⁡                      (                                          x                1                            ,                              x                2                            ,              …              ⁢                                                          ,                              x                n                                      )                                              (        15        )                                          σ          z          2                =                              ∑                          i              =              1                        n                    ⁢                                                    (                                                      ∂                    F                                                        ∂                                          x                      i                                                                      )                            2                        ⁢                          σ                              x                i                            2                                                          (        16        )            and thus:
                    z        =                                            Log              10                        ⁢            ρ                    =                                    Log              10                        ⁢                                                            (                                      1                    +                                          E                      B                                                        )                                                                      C                    TB                                    -                  1                                                                              (                                      1                    +                                          E                      A                                                        )                                                                      C                    TA                                    -                  1                                                                                        (        17        )            and the use of typical values for the parameters:EA=EB≅96%,CTA≅9.5,CTB≅26, andσ(CTA)=σ(CTB)≅0.12,the coefficient of variance of Log10ρ, CV(Log10ρ), can be computed. Log10ρ is the measure of the ratio of genes A and B. CV(Log10ρ)≡ρ(Log10ρ)/Log10ρ as a function of CV(EB) and CV(EA). As illustrated in the top graph of FIG. 11, CV(Log10ρ), does not vary much as a function of the coefficient of variance of the gene with low CT (here gene B). Typically, the reference gene will have low CT. Thus, even if the coefficient of variance of the efficiency of the reference gene goes from, say, 2% to 8%, CV(Log10ρ) will not change very much, as exhibited by the flat curves in the upper graph in FIG. 11. On the other hand, referring to the lower graph of FIG. 11, the coefficient of variance of the efficiency of the gene with high CT (here gene B) has dramatic effect on the error of Log10ρ. As can be seen in the lower graph of FIG. 11, the relationship between the coefficient of variance of gene B and the error of Log10ρ share a linear relationship. Thus, the lower graph shows that significant error arises if the assumption that the efficiency doesn't vary, when in fact it does.
Referring to FIG. 12, it is seen that the CT values for the reference gene 18S for four different samples each with three replicates per sample exhibits little variation from sample to sample, consistent with the assumption that the 18S gene is not expressed in the subjects that provided the samples. FIG. 13 shows the mean CT values of 18S and their 95% confidence intervals (CI) based on the three replicates. As shown in FIG. 13, there is little variation of the CT of 18S from sample to sample, consistent with the assumption that 18S is not expressed in the subjects that provided the samples. Thus, FIGS. 12 and 13 collectively show that the abundance of 18S mRNA does not vary from sample to sample, although the samples are from completely different patients. In FIG. 14, CT values of a gene that is expressed, PBEF1, for four different samples with three replicates for each sample are given. In FIG. 15, mean CT values for PBEF1 are well separated for different samples. In FIG. 15, the 95% confidence intervals shown are based on the three replicates for each sample. In FIG. 16, it is seen that means of ΔCT≡CTPBEF1−CT18S are quite well separated from sample to sample. However, as discussed in the background section above, ΔCT does not directly reflect potential differences in the ratio N18S/NPBEF1 for different samples, and one has to consider the variability of these ratios (or at least of ρ or Log10ρ).
Referring to FIG. 17, if the gene efficiencies EPBEF1 and E18S do not vary from sample to sample, and the only source of the efficiency uncertainty is related to the regression of the standard curve, then the means of the calculated log10ρ are still potentially well separated. This suggests that the standard quantitative PCR (qPCR) analysis is capable of detecting differences in the ratio N18S/NPBEF1 in the subjects that provided the samples, if such differences exist. In this example, E=0.9684±0.0094 is assumed for both PBEF1 and 18S. With this efficiency of 18S, typical coefficient of variance values for the resulting metric log10ρ for the considered example genes and samples are between about 1.5% and 3%.
Referring to FIG. 18 if, however, the variability of EPBEF1 and E18S from sample to sample is taken into account, the confidence intervals of log10ρ for the ratio N18S/NPBEF1 are so large that the sample separation becomes questionable. In this example E18S=0.97±0.094 (CV˜10%) and EPBEF1=0.97±0.047 (CV˜5%) are assumed, representing a potential worse-case scenario. These results suggest that unless something can be done to control the efficiency variation, the standard qPCR analysis presented here may not lead to a reliable detection of differences in the ratio N18S/NPBEF1.
If the conditions of FIG. 17 hold true, a determination can be made as to whether there are differences in the ratio N18S/NPBEF1 in various subjects because the 95% confidence interval do not overlap. But if the conditions of FIG. 18 hold true, where an error in the assumptions of efficiency are made, the confidence intervals start overlapping and the ability to discriminate changes in the ratio N18S/NPBEF1 from sample to sample is lost.
Thus, given the likely sources of error in the equations above when the assumptions underlying the equations are, in fact, not correct, what is needed are new methods for extracting quantitative information about the initial amount of a target nucleic acid from individual PCR amplification curves.