Polymerase chain reaction (PCR) technology is fundamental to modern biology and is considered among the most important scientific developments for the analysis of biological sequence data. PCR is used in virtually every laboratory conducting molecular, cellular, genetic, ecological, forensic, or medical research. It generally amplifies DNA by subjecting a DNA fragment to a series of heating and cooling cycles: the DNA is denatured by heating, oligonucleotide primers anneal to complementary bases during cooling, and the reaction is heated again so that the polymerase extends the annealed primers and synthesizes new DNA.
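As a rough illustration of the cycling described above, the idealized textbook model treats each cycle as multiplying the target copy number by (1 + E), where E is a constant per-cycle efficiency between 0 and 1, so that N_n = N_0(1 + E)^n after n cycles. The Python sketch below assumes exactly that constant efficiency; the function name is hypothetical, and the model deliberately ignores primer depletion, mismatched priming, and competing products.

```python
def ideal_pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after `cycles` thermocycles (hypothetical helper).

    Assumes a constant per-cycle efficiency E (0 <= E <= 1), so each cycle
    multiplies the copy number by (1 + E).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    copies = float(initial_copies)
    for _ in range(cycles):
        # Denature, anneal, extend: a fraction E of the templates is copied once.
        copies += efficiency * copies
    return copies


print(ideal_pcr_copies(1, 30))       # perfect doubling: 1073741824.0
print(ideal_pcr_copies(1, 30, 0.9))  # 90 % efficiency yields far fewer copies
```

Because real reactions depart from this constant-efficiency picture, the simple exponential serves only as a baseline against which sequence- and condition-dependent effects can be compared.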
Despite its ubiquity, a detailed understanding of how DNA sequence and reaction/processing conditions affect the PCR product is currently lacking, as is the ability to predict all products of PCR amplification as a function of DNA sequence and reaction conditions. In particular, the effects of time, temperature, ion concentration, complex sequence mixtures containing primer perfect matches and/or sequence mismatches, and the location of mismatches within the primer sequence (3′, 5′, or central) are not well understood. Nor is it well understood which products are amplified when a complex mixture contains multiple closely related targets and multiple primers that may hybridize with sequence mismatches. A better understanding of these issues, and the ability to predict all PCR products, can improve the design, interpretation, and detailed optimization of PCR reactions.
Various developments in computational PCR reaction design, such as DNA melting programs, have demonstrated the importance of calculations in PCR analysis. Such efforts, however, do not adequately address the underlying physics and biology of the problem, examine only isolated aspects of the system, and do not follow all significant reaction pathways that arise during the process. Specifically, while various software programs are available for PCR reaction design, they are generally limited to selecting a “best” primer or to examining how reaction conditions affect the desired reactions during only the first thermocycle. For example, commercially available bioinformatics programs such as Primer3, PRIMO, Primer Design, and Oligonucleotide return a list of best primers for a specific target based on sequence similarities, primer length, and melting temperatures, calculated using simple GC content equations or Nearest Neighbor parameters as described in “Thermodynamics and NMR of internal G·T mismatches in DNA” by Allawi, H T and SantaLucia, J, Jr., Biochem 36, 10581-10594 (1997), incorporated by reference herein. More recent programs, such as Visual OMP, can predict the concentrations of hybridized species for a given set of reaction conditions, but only for the first round of PCR. These programs do not, for example, model extension, replication, or multiple thermocycles; they do not run in parallel, which hinders the modeling of complex mixtures and limits the complexity of mixtures that can be modeled; they assume that all reactions proceed to equilibrium, without considering non-equilibrium conditions; and they do not consider the effects of competitive contaminants.
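The “simple GC content equations” mentioned above can be illustrated with two common rules of thumb: the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), usually applied to short oligonucleotides, and the basic formula Tm ≈ 64.9 + 41 × (n_GC − 16.4) / N for longer primers of length N. The Python sketch below is an illustrative assumption, not the formula used by any particular program named above; the function names are hypothetical, salt and primer concentrations are ignored, and Nearest Neighbor models, which require published parameter tables, are not reproduced here.

```python
def base_counts(primer):
    """Return (A+T count, G+C count) for a DNA primer (hypothetical helper)."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    return len(seq) - gc, gc


def wallace_tm(primer):
    """Wallace rule: 2 degC per A/T plus 4 degC per G/C (short oligos)."""
    at, gc = base_counts(primer)
    return 2.0 * at + 4.0 * gc


def basic_gc_tm(primer):
    """Basic GC-content formula: 64.9 + 41 * (nGC - 16.4) / N, in degC."""
    at, gc = base_counts(primer)
    return 64.9 + 41.0 * (gc - 16.4) / (at + gc)


print(wallace_tm("ATGCATGCATGC"))           # 36.0
print(basic_gc_tm("ATGCATGCATGCATGCATGC"))  # about 51.78
```

Both rules depend only on base composition, so two primers with the same GC count receive the same estimate regardless of base order or mismatch position, which is one reason such estimates cannot capture the sequence effects discussed above.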
As noted above, current practices estimate PCR products based solely on thermodynamics. However, gaining insight into the dynamics of the process and optimizing cycle processing conditions require reaction kinetics. Traditionally, researchers have described reaction kinetics using deterministic methods. While these approaches are valuable, biochemical processes are well known to be stochastic in nature. This is suggested in “Computational and Experimental Analysis of DNA Shuffling” by Maheshri, N and Schaffer, D V, Proc Natl Acad Sci USA 100, 3071-3076 (2003), where probabilistic methods were applied to modify reaction rates to describe the DNA shuffling process. That work, however, does not consider using the probabilities of competitive events to describe the rates of stochastic events.
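One standard way to describe stochastic reaction events through the probabilities of competing events is Gillespie's direct stochastic simulation algorithm: given the propensity a_i of each possible event, the waiting time to the next event is exponentially distributed with rate a_0 = Σ a_i, and event i fires with probability a_i / a_0. The Python sketch below applies that textbook algorithm to a single hypothetical reversible primer-template hybridization, P + T ⇌ PT. The function name and rate constants are assumptions for illustration; this is neither the method of the cited paper nor a model of full PCR.

```python
import random


def gillespie_hybridization(primers, templates, k_on, k_off, t_end, seed=0):
    """Simulate P + T <=> PT with Gillespie's direct method (hypothetical example).

    Counts are molecule numbers; k_on is a stochastic rate constant per
    primer-template pair per second, k_off per duplex per second.
    Returns (free primers, free templates, duplexes) at time t_end.
    """
    rng = random.Random(seed)
    p, tmpl, duplex = primers, templates, 0
    t = 0.0
    while True:
        # Propensities of the two competing events.
        a_bind = k_on * p * tmpl
        a_melt = k_off * duplex
        a_total = a_bind + a_melt
        if a_total == 0.0:
            break  # no event can occur any more
        # Waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which event fires, with probability a_i / a_total.
        if rng.random() * a_total < a_bind:
            p, tmpl, duplex = p - 1, tmpl - 1, duplex + 1
        else:
            p, tmpl, duplex = p + 1, tmpl + 1, duplex - 1
    return p, tmpl, duplex


print(gillespie_hybridization(100, 80, 1e-3, 0.5, 50.0, seed=1))
```

Setting k_off to zero, for example, drives hybridization to completion so that the limiting species is exhausted. A fuller model would add further competing events (mismatched binding, extension, denaturation) to the same selection step.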
There is therefore a need for a complete and integrated computational model and methodology that enables prediction of both the sequences and the concentrations of all hybridized and single-stranded species amplified during PCR. In particular, there is a need for a computational method and computer-based system that not only models all relevant parts of the PCR process under investigation and the resulting reaction pathways, but also integrates the results by using the output of one modeling calculation as the input to a subsequent modeling calculation. Furthermore, there is a need to derive probabilities of competitive events that describe the rates of stochastic events, providing a generalized framework that can be applied to describe the time evolution of a host of biochemical processes. In this manner, the full spectrum of PCR products can be predicted after multiple thermocycles.