The ability of enzymes to catalyze biological reactions is fundamental to life. A range of biological applications use enzymes to synthesize various biomolecules in vitro. One particularly useful class of enzymes is the polymerases, which can catalyze the polymerization of biomolecules (e.g., nucleotides or amino acids) into biopolymers (e.g., nucleic acids or peptides). For example, polymerases that polymerize nucleotides into nucleic acids, particularly in a template-dependent fashion, are useful in recombinant DNA technology, nucleic acid detection, and nucleic acid sequencing applications. Many nucleic acid sequencing methods monitor nucleotide incorporation during in vitro template-dependent nucleic acid synthesis catalyzed by a polymerase. Single Molecule Sequencing (SMS) and Paired-End Sequencing (PES), for example, typically employ a polymerase for template-dependent nucleic acid synthesis. Polymerases are also useful for generating nucleic acid libraries, such as those created during emulsion PCR or bridge PCR. Nucleic acid libraries created using such polymerases can be used in a variety of downstream processes, such as genotyping, single nucleotide polymorphism (SNP) analysis, copy number variation analysis, epigenetic analysis, gene expression analysis, hybridization arrays, analysis of gene mutations (including, but not limited to, detection, prognosis, and/or diagnosis of disease states), detection and analysis of rare or low-frequency allele mutations, and nucleic acid sequencing (including, but not limited to, de novo sequencing or targeted resequencing).
A desirable quality of a polymerase used for nucleic acid amplification, synthesis, and/or detection is improved nucleotide incorporation as compared to a reference polymerase. Improved nucleotide incorporation can make processes such as nucleic acid library preparation and/or DNA sequencing more cost-effective by reducing the number of nucleic acid templates needed to sequence a desired target molecule. In another aspect, improved nucleotide incorporation can reduce the number of sequencing reads required to determine the sequence of the desired target molecule. Improved nucleotide incorporation can also improve signal uniformity, increasing the accuracy of base determination for the desired target molecule. In yet another aspect, improved nucleotide incorporation by a modified polymerase can reduce the likelihood that the modified polymerase stalls on, or dissociates from, the desired target molecule, thereby increasing read length. In yet another aspect, a modified polymerase having improved templating or clonal amplification efficiency as compared to a reference polymerase can improve downstream sequencing of target molecules customarily considered “difficult”, such as target molecules with high GC or AT content. Accordingly, one aspect of the invention is to provide methods, systems, apparatuses, and compositions of matter that reduce GC and AT bias in nucleic acid amplification by using a modified polymerase having reduced GC or AT content bias.
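To make “high GC or AT content” concrete, the following Python sketch computes the GC and AT fractions of a target sequence and labels it accordingly. The function names `base_fractions` and `classify_target`, and the 0.65 cutoff, are hypothetical illustration choices and are not methods or thresholds taken from this document; practical assessments of difficult targets may also consider local (windowed) composition rather than only whole-sequence fractions.

```python
# Hypothetical helper for illustration only: quantifies GC/AT composition of a target.
VALID_BASES = set("ACGT")


def base_fractions(seq: str) -> tuple[float, float]:
    """Return (GC fraction, AT fraction) over unambiguous A/C/G/T bases.

    Ambiguous symbols such as 'N' are ignored rather than counted.
    """
    bases = [b for b in seq.upper() if b in VALID_BASES]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases), (len(bases) - gc) / len(bases)


def classify_target(seq: str, cutoff: float = 0.65) -> str:
    """Label a target as 'high-GC', 'high-AT', or 'balanced'.

    The default cutoff of 0.65 is a hypothetical illustration value,
    not a threshold defined in this document.
    """
    gc, at = base_fractions(seq)
    if gc >= cutoff:
        return "high-GC"
    if at >= cutoff:
        return "high-AT"
    return "balanced"
```

For example, `classify_target("GCGCGCAT")` returns `"high-GC"` because 6 of its 8 bases are G or C (a GC fraction of 0.75).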
Another desirable quality in an enzyme used for nucleic acid library preparation or DNA sequencing is thermal stability. Thermostable DNA polymerases have revolutionized many aspects of molecular biology and clinical diagnostics since the development of the polymerase chain reaction (PCR), which uses repeated cycles of thermal denaturation, primer annealing, and enzymatic primer extension to amplify DNA templates. The prototypical thermostable DNA polymerase used in early PCR experiments was Taq DNA polymerase, originally isolated from the thermophilic eubacterium Thermus aquaticus.
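The cycling described above is often summarized by an idealized textbook amplification model, included here only as an illustration and not taken from this document. Assuming a constant per-cycle efficiency $E$ (with $0 \le E \le 1$, where $E = 1$ means every template is copied in every cycle), $N_0$ starting templates yield $N_n$ copies after $n$ cycles:

$$
N_n = N_0\,(1 + E)^n,
\qquad
\frac{N_{30}\big|_{E=1}}{N_{30}\big|_{E=0.9}} = \left(\frac{2}{1.9}\right)^{30} \approx 4.7
$$

Under this simplifying assumption, a template amplified at 90% efficiency ends up with roughly 4.7-fold fewer copies after 30 cycles than one amplified at 100% efficiency, which illustrates how a modest per-cycle efficiency loss on a difficult (e.g., GC-rich or AT-rich) template can compound into a large representation bias.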
There are three major families of DNA polymerases, termed Families A, B, and C. A polymerase is classified into one of these three families based on its structural similarity to E. coli DNA polymerase I (Family A), II (Family B), or III (Family C). For example, Family A DNA polymerases include, but are not limited to, Klenow DNA polymerase, Thermus aquaticus DNA polymerase I (Taq polymerase), and bacteriophage T7 DNA polymerase; Family B DNA polymerases, formerly known as α-family polymerases (Braithwaite and Ito, 1991, Nucleic Acids Res. 19:4045), include, but are not limited to, human α, δ, and ε DNA polymerases, bacteriophage T4, RB69, and φ29 DNA polymerases, and Pyrococcus furiosus DNA polymerase (Pfu polymerase); and Family C DNA polymerases include, but are not limited to, Bacillus subtilis DNA polymerase III and the E. coli DNA polymerase III α and ε subunits (listed as products of the dnaE and dnaQ genes, respectively, by Braithwaite and Ito, 1993, Nucleic Acids Res. 21:787). An alignment of DNA polymerase protein sequences from each family across a broad spectrum of archaeal, bacterial, viral, and eukaryotic organisms is presented in Braithwaite and Ito (1993, supra), which is incorporated herein by reference in its entirety.
When performing polymerase-dependent nucleic acid synthesis or amplification, it can be useful to modify the polymerase (for example, via mutation or chemical modification) to alter its catalytic properties, and in some instances to enhance them. In some embodiments, a polymerase's catalytic properties can be enhanced via site-directed amino acid substitution or deletion. In some embodiments, a polymerase's catalytic properties can be enhanced via site-saturation mutagenesis of one, a plurality, or each amino acid position of the polymerase. In some embodiments, a polymerase may be modified to enhance catalytic properties of the modified polymerase such as read length, accuracy, and/or processivity.
Polymerase performance in various biological assays involving nucleic acid synthesis or detection can be limited by the behavior of the polymerase toward nucleotide substrates, salt concentrations, or thermal conditions. For example, analysis of polymerase activity can be complicated by undesirable behavior such as the tendency of a given polymerase to dissociate from the template; to bind and/or incorporate an incorrect (e.g., non-Watson-Crick base-paired) nucleotide; or to release the correct (e.g., Watson-Crick base-paired) nucleotide without incorporating it. Analysis of polymerase activity can also be complicated by undesirable behavior of the target molecule itself, such as failure to denature fully in AT-rich or GC-rich regions, or premature attenuation of the target molecule. As demonstrated herein, desirable polymerase properties for improved nucleic acid amplification can be achieved via suitable selection, engineering, and/or modification of a polymerase of choice. For example, such modification can be performed to favorably alter the polymerase's template-binding affinity, processivity, nucleotide incorporation accuracy, strand bias, and coverage. Such alterations within the polymerase can also increase the amount and/or quality of sequence information obtained, directly or downstream, from an improved amplification workflow that uses the modified polymerase.
There remains a need in the art for improved polymerase compositions (and related methods, systems, apparatuses, and kits) exhibiting altered properties, e.g., increased processivity, increased read length (including error-free read length), increased accuracy and/or affinity for DNA template, increased coverage, decreased strand bias and/or decreased systematic error. Such polymerase compositions (and related methods, systems, apparatuses, and kits) can be useful in a wide variety of assays involving polymerase-dependent nucleic acid synthesis, including nucleic acid sequencing and/or the production of nucleic acid libraries, such as nucleic acid libraries prepared by bridge PCR or clonal amplification.