Detectable labeling of nucleic acids is required for many applications in molecular biology, including applications for research as well as clinical diagnostic techniques. A commonly used method of labeling nucleic acids uses one or more unconventional nucleotides and a polymerase enzyme that catalyzes the template-dependent incorporation of the unconventional nucleotide(s) into the newly synthesized complementary strand.
The ability of a DNA polymerase to incorporate the correct deoxynucleotide is the basis for high fidelity DNA replication in vivo. Amino acids within the active site of polymerases form a specific binding pocket that favors the placement of the correct complementary nucleotide opposite the template nucleotide. If a mismatched nucleotide, ribonucleotide, or nucleotide analog fills that position, the precise alignment of the amino acids contacting the incoming nucleotide may be distorted into a position unfavorable for DNA polymerization. Because of this, the unconventional nucleotides or nucleotide analogs used to label DNA tend to be incorporated into the elongated strand less efficiently than the standard deoxynucleotide triphosphates (dNTPs; the so-called “standard” dNTPs include deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), and deoxythymidine triphosphate (dTTP)).
The reduced efficiency with which unconventional nucleotides are incorporated by the polymerase increases the amount of the unconventional nucleotide necessary for DNA labeling. The reduced efficiency of incorporation of a particular nucleotide can also adversely affect the performance of techniques or assays, such as DNA sequencing, that depend upon unbiased incorporation of unconventional nucleotides for homogeneous signal strength.
The identity and exact arrangement of the amino acids of a DNA polymerase that contact an incoming nucleotide triphosphate determine the nature of the nucleotides, both conventional and unconventional, that may be incorporated by that polymerase enzyme. Changes in the exact placement of the amino acids that contact the incoming nucleotide triphosphate at any stage of binding or chain elongation can dramatically alter the polymerase's capacity for utilization of unusual or unconventional nucleotides. Sometimes changes in distant amino acids can influence the incorporation of nucleotide analogs due to indirect global or structural effects. Polymerases with increased capacity to incorporate nucleotide analogs are useful for labeling DNA or RNA strands with nucleotides modified with signal moieties such as dyes, reactive groups or unstable isotopes.
In addition to labeled nucleotides, an extremely important class of modified nucleotides is the dideoxynucleotides. The so-called “Sanger” or “dideoxy” DNA sequencing method (Sanger et al., 1977, Proc. Natl. Acad. Sci. USA 74: 5463, which is incorporated herein by reference) relies upon the template-directed incorporation of nucleotides onto an annealed primer by a DNA polymerase from a mixture containing deoxy- and dideoxynucleotides. The incorporation of a dideoxynucleotide results in chain termination, the inability of the enzyme to catalyze further extension of that strand. Electrophoretic separation of reaction products results in a “ladder” of extension products wherein each extension product ends in a particular dideoxynucleotide complementary to the nucleotide opposite it in the template. The distance of the dideoxynucleotide analog from the primer is indicated by the length of the extension product. When four reactions, each containing one of the four dideoxynucleotide analogs ddA, ddC, ddG, or ddT (ddNTPs), are separated on the same gel, the sequence of the template may be read directly from the ladder patterns. Extension products may be detected in several ways, including, for example, the inclusion of isotopically- or fluorescently-labeled primers, deoxynucleotide triphosphates or dideoxynucleotide triphosphates in the reaction.
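The ladder-reading logic described above can be sketched in Python. This is an idealized model, assuming that every template position yields a terminated fragment in its lane and that fragment length alone sets gel mobility; the function names and the convention of writing the template 3′→5′ are illustrative choices, not part of any sequencing software.

```python
from __future__ import annotations

# Watson-Crick pairing decides which dideoxynucleotide terminates opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def termination_ladder(template_3to5: str, primer_length: int) -> dict[str, list[int]]:
    """Fragment lengths expected in each of the four ddNTP reactions (one lane per ddNTP).

    The template is written 3'->5', i.e. in the order the polymerase reads it after
    the primer, so the i-th incorporated base is the complement of template_3to5[i].
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template_3to5):
        incorporated = COMPLEMENT[template_base]
        # Terminating here yields the primer plus i + 1 added nucleotides.
        lanes[incorporated].append(primer_length + i + 1)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read the newly synthesized strand 5'->3' by ordering bands from shortest to longest."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_ladder("TACGGA", primer_length=10)
print(lanes)               # {'A': [11], 'C': [14, 15], 'G': [13], 'T': [12, 16]}
print(read_ladder(lanes))  # ATGCCT
```

Reading the bands across all four lanes in order of increasing length recovers the synthesized strand; its reverse complement is the template.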
Fluorescent labeling has the advantages of faster data collection, since detection may be performed while the gel is running, and longer reads of sequence data from a single reaction and gel. Further, fluorescent sequence detection has allowed sequencing to be performed in a single reaction tube containing four differentially-labeled fluorescent dye terminators (the so-called dye-terminator method, Lee et al., 1992, Nucleic Acids Res. 20: 2471, incorporated herein by reference).
A desirable quality of a polymerase useful for DNA sequencing is improved incorporation of dideoxynucleotides. Improved incorporation of dideoxynucleotides can make processes such as DNA sequencing more cost effective by reducing the requirement for expensive radioactive or fluorescent dye-labeled dideoxynucleotides. Moreover, unbiased dideoxynucleotide incorporation provides improved signal uniformity, leading to increased accuracy of base determination. The even signal output further allows the detection of subtle sequence differences caused by factors such as allelic variation. Allelic variation, which produces two half-strength signals of different bases at the relevant position, can easily be concealed by the uneven signal strengths produced by polymerases with non-uniform ddNTP utilization.
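How non-uniform ddNTP utilization can hide a heterozygous position is sketched below. This is a toy model, assuming that peak height is simply the fraction of template copies carrying a base multiplied by a relative incorporation efficiency for the matching ddNTP; the efficiency values and the 0.2 calling cutoff are hypothetical.

```python
from __future__ import annotations


def peak_heights(allele_fractions: dict[str, float], efficiency: dict[str, float]) -> dict[str, float]:
    """Signal for each base: share of template copies carrying it x relative ddNTP incorporation efficiency."""
    return {base: fraction * efficiency[base] for base, fraction in allele_fractions.items()}


def called_bases(heights: dict[str, float], min_fraction: float = 0.2) -> list[str]:
    """Bases whose share of the total signal at the position reaches min_fraction (hypothetical cutoff)."""
    total = sum(heights.values())
    return sorted(base for base, h in heights.items() if h / total >= min_fraction)


heterozygous = {"A": 0.5, "G": 0.5}  # two half-strength signals at one position
uniform = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}
biased = {"A": 1.0, "C": 1.0, "G": 0.1, "T": 1.0}  # hypothetical poor ddGTP utilization

print(called_bases(peak_heights(heterozygous, uniform)))  # ['A', 'G']
print(called_bases(peak_heights(heterozygous, biased)))   # ['A']  (the G allele is masked)
```

With uniform utilization each allele contributes half the signal; with the biased enzyme the G peak falls to about 9% of the total and drops below the cutoff.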
Incorporation of ribonucleotides by the native form of DNA polymerase is a rare event. Mutants that incorporate higher levels of ribonucleotides can be used for applications such as sequencing by partial ribosubstitution. In this system, a mixture of the ribonucleotide and the deoxynucleotide corresponding to the same base is incorporated by the mutant polymerase (Barnes, 1978, J. Mol. Biol. 119:83-99). When the ribosequencing reactions are exposed to alkaline conditions and heat, the extended strand is cleaved at the incorporated ribonucleotides. When the reactions for all four bases are separated on a denaturing acrylamide gel, they produce a sequencing ladder. There is a need in the art for polymerase mutants with higher utilization of ribonucleotides for this alternative method of sequencing.
Alternatively, the incorporation of ribonucleotides followed by alkaline hydrolysis could be used in systems that require random cleavage of DNA molecules, such as DNA shuffling (Stemmer, 1994, Nature 370: 389-391), which has also been called molecular breeding, sexual PCR and directed evolution.
Another desirable quality in a DNA labeling enzyme is thermal stability. DNA polymerases exhibiting thermal stability have revolutionized many aspects of molecular biology and clinical diagnostics since the development of the polymerase chain reaction (PCR), which uses cycles of thermal denaturation, primer annealing, and enzymatic primer extension to amplify DNA templates. The prototype thermostable DNA polymerase is Taq polymerase, originally isolated from the thermophilic eubacterium Thermus aquaticus. So-called “cycle sequencing” reactions using thermostable DNA polymerases have the advantage of requiring smaller amounts of starting template relative to conventional (i.e., non-cycle) sequencing reactions.
There are three major families of DNA polymerases, termed Families A, B and C. A polymerase is assigned to one of these three families based on its structural similarity to E. coli DNA polymerase I (Family A), II (Family B) or III (Family C). As examples, Family A DNA polymerases include, but are not limited to, Klenow DNA polymerase, Thermus aquaticus DNA polymerase I (Taq polymerase) and bacteriophage T7 DNA polymerase; Family B DNA polymerases, formerly known as α-family polymerases (Braithwaite and Ito, 1991, Nuc. Acids Res. 19:4045), include, but are not limited to, human α, δ and ε DNA polymerases, T4, RB69 and φ29 bacteriophage DNA polymerases, and Pyrococcus furiosus DNA polymerase (Pfu polymerase); and Family C DNA polymerases include, but are not limited to, Bacillus subtilis DNA polymerase III and the E. coli DNA polymerase III α and ε subunits (listed as products of the dnaE and dnaQ genes, respectively, by Braithwaite and Ito, 1993, Nucleic Acids Res. 21: 787). An alignment of DNA polymerase protein sequences of each family across a broad spectrum of archaeal, bacterial, viral and eukaryotic organisms is presented in Braithwaite and Ito (1993, supra), which is incorporated herein by reference.
The tendency of a DNA polymerase to resist incorporating unnatural nucleotides into the nascent DNA polymer is termed “discrimination”. In Family A DNA polymerases, effective discrimination against dideoxynucleotide analogs is largely associated with a single amino acid residue. Most Family A DNA polymerases have a phenylalanine (phe or F) residue at the position equivalent to F762 in the Klenow fragment of E. coli DNA polymerase I and discriminate strongly against dideoxynucleotides. A few polymerases (e.g., T7 DNA polymerase) have a tyrosine (tyr or Y) residue at the corresponding position and exhibit relatively weak discrimination against dideoxynucleotides. Family A polymerases with tyrosine at this position readily incorporate dideoxynucleotides at levels equal to or only slightly different from the levels at which they incorporate deoxynucleotides. Exchanging phenylalanine for tyrosine, or tyrosine for phenylalanine, at this site reverses the dideoxynucleotide discrimination profile of the Family A enzymes (Tabor and Richardson, 1995, Proc. Natl. Acad. Sci. USA 92:6449).
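One common way to make discrimination quantitative, assumed here rather than taken from the cited studies, is a competing-substrate model in which the discrimination factor D is the ratio of the polymerase's specificity constants (kcat/Km) for a dNTP and for the matching ddNTP. Under that assumption the chance of termination at a position depends only on D and the ddNTP:dNTP ratio, so a more discriminating enzyme needs proportionally more (often expensive, labeled) terminator. The function names and D values below are hypothetical.

```python
def termination_probability(dd_conc: float, d_conc: float, discrimination: float) -> float:
    """Chance that the ddNTP, not the competing dNTP, is added at a template position.

    Competing-substrate model: rates are proportional to (kcat/Km) x concentration,
    and discrimination D = (kcat/Km)_dNTP / (kcat/Km)_ddNTP.
    """
    return dd_conc / (dd_conc + discrimination * d_conc)


def required_dd_to_d_ratio(target_p: float, discrimination: float) -> float:
    """ddNTP:dNTP concentration ratio needed to reach termination probability target_p."""
    if not 0.0 < target_p < 1.0:
        raise ValueError("target_p must lie strictly between 0 and 1")
    return discrimination * target_p / (1.0 - target_p)


# Hypothetical enzymes: one that does not discriminate (D = 1) and one that strongly does (D = 1000).
for d in (1.0, 1000.0):
    ratio = required_dd_to_d_ratio(0.01, d)
    print(f"D = {d:g}: ddNTP:dNTP ratio for 1% termination per position = {ratio:.3g}")
```

Because the required ratio is linear in D, a thousandfold more discriminating enzyme needs a thousandfold higher ddNTP:dNTP ratio for the same ladder.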
Among the thermostable DNA polymerases, a mutant form of the Family A DNA polymerase from Thermus aquaticus, known as AmpliTaq FS® (Perkin Elmer), contains an F667Y mutation at the position equivalent to F762 of Klenow DNA polymerase and exhibits increased dideoxynucleotide uptake (i.e., reduced discrimination against ddNTPs) relative to the wild-type enzyme. This reduced discrimination makes the mutant better suited than the wild-type enzyme to sequencing with fluorescent and other labeled dideoxynucleotides.
The F667Y mutant of Taq DNA polymerase is not, however, suited for use with fluorescein-labeled dideoxynucleotides, necessitating the use of rhodamine dye terminators. The rhodamine dye terminators currently used in Taq sequencing reactions stabilize DNA secondary structure, causing compression of signal. Efforts to eliminate compression problems have resulted in systems that use high amounts of the nucleotide analog deoxyinosine triphosphate (dITP) in place of deoxyguanosine triphosphate. While incorporation of dITP reduces signal compression, the presence of dITP in the reaction produces additional complications, including lowered reaction temperatures and increased reaction times. Additionally, the use of rhodamine dyes in sequencing requires undesirable post-reaction purification (Brandis, 1999, Nuc. Acid Res. 27:1912).
Family B DNA polymerases differ substantially in structure from Family A DNA polymerases, with the exception of the position of the acidic catalytic residues in the so-called palm domain (Wang et al., 1997, Cell 89:1087; Hopfner et al., 1999, Proc. Natl. Acad. Sci. USA 96:3600). The unique structure of Family B DNA polymerases may permit a completely different spectrum of interactions with nucleotide analogs, perhaps allowing utilization of analogs that are unsuitable for use with Family A DNA polymerases due to structural constraints. Thermostable Family B DNA polymerases have been identified in hyperthermophilic archaea. These organisms grow at temperatures higher than 90 °C, and their enzymes demonstrate greater thermostability (Mathur et al., 1992, Strategies 5:11) than the thermophilic eubacterial Family A DNA polymerases. Family B polymerases from hyperthermophilic archaea may therefore be well-suited starting points for modifications that reduce discrimination against non-conventional nucleotides.
Although the crystal structures of three Family B DNA polymerases have been solved (Wang et al., 1997, supra; Hopfner, K.-P. et al., 1999, Proc. Natl. Acad. Sci. 96: 3600; Zhao, 1999, Structure Fold Des., 7:1189), the structures of DNA-polymerase or dNTP-polymerase co-complexes have not yet been reported. At present, identification of amino acid residues contributing to nucleotide analog discrimination can only be inferred from extrapolation to Family A-dNTP structures or from mutagenesis studies carried out with related Family B DNA polymerases (e.g., human polα, phage T4, phage φ29, T. litoralis DNA polymerase).
Sequence comparison of the Family B DNA polymerases indicates six conserved regions, numbered I-VI (Braithwaite and Ito, 1993, supra). The crystal structure of bacteriophage RB69 DNA polymerase (Family B) reported by Wang et al. (Wang et al., 1997, supra) shows that Y416 in region II (which corresponds to Y409 in the Family B DNA polymerase of Thermococcus species JDF-3) occupies the same position as Y115 in HIV reverse transcriptase (RT) and E710 in the Klenow fragment (Family A polymerases). Modeling of the dNTP and primer-template complex in RB69 was carried out using the atomic coordinates of the reverse transcriptase-DNA cocrystal. This model predicts that RB69 Y416 packs under the deoxyribose portion of the dNTP. Tyrosine at this position has been implicated in ribose selectivity, contributing to polymerase discrimination between ribonucleotides and deoxyribonucleotides in mammalian reverse transcriptases (Y115) (Gao et al., 1997, Proc. Natl. Acad. Sci. USA 94:407; Joyce, 1994, Proc. Natl. Acad. Sci. USA 94:1619); likewise, in Family A DNA polymerases, modification of the corresponding invariant glutamate residue (E710) reduces discrimination against ribonucleotides (Gelfand et al., 1998, Pat. No. EPO823479; Astatke et al, 1998, Proc. Natl. Acad. Sci. USA 96:3402).
Mutagenesis studies of Family B DNA polymerases also implicate the region containing the analogous Y in region II in dNTP incorporation and ribose selectivity. Mutations at the corresponding Y865 in human DNA polymerase α affect polymerase fidelity and sensitivity to nucleotide inhibitors such as AZT-TP, which has a bulky 3′-azido group in place of the 3′-OH group; BuPdGTP, which contains a butylphenyl group attached to the amino group at the C-2 position of the guanine base of dGTP (resulting in a bulkier and more hydrophobic purine nucleotide); and aphidicolin, an inhibitor that competes with pyrimidine deoxynucleotide triphosphates. Interestingly, the mutants showed no difference in their uptake of ddCTP (Dong et al., 1993, J. Biol. Chem. 268: 26143). Additionally, bacteriophage T4 DNA polymerase mutants in which L412, adjacent to the analogous Y (Y411), is converted to methionine (M) or isoleucine (I) show extreme and mild sensitivity, respectively, to the inorganic pyrophosphate analog phosphonoacetic acid (PAA). Alterations in PAA sensitivity have been shown to predict polymerase interactions with nucleotide analogs. L412 in T4 DNA polymerase corresponds to L410 in Thermococcus species JDF-3 DNA polymerase. The L412M T4 DNA polymerase mutant was inhibited by 50-fold less ddGTP than the wild-type polymerase, while the Km for dGTP was similar. As stated by the authors of that study, “[d]espite the sensitivity of the L412M DNA polymerase to ddGTP, there was no difference found in the incorporation of ddNTPs by wild-type and L412M DNA polymerase.” (Reha-Krantz et al., 1993, J. Virol. 67:60). In bacteriophage φ29, mutations in region II (LYP, where Y is analogous to Thermococcus species JDF-3 DNA polymerase Y409) produce mixed results when challenged with PAA: P255S was hypersensitive to PAA, while L253V was less sensitive than the wild-type enzyme (Blasco et al., 1993, J. Biol. Chem. 268: 24106).
These data support a role for the LYP region (region II) in polymerase-nucleotide interactions, but none of these studies achieved improved incorporation of ddNTPs.
In another study, region II of the archaeal Family B DNA polymerase from Thermococcus litoralis (VENT™ polymerase, New England Biolabs) was extensively mutated. In that study, 26 different site-directed mutants were made solely to examine nucleotide analog discrimination (Gardner and Jack, 1999, Nucleic Acids Res. 27: 2545). Site-directed mutagenesis of VENT™ DNA polymerase demonstrated that three mutations at Y412 (which corresponds to JDF-3 DNA polymerase Y409) could alter nucleotide binding (Gardner and Jack, 1999, supra). Y412V had the largest effect, with a 2-fold increase in dideoxynucleotide incorporation and a 200-fold increase in incorporation of the ribonucleotide ATP. The Y412F mutation produced no change in analog incorporation.
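Because each enzyme is numbered in its own frame, it can help to collate the region II correspondences stated in this section in one lookup table. The sketch below does that; the φ29 entry (Y254) is inferred from the L253 and P255 positions flanking its LYP motif rather than stated directly, and the `Residue` type and `mutation_label` helper are hypothetical. Alignments contain insertions and deletions, so the offset between two enzymes at this site should not be assumed to hold elsewhere.

```python
from __future__ import annotations

from typing import NamedTuple


class Residue(NamedTuple):
    amino_acid: str  # one-letter code
    position: int    # numbering of that enzyme's own sequence

    def label(self) -> str:
        return f"{self.amino_acid}{self.position}"


# Residues described in the text as equivalent to Y409 of Thermococcus species JDF-3
# DNA polymerase (region II).
REGION_II_TYR = {
    "Thermococcus sp. JDF-3": Residue("Y", 409),
    "Thermococcus litoralis (VENT)": Residue("Y", 412),
    "bacteriophage RB69": Residue("Y", 416),
    "bacteriophage T4": Residue("Y", 411),
    "human DNA polymerase alpha": Residue("Y", 865),
    "bacteriophage phi29": Residue("Y", 254),  # inferred: Y of LYP between L253 and P255
    "HIV reverse transcriptase": Residue("Y", 115),
    "E. coli Klenow fragment": Residue("E", 710),
}


def mutation_label(enzyme: str, new_aa: str) -> str:
    """Name a substitution at the region II position in the given enzyme's own numbering."""
    return f"{REGION_II_TYR[enzyme].label()}{new_aa}"


for enzyme, residue in REGION_II_TYR.items():
    print(f"{enzyme}: {residue.label()}")
print(mutation_label("Thermococcus litoralis (VENT)", "V"))  # Y412V
```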
Region III of the Family B polymerases (also referred to as motif B) has also been shown to play a role in nucleotide recognition. This region, which corresponds to amino acids 487 to 495 of JDF-3 Family B DNA polymerase, has the consensus sequence KX3NSXYG (Jung et al., 1990, supra; Blasco et al., 1992, supra; Dong et al., 1993, J. Biol. Chem. 268:21163; Zhu et al., 1994, Biochem. Biophys. Acta 1219:260; Dong and Wang, 1995, J. Biol. Chem. 270:21563), and is functionally, but not structurally (Wang et al., 1997, supra), analogous to KX3(F/Y)GX2YG in helix O of the Family A DNA polymerases. In Family A DNA polymerases, such as the Klenow fragment and Taq DNA polymerase, the O helix contains amino acids that play a major role in dNTP binding (Astatke et al., 1998, J. Mol. Biol. 278:147; Astatke et al., 1995, J. Biol. Chem. 270:1945; Polesky et al., 1992, J. Biol. Chem 267:8417; Polesky et al., 1990, J. Biol. Chem. 265:14579; Pandey et al., 1994, J. Biol. Chem. 269:13259; Kaushik et al., 1996, Biochem. 35:7256). Specifically, helix O (KX3(F/Y)GX2YG) contains the F (F762 in the Klenow fragment; F667 in Taq) that confers ddNTP discrimination in Family A DNA polymerases (Tabor and Richardson, 1995, supra).
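The two consensus motifs above translate directly into regular expressions, with each X (any residue) written as ".". The sketch below locates them in a protein string and reports residue numbers in the full-length enzyme's frame. The test fragment is synthetic: it places A485 and ILANSF (488-493) as stated in the text, fills 487, 494 and 495 from the consensus, and uses an unknown residue (X) at 486, so it is not a verified JDF-3 sequence. The `find_motif` helper is hypothetical.

```python
from __future__ import annotations

import re

# Consensus motifs quoted in the text, with X (any residue) written as "."
FAMILY_B_REGION_III = re.compile(r"K.{3}NS.YG")     # KX3NSXYG
FAMILY_A_HELIX_O = re.compile(r"K.{3}[FY]G.{2}YG")  # KX3(F/Y)GX2YG


def find_motif(pattern: re.Pattern[str], protein: str, first_residue: int = 1) -> list[tuple[int, int, str]]:
    """Return (first, last, matched residues) for each hit, using 1-based inclusive numbering.

    first_residue is the residue number of protein[0], so hits are reported in the
    numbering of the full-length enzyme even when only a fragment is searched.
    """
    return [
        (m.start() + first_residue, m.end() - 1 + first_residue, m.group())
        for m in pattern.finditer(protein)
    ]


# Synthetic fragment numbered from residue 485 (see lead-in).
fragment = "AX" + "KILANSFYG"
print(find_motif(FAMILY_B_REGION_III, fragment, first_residue=485))  # [(487, 495, 'KILANSFYG')]
print(find_motif(FAMILY_A_HELIX_O, fragment, first_residue=485))    # []
```

The Family A pattern fails on this fragment because the position after KX3 must be F or Y (the discrimination residue), whereas region III has N there.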
Directed mutagenesis studies in region III of VENT™ DNA polymerase also targeted an alanine analogous to A485 of the Thermococcus species JDF-3 DNA polymerase. These mutants (A→C, A→S, A→L, A→I, A→F and A→V) exhibited specific activities ranging from 0.12 to 1.2 times that of the progenitor enzyme (Gardner and Jack, 1999, Nucl. Acids Res. 27:2545). Their dideoxynucleotide incorporation ranged from 4 to 15 times that of the unmutated enzyme. Interestingly, the mutant with the highest dideoxynucleotide incorporation (15×) had a specific activity only 0.12× that of the original enzyme.
Site-directed mutagenesis studies on the Family B DNA polymerase from Thermococcus barossii independently replaced each residue of the sequence ILANSF (SEQ ID NO:49), which corresponds to amino acid residues 488-493 of the JDF-3 DNA polymerase, with tyrosine (Reidl et al., U.S. Pat. No. 5,882,904). That study indicated that an L489Y mutant exhibits approximately 3 times greater incorporation of dideoxynucleotides than an enzyme bearing the wild-type leucine residue at this site.
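The tyrosine-scanning design described above can be enumerated mechanically. The sketch below names each single substitution in the JDF-3 numbering used in the text (wild-type residue, position, new residue, e.g. L489Y) and applies one to the segment, checking that the stated wild-type residue matches. Both helper functions are hypothetical.

```python
import re


def tyrosine_scan(segment: str, first_position: int) -> list[str]:
    """Label each single substitution to tyrosine across a segment (native tyrosines are skipped)."""
    return [f"{aa}{first_position + i}Y" for i, aa in enumerate(segment) if aa != "Y"]


def apply_substitution(segment: str, first_position: int, mutation: str) -> str:
    """Return the segment carrying a substitution written as wild-type, position, new residue."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"not a single-substitution label: {mutation}")
    wild_type, position, new = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - first_position
    # Guard against labels that fall outside the segment or name the wrong wild-type residue.
    if not 0 <= index < len(segment) or segment[index] != wild_type:
        raise ValueError(f"{mutation} does not match the segment")
    return segment[:index] + new + segment[index + 1:]


print(tyrosine_scan("ILANSF", 488))  # ['I488Y', 'L489Y', 'A490Y', 'N491Y', 'S492Y', 'F493Y']
print(apply_substitution("ILANSF", 488, "L489Y"))  # IYANSF
```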
One area of active research involves the use of nucleic acid arrays, often referred to as nucleic acid or DNA “chips”, for the simultaneous analysis of multiple different nucleic acid sequences. Many of these applications, such as those described in U.S. Pat. No. 5,882,904 (Reidl et al., issued Mar. 16, 1999), will benefit from DNA polymerases exhibiting reduced discrimination against non-conventional nucleotides, particularly fluorescently-labeled non-conventional nucleotides. Applications being addressed in the chip format include DNA sequencing and mutation detection, among others. For example, “mini-sequencing” methods (e.g., Pastinen et al., 1997, Genome Res. 7: 606; Syvanen, 1999, Human Mutation 13: 1-10), the arrayed primer extension (APEX) mutation detection method (Shumaker et al., 1996, Hum. Mutat. 7: 346), and similar methods can benefit from DNA polymerases with reduced discrimination against fluorescently-labeled or other non-conventional nucleotides. There is a need in the art for a non-discriminating DNA polymerase for use in chip- or gel-based mini-sequencing systems. Such a system would permit multiplexed detection of single nucleotide polymorphisms (SNPs) and allow for quantitative genotyping. Identification of sequence variation permits the diagnosis and treatment of genetic disorders, as well as the assessment of predisposition to multifactorial diseases and of sensitivity to new or existing pharmaceutical products.
With the completion of the human genome project, considerable attention is now focused on analyzing genetic variation between individuals, specifically single nucleotide polymorphisms (SNPs), which have been estimated to occur approximately once every 1,000 bp (Halushka et al., 1999). SNPs are important because they serve as genetic markers that enable identification of disease-related loci (Lai et al., 1998). They can also be used to investigate the underlying causes of genetic diseases and could eventually help pave the way to personalized medicine.
Current assays used in SNP detection include hybridization to allele-specific oligonucleotide (ASO) probes (Saiki et al., 1989), oligonucleotide ligation assay (OLA) (Landegren et al., 1988), restriction fragment length polymorphism (RFLP) (Shi et al., 2001), TaqMan assay (Livak et al., 1995), molecular beacon assay (Tyagi et al., 1998), and primer extension assay (Tyagi et al., 1998; Gilles et al., 1999; Fu et al., 1998) on a variety of platforms including gel electrophoresis (Chen et al., 1997), MALDI-TOF mass spectrometry (Fu et al., 1998), solid phase minisequencing (Syvanen et al., 1990), semiconductor microchips (Gilles et al., 1999), and flow cytometric analysis (Taylor et al., 2001).
The principle of minisequencing is to anneal primers immediately adjacent to the SNP positions to be analyzed and to extend these primers with ddNTPs complementary to the SNP (Syvanen et al., 1990, incorporated herein by reference), using a DNA polymerase that readily incorporates ddNTPs. Minisequencing is unusual in that it relies on the high accuracy (high specificity) of polymerase-mediated nucleotide incorporation rather than on the differential thermal stability of matched and mismatched hybrids, on which most other SNP detection methods depend. Thus, compared to hybridization-based methods, minisequencing is insensitive to small variations in reaction conditions and temperature and to the flanking DNA sequence. Moreover, minisequencing allows discrimination between homozygous and heterozygous genotypes (Chen et al., 1997). These characteristics are important in multiplexed and/or high-throughput SNP detection. With the completion of the genome project and considerable interest in high-throughput SNP detection, a significant market exists for enzymes that efficiently incorporate ddNTPs and dye-labeled ddNTPs in single base extension assays (minisequencing).
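The single-base-extension step can be illustrated with a toy sketch. It assumes the primer sequence is identical to the target strand immediately 5′ of the SNP, so the primer anneals to the complementary strand and the polymerase copies the target strand: the one ddNTP added therefore carries the base found at the SNP in the target. Sequences and function names are hypothetical, only the first primer match is used, and hybridization and signal detection are not modeled.

```python
from __future__ import annotations


def extended_base(target: str, primer: str) -> str:
    """Base of the single ddNTP added when the primer, ending just before the SNP, is extended."""
    start = target.find(primer)
    if start < 0:
        raise ValueError("primer does not match the target")
    snp_index = start + len(primer)  # first position past the primer's 3' end
    if snp_index >= len(target):
        raise ValueError("no template position after the primer")
    return target[snp_index]


def genotype(alleles: list[str], primer: str) -> str:
    """Diploid genotype from the terminator added on each of the two allele sequences."""
    return "/".join(sorted(extended_base(allele, primer) for allele in alleles))


allele_1 = "GATTACAGCT"  # hypothetical allele carrying A at the SNP
allele_2 = "GATTACGGCT"  # hypothetical allele carrying G at the SNP
print(genotype([allele_1, allele_2], "GATTAC"))  # A/G (heterozygous)
print(genotype([allele_1, allele_1], "GATTAC"))  # A/A (homozygous)
```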
DNA polymerases are a core component of minisequencing protocols. Efficient ddNTP and dye-ddNTP incorporation and high fidelity are essential characteristics of minisequencing enzymes. Commercially available DNA polymerases suitable for sequencing and minisequencing have been derived from either Taq (Taq F667Y mutants such as ThermoSequenase and AmpliTaqFS) or bacteriophage T7 DNA polymerase (Sequenase), both of which are Family A DNA polymerases. A tyrosine (Y) residue in the nucleotide binding pocket of T7 (native) or Taq (engineered F667Y mutant) DNA polymerase confers efficient ddNTP incorporation (Tabor et al., 1995). In two recent mutagenesis studies employing archaeal (Family B) DNA polymerases, mutations were identified that reduced ddNTP discrimination; however, the archaeal DNA polymerase mutants incorporated ddNTPs less efficiently than the Taq F667Y mutant (Gardner et al., 1999; Evans et al., 2000).
There is a need in the art for DNA polymerases with reduced discrimination against unconventional nucleotides. There is particularly a need in the art for thermostable DNA polymerases exhibiting reduced discrimination against dideoxynucleotides, and further, for DNA polymerases exhibiting reduced discrimination against fluorescently labeled dideoxynucleotides.