1. Field of the Invention
The present invention relates to compositions and methods utilizing split polymerases composed of at least two discrete polypeptides that stably associate to form a single polymerase. The invention further relates to nucleic acid constructs for expressing the split polymerases of the invention, and methods for making the split polymerases of the invention. The enzymes of the invention are useful in many applications calling for DNA synthesis.
2. Description of Related Art
Detectable labeling of nucleic acids is required for many applications in molecular biology, including applications for research as well as clinical diagnostic techniques. A commonly used method of labeling nucleic acids uses one or more unconventional nucleotides and a polymerase enzyme that catalyzes the template-dependent incorporation of the unconventional nucleotide(s) into the newly synthesized complementary strand.
The ability of a DNA polymerase to incorporate the correct deoxynucleotide is the basis for high fidelity DNA replication in vivo. Amino acids within the active site of polymerases form a specific binding pocket that favors the placement of the correct complementary nucleotide opposite the template nucleotide. If a mismatched nucleotide, ribonucleotide, or nucleotide analog fills that position, the precise alignment of the amino acids contacting the incoming nucleotide may be distorted into a position unfavorable for DNA polymerization. Because of this, the unconventional nucleotides or nucleotide analogs used to label DNA tend to be incorporated into the elongated strand less efficiently than are the standard deoxynucleotide triphosphates (dNTPs). The so-called “standard” dNTPs include deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), deoxyguanosine triphosphate (dGTP), and thymidine triphosphate (dTTP, also called TTP).
The reduced efficiency with which unconventional nucleotides are incorporated by the polymerase increases the amount of the unconventional nucleotide necessary for DNA labeling. The reduced efficiency of incorporation of a particular nucleotide can also adversely affect the performance of techniques or assays, such as DNA sequencing, which depend upon unbiased incorporation of unconventional nucleotides for homogeneous signal strength.
The identity and exact arrangement of the amino acids of a DNA polymerase that contact an incoming nucleotide triphosphate determine the nature of the nucleotides, both conventional and unconventional, that may be incorporated by that polymerase enzyme. Changes in the exact placement of the amino acids that contact the incoming nucleotide triphosphate at any stage of binding or chain elongation can dramatically alter the polymerase's capacity for utilization of unusual or unconventional nucleotides. Sometimes changes in distant amino acids can influence the incorporation of nucleotide analogs due to indirect global or structural effects. Polymerases with increased capacity to incorporate nucleotide analogs are useful for labeling DNA or RNA strands with nucleotides modified with signal moieties such as dyes, reactive groups or unstable isotopes.
In addition to labeled nucleotides, an extremely important class of modified nucleotides is the dideoxynucleotides. The so-called “Sanger” or “dideoxy” DNA sequencing method (Sanger et al., 1977, Proc. Natl. Acad. Sci. USA 74: 5463, which is incorporated herein by reference) relies upon the template-directed incorporation of nucleotides onto an annealed primer by a DNA polymerase from a mixture containing deoxy- and dideoxynucleotides. The incorporation of a dideoxynucleotide results in chain termination, the inability of the enzyme to catalyze further extension of that strand. Electrophoretic separation of reaction products results in a “ladder” of extension products wherein each extension product ends in a particular dideoxynucleotide complementary to the nucleotide opposite it in the template. The distance of the dideoxynucleotide analog from the primer is indicated by the length of the extension product. When four reactions, each containing one of the four dideoxynucleotide analogs ddA, ddC, ddG, or ddT (ddNTPs) are separated on the same gel, the sequence of the template may be read directly from the ladder patterns. Extension products may be detected in several ways, including for example, the inclusion of isotopically- or fluorescently-labeled primers, deoxynucleotide triphosphates or dideoxynucleotide triphosphates in the reaction.
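The ladder-reading logic described above can be illustrated with a minimal Python sketch. It is an idealized model only: it assumes that every template position yields a chain-terminated product in the appropriate reaction (i.e., unbiased ddNTP incorporation), and the function names, the example template, and the primer length are hypothetical values chosen for illustration.

```python
from typing import Dict, List

# Watson-Crick pairing used to pick the nucleotide incorporated opposite each template base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def termination_lengths(template: str, primer_len: int) -> Dict[str, List[int]]:
    """Return, for each of the four ddNTP reactions, the lengths of the terminated products.

    `template` is the template region beyond the primer, written in the order
    in which the polymerase reads it. Assumption: termination occurs at every
    position with equal efficiency (an idealization).
    """
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for offset, template_base in enumerate(template, start=1):
        incorporated = COMPLEMENT[template_base]
        # A product ending in ddX has length = primer + distance from the primer.
        lanes[incorporated].append(primer_len + offset)
    return lanes


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Read the synthesized sequence from shortest to longest extension product."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


# Hypothetical example: a 6-nucleotide template region and a 20-nucleotide primer.
example_lanes = termination_lengths("TACGGT", primer_len=20)
example_read = read_ladder(example_lanes)
```

In this sketch each lane corresponds to one of the four reactions, and sorting the products by length plays the role of the electrophoretic separation. A polymerase that discriminates unevenly among the ddNTPs would break the equal-efficiency assumption stated in the code.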
Fluorescent labeling has the advantages of faster data collection, since detection may be performed while the gel is running, and of longer reads of sequence data from a single reaction and gel. Further, fluorescent sequence detection has allowed sequencing to be performed in a single reaction tube containing four differentially-labeled fluorescent dye terminators (the so-called dye-terminator method, Lee et al., 1992, Nucleic Acids Res. 20: 2471, incorporated herein by reference).
A desirable quality of a polymerase useful for DNA sequencing is improved incorporation of dideoxynucleotides. Improved incorporation of dideoxynucleotides can make processes such as DNA sequencing more cost effective by reducing the requirement for expensive radioactive or fluorescent dye-labeled dideoxynucleotides. Moreover, unbiased dideoxynucleotide incorporation provides improved signal uniformity, leading to increased accuracy of base determination. The even signal output further allows subtle sequence differences caused by factors like allelic variation to be detected. Allelic variation, which produces two different half strength signals at the position of relevance, can easily be concealed by the varied signal strengths caused by polymerases with non-uniform ddNTP utilization.
Dual-labeled nucleotide analogs (see, e.g., US Patent Publication 20040014096) are nucleotide analogs that have both fluorescent and quenching groups attached, resulting in a molecule that is non-fluorescent until it is incorporated, at which point the fluorescent group is cleaved off of the nucleotide. Dual-labeled nucleotide analogs containing both a fluorescent moiety and a quencher moiety can be used as chain terminators in place of the dideoxynucleotide chain terminators commonly used in the art. A chain terminating dual-labeled nucleotide analog has a sugar moiety which is, or is equivalent to, a 2′,3′-dideoxypyrofuranose molecule. The dual-labeled nucleotide analogs have the advantage of reduced background fluorescence compared with more traditionally labeled chain terminating nucleotide analogs. Since the dual-labeled nucleotide analogs do not emit a fluorescent signal unless they are incorporated into a polynucleotide chain, background fluorescence resulting from unincorporated analogs is significantly reduced. Dual-labeled nucleotide analogs are also useful for monitoring progress of real time amplification in quantitative PCR (QPCR) methods.
The use of such dual-labeled analogs is limited by the low utilization of such analogs by polymerases. In order to promote incorporation of the analogs into the growing strand, relatively high concentrations of the analogs must be used. The analogs are expensive and decrease the rate of extension, potentially decreasing processivity of the polymerase. High concentrations of the dual-labeled analogs can also result in increased background signal and inter-molecular quenching. A polymerase with reduced discrimination towards dual-labeled nucleotide analogs could result in decreased cost by decreasing the amount of analog required per reaction, while increasing fluorescent signal and sensitivity in both QPCR and sequencing reactions.
Incorporation of ribonucleotides by the native form of DNA polymerase is a rare event. Mutants that incorporate higher levels of ribonucleotides can be used for applications such as sequencing by partial ribosubstitution. In this system, a mixture of ribonucleotides and deoxynucleotides corresponding to the same base is incorporated by the mutant polymerase (Barnes, 1978, J. Mol. Biol. 119:83-99). When the ribosequencing reactions are exposed to alkaline conditions and heat, fragmentation of the extended strand occurs. If the reactions for all four bases are separated on a denaturing acrylamide gel, they produce a sequencing ladder. The applicants of the present patent application have realized that there is a need in the art for polymerase mutants with higher utilization of ribonucleotides for this alternative method of sequencing.
Alternatively, the incorporation of ribonucleotides followed by alkaline hydrolysis can be utilized in a system that requires random cleavage of DNA molecules, such as DNA shuffling (Stemmer, 1994, Nature 370: 389-391), which has also been called molecular breeding, sexual PCR, and directed evolution.
Another desirable quality in a DNA labeling enzyme is thermal stability. DNA polymerases exhibiting thermal stability have revolutionized many aspects of molecular biology and clinical diagnostics since the development of the polymerase chain reaction (PCR), which uses cycles of thermal denaturation, primer annealing, and enzymatic primer extension to amplify DNA templates. The prototype thermostable DNA polymerase is Taq polymerase, originally isolated from the thermophilic eubacterium Thermus aquaticus. So-called “cycle sequencing” reactions using thermostable DNA polymerases have the advantage of requiring smaller amounts of starting template relative to conventional (i.e., non-cycle) sequencing reactions.
There are three major families of DNA polymerases, termed families A, B, and C. The classification of a polymerase into one of these three families is based on structural similarity of a given polymerase to E. coli DNA polymerase I (Family A), II (Family B) or III (Family C). As examples, Family A DNA polymerases include, but are not limited to Klenow DNA polymerase, Thermus aquaticus DNA polymerase I (Taq polymerase) and bacteriophage T7 DNA polymerase; Family B DNA polymerases, formerly known as α-family polymerases (Braithwaite and Ito, 1991, Nuc. Acids Res. 19:4045), include, but are not limited to human α, δ and ε DNA polymerases, T4, RB69, and Φ29 bacteriophage DNA polymerases, and Pyrococcus furiosus DNA polymerase (Pfu polymerase); and family C DNA polymerases include, but are not limited to Bacillus subtilis DNA polymerase III, and E. coli DNA polymerase III α and ε subunits (listed as products of the dnaE and dnaQ genes, respectively, by Braithwaite and Ito, 1993, Nucleic Acids Res. 21: 787). An alignment of DNA polymerase protein sequences of each family across a broad spectrum of archaeal, bacterial, viral, and eukaryotic organisms is presented in Braithwaite and Ito (1993, supra), which is incorporated herein by reference.
As shown in Braithwaite and Ito (1993, supra), within regions I, II, and III, a set of highly conserved residues forms three chemically distinct clusters consisting of exposed, aromatic residues (RB69 numbering, Y416, Y567, and Y391), negatively charged residues (D621, D623, D411, D684, and E686), and a positively charged cluster (K560, R481, and K486). Comparison with a Taq polymerase-DNA complex suggests that these three clusters encompass the region in which the primer terminus and the incoming dNTP would be expected to bind. Modeling of the dNTP and primer template complex in RB69 was carried out using the atomic coordinates of the reverse transcriptase c-DNA co-crystal. The model predicts that RB69 Y416 packs under the deoxyribose portion of the dNTP. Tyrosine at this position has been implicated in ribose selectivity, contributing to polymerase discrimination between ribonucleotides and deoxyribonucleotides in mammalian reverse transcriptases (Y115) (Gao et al., 1997, Proc. Natl. Acad. Sci. USA 94:407; Joyce, 1997, Proc. Natl. Acad. Sci. USA 94:1619).
Region III of the Family B polymerases (also referred to as motif B) has also been demonstrated to play a role in nucleotide recognition. This region, which corresponds to AA 487 to 495 of JDF-3 Family B DNA polymerase, has a consensus sequence KX3NSXYG (SEQ ID NO: 1) (Jung et al., 1990, supra; Blasco et al., 1992, supra; Dong et al., 1993, J. Biol. Chem. 268:21163; Zhu et al., 1994, Biochim. Biophys. Acta 1219:260; Dong and Wang, 1995, J. Biol. Chem. 270:21563), and is functionally, but not structurally (Wang et al., 1997, supra), analogous to KX3(F/Y)GX2YG (SEQ ID NO: 2) in helix O of the Family A DNA polymerases. In Family A DNA polymerases, such as the Klenow fragment and Taq DNA polymerases, the O helix contains amino acids that play a major role in dNTP binding (Astatke et al., 1998, J. Mol. Biol. 278:147; Astatke et al., 1995, J. Biol. Chem. 270:1945; Polesky et al., 1992, J. Biol. Chem. 267:8417; Polesky et al., 1990, J. Biol. Chem. 265:14579; Pandey et al., 1994, J. Biol. Chem. 269:13259; Kaushik et al., 1996, Biochem. 35:7256). Specifically, helix O contains the F (F762 in the Klenow fragment; F667 in Taq) which confers ddNTP discrimination in Family A DNA polymerases (KX3(F/Y)GX2YG; SEQ ID NO: 2) (Tabor and Richardson, 1995, supra).
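By way of illustration only, the two consensus sequences recited above can be expressed as regular expressions and used to scan a protein sequence. The following Python sketch assumes the usual reading of the notation, in which X denotes any amino acid and a trailing number denotes a repeat count (so that KX3 is a lysine followed by any three residues). The function name and the two short test sequences are hypothetical and are not taken from any actual polymerase.

```python
import re
from typing import List, Tuple

# KX3NSXYG (SEQ ID NO: 1): K, any three residues, N, S, any residue, Y, G.
FAMILY_B_REGION_III = re.compile(r"K.{3}NS.YG")

# KX3(F/Y)GX2YG (SEQ ID NO: 2): K, any three residues, F or Y, G, any two residues, Y, G.
FAMILY_A_HELIX_O = re.compile(r"K.{3}[FY]G.{2}YG")


def find_motif(pattern: "re.Pattern[str]", protein_seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein_seq)]


# Hypothetical toy sequences, invented for this example.
toy_family_b = "MAAKILANSFYGYLG"
toy_family_a = "GGKTINFGVLYGMS"

hits_b = find_motif(FAMILY_B_REGION_III, toy_family_b)
hits_a = find_motif(FAMILY_A_HELIX_O, toy_family_a)
```

A pattern match of this kind only locates a candidate motif. It does not by itself establish that the matched region is structurally or functionally equivalent to Region III or helix O.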
The term used to describe the tendency of DNA polymerases to not incorporate unnatural nucleotides into the nascent DNA polymer is “discrimination”. In Family A DNA polymerases, the effective discrimination against incorporation of dideoxynucleotide analogs is largely associated with a single amino acid residue. The majority of enzymes from the Family A DNA polymerases have a phenylalanine (phe or F) residue at the position equivalent to F762 in the E. coli Klenow fragment of DNA polymerase I and demonstrate a strong discrimination against dideoxynucleotides. A few polymerases (e.g., T7 DNA polymerase) have a tyrosine (tyr or Y) residue at the corresponding position and exhibit relatively weak discrimination against dideoxynucleotides. Family A polymerases with tyrosine at this position readily incorporate dideoxynucleotides at levels equal to or only slightly different from the levels at which they incorporate deoxynucleotides. Interconversion of the tyrosine and phenylalanine residues in the site responsible for discrimination reverses the dideoxynucleotide discrimination profile of the Family A enzymes (Tabor and Richardson, 1995, Proc. Natl. Acad. Sci. USA 92:6449).
Among the thermostable DNA polymerases, a mutant form of the Family A DNA polymerase from Thermus aquaticus, known as AmpliTaq FS® (Perkin Elmer), contains a F667Y mutation at the position equivalent to F762 of Klenow DNA polymerase and exhibits increased dideoxynucleotide uptake (i.e., reduced discrimination against ddNTPs) relative to the wild-type enzyme. The reduced discrimination against dideoxynucleotide uptake makes it more useful for fluorescent and labeled dideoxynucleotide sequencing than the wild-type enzyme.
The F667Y mutant of Taq DNA polymerase is not suited for use with fluorescein-labeled dideoxynucleotides, necessitating the use of rhodamine dye terminators. Rhodamine dye terminators that are currently utilized with Taq sequencing reactions stabilize DNA secondary structure, causing compression of signal. Efforts to eliminate compression problems have resulted in systems that use high amounts of the nucleotide analog deoxyinosine triphosphate (dITP) in place of deoxyguanosine triphosphate. While incorporation of dITP reduces the compression of the signal, the presence of dITP in the reaction produces additional complications, including lowered reaction temperatures and increased reaction times. Additionally, the use of rhodamine dyes in sequencing requires undesirable post-reaction purification (Brandis, 1999, Nucleic Acids Res. 27:1912). In the Family A E. coli DNA polymerase I Klenow fragment, modification of a conserved glutamate residue (E710) reduces discrimination against ribonucleotides (Astatke et al., 1998, Proc. Natl. Acad. Sci. USA 95:3402).
With the exception of the position of acidic residues involved in catalysis in the so-called palm domain, Family B DNA polymerases exhibit substantially different structure compared to Family A DNA polymerases (Wang et al., 1997, Cell 89:1087; Hopfner et al., 1999, Proc. Natl. Acad. Sci. USA 96:3600). The unique structure of Family B DNA polymerases may permit a completely different spectrum of interactions with nucleotide analogs, perhaps allowing utilization of analogs that are unsuitable for use with Family A DNA polymerases due to structural constraints. Thermostable Family B DNA polymerases have been identified in hyperthermophilic archaea. These organisms grow at temperatures higher than 91° C. and their enzymes demonstrate greater thermostability (Mathur et al., 1992, Strategies 5:11) than the thermophilic eubacterial Family A DNA polymerases. Alignments of a number of Family B DNA polymerases can be seen in FIGS. 2 and 6.
Structural analysis of A family polymerases, Pol β, HIV reverse transcriptase, and the B family polymerase gp43 demonstrates that all share a functional polymerase structure which resembles a right hand built by the palm, fingers, and thumb domains (see Brautigam and Steitz, 1998, Curr. Opin. Struct. Biol. 8:54 for review, incorporated herein by reference). The palm domains show a similar topology among all families, except Pol β. The fingers and thumb domains are highly diverse among the different families, and although the thumb domains are mainly alpha-helical, the detailed structures of the domains are not related. Perhaps surprisingly, the fingers and thumb domains in all four families have arisen from different ancestors.
Because polymerases are used in many laboratory applications, a number of polymerases have been developed to have properties that are desirable for those applications. For example, mutations at sites corresponding to amino acids E141 and D143 in Pyrococcus furiosus (Pfu) DNA polymerase (SEQ ID NO: 3) are known to eliminate 3′ to 5′ exonuclease activity. Mutations at sites corresponding to amino acids L409, Y410, P411, R461, K465, Q472, A486, R488, L490, A491, N492, Y495, and Y497 are known to reduce nucleotide discrimination in polymerases (see, e.g., U.S. Pat. No. 6,946,273, U.S. Pat. No. 6,333,183, U.S. Pat. No. 5,882,904, U.S. Pat. No. 5,827,716, Yang et al., 1999, Biochemistry 38:8094, Gardner and Jack, 1999, Nucleic Acids Research 27:2545, incorporated herein by reference). A mutation at amino acid V93, specifically V93R (Pfu numbering), is known to disrupt uracil detection. A non-sequence-specific DNA binding domain, such as the DNA binding domain of Sso7d, can be incorporated into a polymerase to increase the processivity of the polymerase. Moreover, sites corresponding to the amino acids provided in Pfu DNA polymerase can be easily mapped onto other Family B polymerase sequences using published sequence alignments (e.g., Braithwaite and Ito, 1993, supra; Brautigam and Steitz, 1998, supra; Hopfner et al., 1999, supra; Biles and Connolly, 2004, supra; Gardner and Jack, 1999, supra; and Edgell et al., 1997, J. Bacteriol. 179:2632) or any of a number of sequence alignment programs such as BLAST.
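The mapping of a residue position from a reference polymerase onto another sequence by way of an alignment can be sketched as follows. This Python example is illustrative only: it assumes that a pairwise alignment is already available as two gapped strings of equal length (with "-" marking gaps), and the function name and the short aligned sequences are hypothetical and do not correspond to Pfu or to any other actual polymerase.

```python
from typing import Optional


def map_position(aligned_ref: str, aligned_target: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based residue position in the reference onto the target sequence.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    Returns the 1-based target position in the same alignment column, or
    None if the target has a gap in that column.
    """
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    target_count = 0
    for ref_char, target_char in zip(aligned_ref, aligned_target):
        if ref_char != "-":
            ref_count += 1
        if target_char != "-":
            target_count += 1
        # Stop at the column that holds the requested reference residue.
        if ref_char != "-" and ref_count == ref_pos:
            return target_count if target_char != "-" else None
    raise ValueError("ref_pos is outside the reference sequence")


# Hypothetical toy alignment, invented for this example.
toy_ref = "MIL-DVDYIT"
toy_target = "M-LKDADYVT"

mapped_d4 = map_position(toy_ref, toy_target, 4)   # reference D4
mapped_i2 = map_position(toy_ref, toy_target, 2)   # reference I2 faces a gap
```

The same column-counting approach applies to a row pair extracted from a multiple sequence alignment. The reliability of the mapped position depends entirely on the quality of the alignment in the region of interest.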
Introducing splits into enzymes as a strategy to broaden substrate utilization is very different from currently used approaches, which are based on amino acid replacements. There are four examples of natural splits in the polymerase family. First, the T4-phage family includes five members that contain splits within the fingers domain (Petrov et al. (2006) J. Mol. Biol. 361:46-68). These splits occur naturally, and it is unknown whether the split enzymes exhibit unique characteristics such as broader substrate utilization compared to non-split T4-like phage DNA polymerases. The second natural split is the one reported in the archaeal Methanobacterium thermoautotrophicum (Mth) DNA polymerase (Kelman et al. (1999) J. Biol. Chem. 274:28751-61). This split also occurs naturally and is found downstream (outside) of the fingers domain. It has likewise not been characterized in terms of whether it confers broader substrate utilization compared to non-split archaeal DNA polymerases. In these two examples of natural splits, the polymerase fragments are encoded by distinct genes that are separated in the genome by anywhere from 2 bp to 3 kb (T4-like phage) or by 85 kbp (Mth). The third example of a natural split is in an archaeal DNA polymerase gene. However, this split occurs within a mini-intein of N. equitans DNA polymerase, where the polymerase is expressed as two separate polypeptides, which are then spliced together (trans-splicing) to create a full length polymerase. The split is located outside of the fingers domain and has additional sequence (inteins) to stabilize the protein until the splicing event is complete (Choi et al. (2006) J. Mol. Biol. 356:1093-1106). The fourth example of a natural split is found in the archaeal Sulfolobus solfataricus DNA polymerase B1 (Savino et al. (2004) Structure 12:2001-2008). In this case, the polymerase is proteolytically cleaved to produce two active fragments, a 50 kD fragment with DNA polymerase activity and a 40 kD fragment with exonuclease activity. However, the authors do not state whether these activities are reduced relative to wild type, nor have the proteolytic fragments been tested for alternative or improved activities. The split in this example is also found outside the fingers domain.
Polymerases having reduced discrimination are useful for applications that require incorporation of non-conventional nucleic acids. Such applications include the labeling of nucleic acid arrays, often referred to as nucleic acid or DNA “chips”, in the simultaneous analyses of multiple different nucleic acid sequences. Many of these applications, such as those described in U.S. Pat. No. 5,882,904 (Riedl et al.), will benefit from DNA polymerases exhibiting reduced discrimination against the incorporation of non-conventional nucleotides, particularly fluorescently-labeled non-conventional nucleotides. Applications being addressed in the chip format include DNA sequencing and mutation detection, among others. Examples include the “mini-sequencing” methods (e.g., Pastinen et al., 1997, Genome Res. 7: 606; Syvanen, 1999, Human Mutation 13: 1-10) and the arrayed primer extension (APEX) mutation detection method (Shumaker et al., 1996, Hum. Mutat. 7: 346).
The present applicants have recognized that there is a need in the art for a non-discriminating DNA polymerase for use in chip or gel based mini-sequencing systems. Such a system would advantageously permit detection of multiplexed single nucleotide polymorphisms (SNPs) and allow for quantitative genotyping. Identification of sequence variation permits the diagnosis and treatment of genetic disorders, predisposition to multifactorial diseases, and sensitivity to new or existing pharmaceutical products.
Additionally, the applicants have recognized that there is a need in the art for DNA polymerases with reduced discrimination against unconventional nucleotides. They have realized that there is particularly a need in the art for thermostable DNA polymerases exhibiting reduced discrimination against dideoxynucleotides, and further, for DNA polymerases exhibiting reduced discrimination against fluorescently labeled dideoxynucleotides. They have also recognized that there is a particular need for thermostable DNA polymerases exhibiting reduced discrimination against nucleotide analogs containing modifications in the polyphosphate portion of a nucleotide, especially dual-labeled nucleotide analogs.