The genetic material of all known living organisms is deoxyribonucleic acid (DNA), except in certain viruses whose genetic material may be ribonucleic acid (RNA). DNA consists of a chain of individual deoxynucleotides chemically linked in specific sequences. Each deoxynucleotide contains one of the four nitrogenous bases, which may be adenine (A), cytosine (C), guanine (G) or thymine (T), and a deoxyribose, which is a pentose, with a hydroxyl group attached to its 3′ position and a phosphate group attached to its 5′ position. The contiguous deoxynucleotides that form the DNA chain are connected to each other by a phosphodiester bond linking the 5′ position of one pentose ring to the 3′ position of the next pentose ring in such a manner that the beginning of the DNA molecule always has a phosphate group attached to the 5′ carbon of a deoxyribose. The end of the DNA molecule always has an OH (hydroxyl) group on the 3′ carbon of a deoxyribose.
DNA usually exists as a double-stranded molecule in which two antiparallel DNA strands are held together by hydrogen bonds between the bases of the individual nucleotides of the two DNA strands in a strictly matched “A-T” and “C-G” pairing manner. It is the order or sequence of the bases in a strand of DNA that determines a gene, which in turn determines the type of protein to be synthesized. Therefore, the accurate determination of the sequence of the bases in a DNA strand, which also constitutes the genetic code for a protein, is of fundamental importance in understanding the characteristics of the protein concerned.
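The antiparallel “A-T” and “C-G” pairing rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_complement` and the example sequence are assumptions introduced here and do not appear in the original disclosure.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written in the 5'->3' direction.

    Because the two strands are antiparallel, the partner is obtained by
    reading the input backwards and swapping each base for its pair.
    """
    return "".join(PAIRING[base] for base in reversed(strand_5_to_3.upper()))


example = "ATGCCG"  # hypothetical sequence, for illustration only
partner = reverse_complement(example)
print(example, "pairs with", partner)
```

Applying the function twice returns the starting strand, which is a quick consistency check on the pairing table.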
The process used to determine the sequence of the bases in a DNA molecule is referred to as DNA sequencing. Among the techniques of DNA sequencing, the enzymatic method developed by Sanger et al. (1) is most popular. It is based on the ability of a DNA polymerase to extend a primer annealed to the DNA template to be sequenced in the presence of four normal deoxynucleotide triphosphates (dNTPs), namely, dATP, dCTP, dGTP and dTTP, and on the ability of the nucleotide analogs, the dideoxynucleotide triphosphates (ddNTPs), namely, ddATP, ddCTP, ddGTP and ddTTP, to terminate the extension of the elongating deoxynucleotide polymers at various lengths.
In the classic one-step Sanger method, the sequence determination is carried out in a set of four separate tubes, each containing all four normal dNTPs, one of which is labeled with a radioactive isotope, 32P or 35S, for autoradiographic localization, a limiting amount of one of the four ddNTPs, a DNA polymerase, a primer, and the DNA template to be sequenced. As a result of the DNA polymerase activity, individual nucleotides or nucleotide analogs are added to the new DNA chains, all starting from the 3′ end of the primer in a 5′-3′ direction, and each linked to adjacent ones with a phosphodiester bond in a base sequence complementary to the DNA sequence of the template. Inasmuch as there is a nucleotide analog in the reaction mixture, each tube eventually contains numerous newly formed DNA strands of various lengths, all ending in a particular ddNTP, referred to as A, C, G or T terminator.
After resolving the four sets of reaction products by high-resolution polyacrylamide/urea gel electrophoresis, the populations of the newly formed DNA strands are separated and grouped according to their molecular weight. An autoradiographic image of the gel will show the relative positions of these DNA strands as bands which differ from one another in distance measured by one nucleotide in length, all sharing an identical primer and terminating with a particular ddNTP (A, C, G or T). By reading the relative positions of these bands in the “ladder” of the autoradiograph, the DNA sequence of the template can be deduced.
The DNA polymerase used in the reaction mixture plays a pivotal role in DNA sequencing analysis. To be useful for DNA sequencing, a DNA polymerase must possess certain essential properties. For example, it must have its natural 5′-3′ exonuclease activity removed by mutagenesis or by posttranslational modification, such as enzymatic digestion, and must be able to incorporate dNTPs and ddNTPs without undue discrimination against ddNTPs, with a sufficiently high processivity (the ability of the enzyme to polymerize nucleotides onto a DNA chain continuously without being dislodged from the chain) and a sufficiently high elongation rate. A 5′-3′ exonuclease activity associated with a DNA polymerase will remove nucleotides from the primer, thus causing a heterogeneous 5′ end for the newly formed DNA strands and resulting in a false reading of the strand lengths on the sequencing gel. A DNA polymerase with a low processivity and a low elongation rate will cause many undesirable noise background bands of radioactivity due to the presence of DNA strands which are formed with improper lengths and improper terminations. Among the more commonly used DNA polymerases, Sequenase™ has a higher processivity and a higher elongation rate than others, such as the Klenow fragment, Taq, and Vent polymerases (2), and is therefore one of the most popular DNA polymerases selected for DNA sequencing to date.
However, even when a DNA polymerase has been endowed with all the essential properties listed above, it may still generate erroneous or misleading band patterns of radioactivity in the sequencing gel. These artifactual patterns do not faithfully reflect the true nucleotide sequence in the template being sequenced. They may be caused by premature termination of the elongating strands due to the presence of secondary structures formed along the template, such as “hairpins” in the regions that contain palindromic sequences or that are rich in G and C bases (3); or they may occur as a result of an inadequate “proof-reading” function of the DNA polymerase, the function that would otherwise allow the removal of misincorporated nucleotides at the 3′ end of an elongating strand.
Researchers in the field of DNA sequencing often have to use several approaches to confirm their findings in order to avoid being misled by these potentially erroneous sequence data. For example, they sometimes rely on repeating the same sequencing experiment with different DNA polymerases, or on performing another sequencing reaction with the template which is complementary to the first single-stranded DNA template, and comparing the results for possible discrepancies.
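The ladder-reading step described above can be sketched as follows. The sketch assumes that each lane has already been reduced to a list of fragment lengths (counted in nucleotides added to the primer); the function names, the data layout and the example band positions are hypothetical and serve only to illustrate the logic.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Read the newly synthesized strand (5'->3') from four terminator lanes.

    `lanes` maps a terminator base (A, C, G or T) to the lengths of the
    fragments observed in that lane. Each length should occur in one lane only.
    """
    base_at_length: Dict[int, str] = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                raise ValueError(f"band of length {length} appears in two lanes")
            base_at_length[length] = base
    # Shortest fragments first: this is the order of synthesis from the primer.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


def template_from_synthesized(synthesized: str) -> str:
    """Deduce the template (5'->3') as the reverse complement of the new strand."""
    return "".join(COMPLEMENT[b] for b in reversed(synthesized))


# Hypothetical band positions, for illustration only.
lanes = {"A": [1, 4], "C": [3], "G": [2, 6], "T": [5]}
synthesized = read_ladder(lanes)
template = template_from_synthesized(synthesized)
print(synthesized, template)
```

A band that shows up at the same length in two lanes is treated as an error here, which mirrors the ambiguous readings that the artifacts discussed in this section can produce.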
Numerous investigators have tried to find an ideal DNA polymerase for enzymatic sequencing, i.e. an enzyme that not only has all the essential properties required for sequencing reaction, but also is capable of resolving the secondary hairpin structures and preventing the formation of strands containing nucleotides non-complementary to those of the template being sequenced.
The discovery by Ye and Hong (4) of the thermostable large fragment of DNA polymerase isolated from Bacillus stearothermophilus (Bst), an enzyme that is functional over the temperature range between 25° C. and 75° C., but is most active at 65° C., and possesses all the essential properties for DNA sequencing, has largely solved the problem caused by secondary structures in the template since these secondary structures are destabilized when the sequencing reaction is carried out at 65° C. In the past few years since this enzyme was made commercially available under the name of Bst DNA Polymerase (Bio-Rad Laboratories), independent reports have confirmed that during sequencing reaction catalyzed by this enzyme all four dNTPs, including dCTP, and other nucleotide analogs, such as dITP and 7-deaza-dGTP, are incorporated equally effectively in the chain elongation, thus eliminating the weak “C” band phenomena often observed when other DNA polymerases are used, and producing a very good band uniformity on the sequencing gel. It has been further established that at this elevated temperature the Bst DNA polymerase system can be used both for the classic Sanger one-step reaction as well as for the “labeling/termination” sequencing reaction, double-stranded DNA sequencing, and the incorporation of 35S-labeled nucleotides and 32P-labeled nucleotides. Since this system can be placed at room temperature for at least two weeks without significant loss of its enzymatic activity, it has been adapted for automation of DNA sequencing which requires a stable DNA polymerase, using either fluorescent dye or radioactive isotope labeling. (See also 9, 12, and 13.)
However, when this Bst enzyme is used for automated fluorescent DNA sequencing, only partially satisfactory results have been obtained with fluorescent dye-labeled primers (see 12 and EG Bulletin 1771 of Bio-Rad Laboratories), and even less satisfactory results are obtained with fluorescent dye-labeled ddNTP terminators. Even when fluorescent dye-labeled primers are used, a significant number of mismatched ddNTPs are incorporated onto the 3′ end of the extending nucleotides in the enzymatic reaction, thus generating erroneous sequencing data (see Bio-Rad EG Bulletin 1771). With this in mind, the inventors sought, and found, a better DNA polymerase for DNA sequencing, especially for automated fluorescent dye-labeled primer and fluorescent dye-labeled terminator sequencing.
Another disadvantage of the Bst DNA polymerase currently known in the art is its lack of 3′-5′ exonuclease activity (5), and specifically, proof-reading 3′-5′ exonuclease activity. A survey of the sequencing data collected from fourteen research centers which have used this Bst DNA polymerase for their DNA sequencing work on over 120 DNA clones showed that, statistically, base pair mismatching occurs at a rate of about 1.5×10⁻⁵. That is, approximately 1.5 errors can be expected in one hundred thousand nucleotide incorporations during nucleotide polymerization catalyzed by the enzyme.
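The arithmetic behind the quoted mismatch rate can be checked with a short calculation. The sketch below uses the rate stated above; the function names are hypothetical, and the error-free probability additionally assumes that mismatches occur independently at each incorporation, which the survey itself does not state.

```python
import math

ERROR_RATE = 1.5e-5  # mismatches per nucleotide incorporation, as quoted in the text


def expected_errors(incorporations: int, rate: float = ERROR_RATE) -> float:
    """Expected number of mismatches over a given number of incorporations."""
    return rate * incorporations


def error_free_probability(read_length: int, rate: float = ERROR_RATE) -> float:
    """Chance that a strand of `read_length` bases contains no mismatch.

    Assumption (not from the source): each incorporation errs independently.
    """
    return (1.0 - rate) ** read_length


print(expected_errors(100_000))      # about 1.5, matching the figure in the text
print(error_free_probability(500))   # chance of an error-free 500-base strand
```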
It is generally known that the formation of incorrect DNA sequences due to mismatching of base pairs between the template and the growing nucleotide chain in DNA sequencing may be prevented by a 3′-5′ exonuclease activity which “proof-reads” the nucleotide chain. However, even if a DNA polymerase exhibits 3′-5′ exonuclease activity in vitro, it is often the case that the polymerase will not adequately “proof-read”. Thus, the polymerase will not be capable of removing mismatched nucleotides from a newly formed DNA strand as efficiently as those nucleotides correctly matched with the nucleotides of the template. In other words, a 3′-5′ exonuclease may excise the correctly matched nucleotides at a faster rate than the mismatched ones from the 3′ terminus, or excise both the correctly matched and the mismatched nucleotides at the same rate. Consequently, even where such a DNA polymerase has 3′-5′ exonuclease activity, it does not perform any useful proof-reading function during DNA polymerization.
It is also known that a 3′-5′ exonuclease activity associated with a DNA polymerase, in the presence of low concentrations of dNTPs, often counteracts the normal chain elongation process catalyzed by the polymerase, induces cyclic incorporation and degradation of nucleotides over the same segment of template, or even operates more efficiently than the polymerase activity per se, to the extent of causing degradation of the primer. Consequently, removal of the 3′-5′ exonuclease activity along with the 5′-3′ exonuclease activity from the native DNA polymerases by chemical means or by genetic engineering techniques has become a standard procedure in producing DNA polymerases for sequencing. This is a common strategy to preserve the essential properties of a DNA polymerase.
For example, among the major commercially available sequencing enzymes (other than the native Taq (Thermus aquaticus) DNA polymerase, which naturally lacks a 3′-5′ exonuclease activity), the 3′-5′ exonuclease activity has been removed from the native T7 DNA polymerase, which lacks a 5′-3′ exonuclease, either by a chemical reaction that oxidizes the amino acid residues essential for the exonuclease activity (Sequenase™ Version 1) or genetically by deleting 28 amino acids essential for the 3′-5′ exonuclease activity (Sequenase™ 2).
VentR(exo−) DNA polymerase, which is recommended as the preferred form of the Vent DNA polymerase for sequencing, also has its 3′-5′ exonuclease activity removed by genetic modification. The native Vent DNA polymerase and the Klenow fragment isolated from the native E. coli DNA polymerase I possess a 3′-5′ exonuclease; but these enzymes are no longer considered the enzymes of choice for DNA sequencing.
The currently known Bst DNA polymerase (e.g., produced by Bio-Rad Laboratories) isolated and purified from the cells of Bacillus stearothermophilus for DNA sequencing is free of 3′-5′ exonuclease activity (5).
IsoTherm™ DNA Polymerase, a commercially available Bst DNA polymerase for DNA sequencing, marketed by Epicentre Technologies (1402 Emil Street, Madison, Wis. 53713), is also based on a Bst DNA polymerase whose 3′-5′ exonuclease activity has been enzymatically removed (6).
Only the rBst DNA Polymerase produced from an over-expressing recombinant clone in E. coli, which is the product of the DNA pol I gene of Bacillus stearothermophilus, possesses a 3′-5′ exonuclease activity in addition to a 5′-3′ exonuclease activity. However, due to the existence of an undesirable 5′-3′ exonuclease activity and a 3′-5′ exonuclease activity of unknown characteristics, the latter product is not recommended by the company for DNA sequencing (6).
Over the past 10 years there has been a trend to develop and improve the automated fluorescent DNA sequencing technology to replace the classic radioactive isotope labeling manual method for DNA sequencing because of the potential harmful effects of the radioactive materials to humans and because of the need for automated high throughput DNA sequencing systems. In using fluorescent dyes as markers for labeling the DNA strands generated in enzymatic reactions for sequencing, the dyes can be either coupled with the primer, or coupled with the ddNTP terminators, namely the dye-labeled ddATP, dye-labeled ddCTP, dye-labeled ddGTP and dye-labeled ddTTP. Sequencing techniques based on these two forms of labeling of the final enzymatic reaction products are commonly referred to as “dye primer sequencing” and “dye terminator sequencing”, respectively.
In dye primer sequencing, ddNTPs are employed as the chain terminators, as in the original classic Sanger method which uses a radioactive isotope as the marker. The molecular structure of ddNTPs is almost identical to that of dNTPs, the natural building blocks of all DNA molecules. Therefore, any DNA polymerase which has been used for radioactive isotope manual DNA sequencing can be easily adapted for fluorescent dye primer DNA sequencing with equally satisfactory results. The disadvantage in the dye primer technology is that the primer for each template to be sequenced must be labeled with four different fluorescent dyes and that the enzymatic reaction must be performed in four separate test tubes each containing only one of the ddNTPs, namely ddATP, ddCTP, ddGTP or ddTTP, as in the classic Sanger radioisotope method.
In the dye terminator technology for DNA sequencing, the fluorescent dye-labeled ddATP, dye-labeled ddCTP, dye-labeled ddGTP and dye-labeled ddTTP are coupled with different fluorescent dyes, each emitting a specific light spectrum, thus directly reporting the type of ddNTP at the 3′ terminus of the DNA fragment. Unlike the situation in the dye primer technology, in which four different fluorescent dyes are coupled to a primer incorporated into all newly formed DNA strands, these dye-labeled ddNTPs serve the dual function of a specific base terminator and a “color marker”. There is no need to label the primer for each new template, and the polymerase DNA extension reaction can be performed in a single test tube to generate the required specifically terminated and specifically dye-labeled DNA fragments of various sizes for DNA sequencing.
The advantage of using fluorescent dye-labeled terminators for DNA sequencing is obvious. However, there are certain difficulties to overcome before an enzymatic reaction system suitable for a radioisotope technique or suitable for a dye primer technique can be adapted for a dye terminator technology. An increase of the molecular weight from less than 500 for a ddNTP terminator to about 800 or more for a fluorescent dye-labeled ddNTP terminator may be associated with potential three-dimensional structural changes. These molecular alterations may interfere with the process of incorporation of the dye-labeled ddNTPs as chain terminators by the DNA polymerase to the 3′ end of an extending DNA strand in terms of lowering the rate of incorporation, lowering the processivity of the enzyme for this new substrate, reducing the enzyme-terminator binding specificity and changing the enzyme-terminator binding kinetics.
For example, both Taq DNA polymerase and Sequenase II™ (a T7 DNA polymerase) have been used for radioisotope labeling DNA sequencing with excellent results, and have been adapted for fluorescent dye-labeled primer DNA sequencing. But neither can be used for fluorescent dye-labeled terminator DNA sequencing technologies. As reported in U.S. Pat. No. 5,614,365, when the Taq DNA polymerase was used for fluorescent dye-labeled terminator chemical reactions, the reaction products generated no readable data on the DNA sequencer. Most of the fluorescence was either in unincorporated dye-ddNTPs at the leading front of the test gel, or in fragments greater than several hundred bases in length. When a Taq DNA polymerase mutant, in which the phenylalanine at position 667 of the amino acid sequence has been replaced by a tyrosine and which has an increased ability to incorporate dideoxynucleotides (6,000 times more efficient), replaces the unmodified Taq DNA polymerase in the experiment, the results are significantly improved. This F667Y mutant of Taq DNA polymerase is now marketed by Amersham Life Science, Inc. under the trademark ThermoSequenase™. It is used for cycle-sequencing, in which the enzymatic reaction mixture is subjected to numerous cycles of extension-termination, denaturing and annealing to ensure that sufficient dye-terminator-labeled enzymatic reaction products are generated for the DNA sequencing procedure. Because of the low processivity of the parent Taq DNA polymerase, ThermoSequenase™ is not recommended for direct DNA sequencing without precycling. Like Taq DNA polymerase, ThermoSequenase™ lacks a proof-reading exonuclease activity.
Bacillus stearothermophilus, Bacillus caldotenax and Bacillus caldolyticus are classified as mesophilic microbes; although their DNA polymerases are referred to as thermostable (most active at 65° C.), they are inactivated at 70° C. or above. This is contrasted with other enzymes, such as Taq, which are truly thermophilic; that is, their DNA polymerases tolerate and remain active at temperatures higher than 95° C. These mesophilic bacillus strains, especially Bacillus stearothermophilus, produce DNA polymerases that are useful in DNA sequencing applications. However, a disadvantage of the DNA polymerases of these strains is that during DNA sequencing they all exhibit a high degree of selective discrimination against incorporation of certain particular members of fluorescent dye-labeled ddNTPs, namely the fluorescent dye-labeled ddCTP and fluorescent dye-labeled ddATP, as terminators onto the 3′ end of the extending DNA fragments during enzymatic reaction. This peculiar characteristic of selective discrimination against incorporation of fluorescent dye-labeled ddCTP and ddATP of the natural DNA polymerases isolated from Bacillus stearothermophilus and Bacillus caldotenax was not previously recognized. Such selective discrimination is apparently sequence-related, and cannot be corrected or compensated by mere adjustment of the concentrations of the dNTPs.
Thus, there is a need for a mesophilic bacillus DNA polymerase that does not selectively discriminate against incorporation of fluorescent dye-labeled ddCTP and ddATP, during dye primer or dye terminator DNA sequencing.
This invention addresses the above-described problems associated with mesophilic bacillus DNA polymerases by providing novel DNA polymerases which, during direct DNA sequencing, reduce the innate selective discrimination against the incorporation of fluorescent dye-labeled ddCTP and fluorescent dye-labeled ddATP, without increasing the rate of incorporation of the other two dye-labeled ddNTP terminators (ddTTP and ddGTP) excessively. In particular, this invention provides a novel genetic modification of the amino acid sequence of a highly processive DNA polymerase (such as isolated from Bacillus stearothermophilus, Bacillus caldotenax or Bacillus caldolyticus) that, unmodified, selectively discriminates against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddATP and ddCTP (but does not discriminate against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddTTP and ddGTP). The modification results in a reduction of the innate selective discrimination against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddATP and ddCTP, such that all four of the ddNTP terminators are effectively incorporated into the DNA primer elongated by the DNA polymerase. Thus, the modified DNA polymerase of this invention is effective in reducing the innate selective discrimination against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddATP and ddCTP characteristic of the DNA polymerase in its unmodified state.
In particular, the preferred DNA polymerase is a modification of a DNA polymerase isolated from a strain of a mesophilic bacterium, such as Bacillus stearothermophilus, Bacillus caldotenax or Bacillus caldolyticus. The approach of modifying the DNA polymerase described herein may be used to modify other DNA polymerases which share a close amino acid homology with a DNA polymerase isolated from a strain of Bacillus stearothermophilus, Bacillus caldotenax or Bacillus caldolyticus, as long as the unmodified DNA polymerases have a selective discrimination against incorporation of fluorescent dye-labeled dideoxynucleotide ddCTP and/or ddATP as terminators in the enzymatic reaction for preparing materials for automated fluorescent DNA sequencing. Consequently, it is preferred that the modified DNA polymerase has an amino acid sequence that shares not less than 95% homology with a DNA polymerase isolated from a strain of Bacillus stearothermophilus, Bacillus caldotenax or Bacillus caldolyticus.
The particularly preferred mesophilic species is Bacillus stearothermophilus, which is highly heterogeneous. This is indicated by the wide range of DNA base compositions as well as the range of the phenotypic properties of strains assigned to this species (see Bergey's Manual of Systematic Bacteriology, Eds. P. H. A. Sneath, N. S. Mair, M. E. Sharpe and J. G. Holt, Williams and Wilkins, 1986, Vol. 2, page 1135). Therefore, it is reasonable to assume that the amino acid sequences of DNA polymerases isolated from various strains would be heterogeneous with potential functional differences. Although DNA polymerases isolated from the known standard strains of Bacillus stearothermophilus have been shown to lack a 3′-5′ exonuclease activity, a questionable trace of “contaminating” 3′-5′ exonuclease has been observed in purified DNA polymerase preparations (see Kaboev et al., J. Bacteriology, Vol. 145, page 21-26, 1981).
Consequently, the inventors began to address the above-identified problems in the art by discovering a strain of Bacillus stearothermophilus (designated strain No. 320 for identification purposes; described in U.S. Pat. No. 5,747,298) that produces a DNA polymerase (designated Bst 320) with a proof-reading 3′-5′ exonuclease activity which is absent in DNA polymerases isolated from other strains of Bacillus stearothermophilus. (For this invention, the term “proof-reading” is intended to denote that the DNA polymerase is capable of removing mismatched nucleotides from the 3′ terminus of a newly formed DNA strand at a faster rate than the rate at which nucleotides correctly matched with the nucleotides of the template are removed during DNA sequencing.) The strain Bst 320 was deposited on Oct. 30, 1995 in the American Type Culture Collection, located at 12301 Parklawn Drive, Rockville, Md. 20852, and has been given ATCC Designation No. 55719. The DNA polymerase isolated from Bst 320 is composed of 587 amino acids, as are the DNA polymerases of other known strains of Bacillus stearothermophilus, such as, for instance, the strains deposited by Riggs et al. (Genbank Accession No. L42111) and by Phang et al. (Genbank Accession No. U22149). However, the Bst 320 DNA polymerase shares only 89.1% sequence identity at protein level with the Bacillus stearothermophilus DNA polymerase deposited by Riggs et al., and shares only 87.4% sequence identity at protein level with the Bacillus stearothermophilus DNA polymerase deposited by Phang et al. For comparison, the above-referenced enzyme deposited by Riggs et al. and the enzyme deposited by Phang et al. share 96.9% amino acid sequence identity.
The inventors studied a thermostable DNA polymerase isolated from a different species, Bacillus caldotenax (Bca), which also has an optimum active temperature at 65° C. The inventors discovered that the Bst 320 DNA polymerase shares 88.4% amino acid sequence identity with Bca DNA polymerase (Uemori et al. J. Biochem. 113: 401-410, 1993). Based on homology of the amino acid sequences, Bst 320 DNA polymerase is as close to DNA polymerases isolated from Bacillus stearothermophilus as to the DNA polymerase isolated from Bacillus caldotenax, i.e. another species of bacillus. It was also discovered that both Bst 320 DNA polymerase and Bca DNA polymerase functionally exhibit 3′-5′ exonuclease activity, which is not associated with known amino acid sequence exonuclease motifs I, II and III as in the E. coli DNA polymerase I model, or other known Bacillus stearothermophilus polymerases.
The inventors have studied the DNA polymerases of three different strains of Bacillus stearothermophilus (including DNA polymerase obtained from Bst 320) and the DNA polymerase of Bacillus caldotenax and found that they all exhibit a high degree of selective discrimination against incorporation of certain particular members of fluorescent dye-labeled ddNTPs, namely the fluorescent dye-labeled ddCTP and fluorescent dye-labeled ddATP, as terminators onto the 3′ end of the extending DNA fragments during enzymatic reaction. This is especially the case when the preceding 3′ end base of the extending DNA fragment is a dGMP (G) or a dAMP (A). (By “dNTP” it is intended to denote the four commonly known deoxynucleotide triphosphates, dATP, dTTP, dCTP, and dGTP.)
This selective discrimination causes missing peaks and ambiguous peaks on a color plot generated by the automated fluorescent DNA sequencer, and causes loss of sequencing data and erroneous base callings. This is shown in FIGS. 6 and 8.
This disadvantage of the natural bacillus DNA polymerases in fluorescent dye-labeled terminator DNA sequencing cannot be corrected or compensated by mere adjustment of the concentrations of the dNTPs and the fluorescent dye-labeled ddNTPs in the reaction mixture. This selective discrimination against the specific dye-labeled ddNTPs is also sequence-related as demonstrated with respect to Bst in FIGS. 6 and 8, in which the missing or ambiguous “C” peaks and “A” peaks tend to occur immediately following a preceding “G” peak or a preceding “A” peak. Of particular interest is the fact that the “C” and “A” peaks immediately following a preceding “C” or a preceding “T” peak are quite strong and resolvable in the same color plot analysis, indicating that the concentrations of dNTPs and the fluorescent dye-labeled ddCTP and the fluorescent dye-labeled ddATP were adequate for the termination reaction.
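The sequence context described above (a “C” or “A” peak immediately following a “G” or “A” peak) can be located in a base-called read with a simple scan. The following Python sketch is illustrative only: the function name `at_risk_positions` and the example read are assumptions, and the scan merely flags positions where the stated context occurs; it does not model peak heights or enzyme kinetics.

```python
from typing import List


def at_risk_positions(read: str) -> List[int]:
    """Return 0-based indices of C or A bases that directly follow a G or an A.

    These are the positions at which, according to the text, the unmodified
    enzymes tend to give missing or ambiguous dye-terminator peaks.
    """
    read = read.upper()
    return [
        i
        for i in range(1, len(read))
        if read[i] in ("C", "A") and read[i - 1] in ("G", "A")
    ]


example_read = "GCTAACGA"  # hypothetical read, for illustration only
print(at_risk_positions(example_read))
```

Positions where a “C” or “A” follows a “C” or “T” are deliberately not flagged, since the text reports those peaks as strong and resolvable.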
According to the structural model studies carried out on E. coli DNA polymerase I (Klenow fragment), certain amino acids in a particular region or regions of a DNA polymerase appear to play important roles in dNTP and ddNTP bindings and their final incorporation, and affect discrimination between deoxy and dideoxynucleotide substrates. For example, mutation of the amino acids arginine, asparagine, lysine, tyrosine, phenylalanine, aspartate, and glutamate in certain locations of amino acid sequences of Klenow fragment may affect the binding of dNTP and discrimination between deoxy and dideoxynucleotides. (See: Joyce, C. M., Current Opinion in Structural Biology, 1:123-129, 1991. Joyce and Steitz, Annu. Rev. Biochem., 63:777-822, 1993, page 800. Carrol et al., Biochemistry 30:804-813, 1991).
The problem which faced the inventors was how to reduce the selective discrimination against the incorporation of fluorescent dye-labeled ddCTP and fluorescent dye-labeled ddATP by site-directed mutagenesis of a DNA polymerase, without increasing the rate of incorporation of the other two dye-labeled ddNTP terminators excessively. In particular, the new mutant must be able to incorporate more correctly base-matched dye-labeled ddCTP and/or dye-labeled ddATP terminators to the dGMP (G) and dAMP (A) bases, than to the dCMP (C) and dTMP (T) bases of the extending DNA fragments during enzymatic reaction. A blanket increase in the ability of an enzyme to incorporate all four dye-labeled ddNTPs to the same proportion would serve no useful purpose for the group of DNA polymerases isolated from mesophilic bacilli since, unlike the Taq DNA polymerase, the unmodified natural enzymes of Bacillus stearothermophilus and Bacillus caldotenax already possess a high ability to incorporate fluorescent dye-labeled ddGTP and fluorescent dye-labeled ddTTP, and even the fluorescent dye-labeled ddCTP and dye-labeled ddATP provided at the immediately preceding base at the 3xe2x80x2end of the extending DNA fragment is not a xe2x80x9cGxe2x80x9d or an xe2x80x9cAxe2x80x9d.
The inventors found that DNA polymerases isolated from strains of Bacillus stearothermophilus and Bacillus caldotenax possess the same amino acids at certain specific positions in their amino acid sequence. For example, they all have leucine-glutamate-glutamate at positions corresponding to positions 342-344 and phenylalanine at a position corresponding to position 422 of the amino acid sequence of the DNA polymerase isolated from No 320 strain of Bacillus stearothermophilus. The inventors further discovered that the most optimal modification to solve the problem of selective discrimination in direct fluorescent DNA sequencing for these DNA polymerases is to modify the four amino acids of the natural DNA polymerases referenced above in such a form that threonine-proline-leucine substitute respectively for leucine-glutamate-glutamate at positions 342-344 and tyrosine substitutes for phenylalanine at position 422 in their amino acid sequences. Accordingly, the nucleotide sequence encoding the natural forms of the DNA polymerases are modified at positions 1024-1032 from CTCGAAGAG to ACCCCACTG and at position 1265 from T to A to encode for the DNA polymerases having the desired properties. The combined effects of these amino acid modifications reduce the selective discrimination against incorporation of fluorescent dye-labeled ddCTP and dye-labeled ddATP of the naturally-occurring mesophilic bacillus DNA polymerases during enzymatic reaction for direct automated fluorescent DNA sequencing.
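The correspondence between the amino acid positions and the nucleotide positions given above can be verified with the standard genetic code. The Python sketch below is a consistency check only. It assumes that nucleotide numbering starts at 1 with the first base of codon 1 of the sequence being numbered, includes only the codons needed here, and uses hypothetical function names; the text does not say whether the phenylalanine codon at position 422 is TTT or TTC, so both are tried.

```python
from typing import List, Tuple

# Subset of the standard genetic code covering the codons discussed here.
CODONS = {
    "CTC": "Leu", "CTG": "Leu", "GAA": "Glu", "GAG": "Glu",
    "ACC": "Thr", "CCA": "Pro",
    "TTT": "Phe", "TTC": "Phe", "TAT": "Tyr", "TAC": "Tyr",
}


def codon_span(aa_position: int) -> Tuple[int, int]:
    """1-based first and last nucleotide positions of a 1-based codon number."""
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


def translate(dna: str) -> List[str]:
    """Translate an in-frame DNA string codon by codon."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


# Positions 342-344 span nucleotides 1024-1032.
print(codon_span(342), codon_span(344))
print(translate("CTCGAAGAG"), "->", translate("ACCCCACTG"))

# Position 1265 is the middle base of codon 422; T -> A turns Phe into Tyr.
first, _ = codon_span(422)
offset = 1265 - first  # 0-based offset inside the codon
for phe_codon in ("TTT", "TTC"):  # actual codon is not stated in the text
    mutated = phe_codon[:offset] + "A" + phe_codon[offset + 1:]
    print(phe_codon, CODONS[phe_codon], "->", mutated, CODONS[mutated])
```

Whichever phenylalanine codon is present, changing its middle base from T to A yields a tyrosine codon, so the single-base change at position 1265 is consistent with the F-to-Y substitution at position 422.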
Initially, the DNA polymerases used in the inventors' research were obtained by overexpression of the genes encoding the naturally-occurring enzymes of Bacillus stearothermophilus and Bacillus caldotenax. Subsequently, modified DNA polymerases obtained by overexpression of the site-directed mutated genes were used. This invention provides both the nucleotide and amino acid sequence for a modified DNA polymerase to illustrate the practice of this new approach of modifying a special group of DNA polymerases, as described below.
In one preferred embodiment, the Bst 320 DNA polymerase is used for the unmodified, naturally-occurring DNA polymerase, although DNA polymerases isolated from other strains of mesophilic bacilli (for instance, Bacillus stearothermophilus and Bacillus caldotenax) can be used as the starting enzymes for the genetic modification. As noted above, the Bst 320 DNA polymerase is also capable of proofreading 3′-5′ exonuclease activity. In particular, the invention provides the DNA and amino acid sequences for the isolated and purified DNA polymerase having this function. These sequences are also described below.
The invention also contemplates an isolated strain of Bacillus stearothermophilus which produces a DNA polymerase having an ability to reduce selective discrimination against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddCTP and ddATP, but not fluorescent dye-labeled dideoxynucleotide terminators ddGTP and ddTTP, in the presence of dNTPs and the four fluorescent dye-labeled dideoxynucleotide terminators. Preferably, the Bst strain produces a DNA polymerase which also has proofreading 3′-5′ exonuclease activity during DNA sequencing of a DNA strand from a template.
As mentioned above, the invention also contemplates DNA polymerases obtained or otherwise derived from any bacillus strain, or made synthetically, as long as the amino acid sequences of the naturally-occurring DNA polymerases have leucine-glutamate-glutamate at positions corresponding respectively to positions 342-344 of Bst 320 DNA polymerase and phenylalanine at a position corresponding to position 422 of Bst 320 DNA polymerase. For example, DNA polymerases derived from other strains of Bacillus stearothermophilus or Bacillus caldotenax or other mesophilic bacilli may be easily modified using conventional DNA modification techniques to include the amino acid or nucleotide substitutions identified above.
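The eligibility criterion in the preceding paragraph can be expressed as a simple check on a protein sequence. The Python sketch below is a hypothetical illustration, not the inventors' procedure. It assumes the candidate sequence is already numbered like Bst 320 DNA polymerase (in practice, "corresponding" positions in another strain would be located by sequence alignment), and the 430-residue `toy` string is a synthetic placeholder, not a real polymerase sequence.

```python
# Residues required in the natural enzyme, keyed by 1-based Bst 320 position.
REQUIRED = {342: "L", 343: "E", 344: "E", 422: "F"}
# Substitutions described in the text: Leu-Glu-Glu -> Thr-Pro-Leu, Phe -> Tyr.
SUBSTITUTIONS = {342: "T", 343: "P", 344: "L", 422: "Y"}


def is_eligible(protein):
    """True if the sequence has L-E-E at 342-344 and F at 422 (one-letter codes)."""
    if len(protein) < max(REQUIRED):
        return False
    return all(protein[pos - 1] == aa for pos, aa in REQUIRED.items())


def apply_substitutions(protein):
    """Return a copy of the sequence carrying the four substitutions."""
    if not is_eligible(protein):
        raise ValueError("sequence lacks the required residues at 342-344 and 422")
    residues = list(protein)
    for pos, aa in SUBSTITUTIONS.items():
        residues[pos - 1] = aa
    return "".join(residues)


# Synthetic placeholder: poly-alanine with the required residues planted in place.
toy_residues = ["A"] * 430
for pos, aa in REQUIRED.items():
    toy_residues[pos - 1] = aa
toy = "".join(toy_residues)

mutant = apply_substitutions(toy)
print(toy[341:344], toy[421], "->", mutant[341:344], mutant[421])
```

Keeping the eligibility test separate from the substitution step means a sequence that lacks the required residues is rejected rather than silently modified.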
The invention also provides a DNA construct comprising at least one of the above-described DNA polymerase sequences and a vector (such as a cloning vector or an expression vector), for introducing the DNA construct into eucaryotic or procaryotic host cells (such as an E. coli host cell). In addition, the invention further provides a host cell stably transformed with the DNA construct in a manner allowing production of the peptide encoded by the DNA segment in the construct.
The invention also provides improved methods for replicating DNA and sequencing DNA using the above-described DNA polymerases of the invention. The DNA polymerases are useful in both direct dye terminator DNA sequencing and dye-primer DNA sequencing.
Preferably, the method of sequencing a DNA strand may comprise the steps of:
i) hybridizing a primer to a DNA template to be sequenced;
ii) extending the primer using a DNA polymerase which has an ability to reduce selective discrimination against incorporation of fluorescent dye-labeled dideoxynucleotide terminators ddCTP and ddATP, in the presence of adequate amounts of nucleotide bases dATP, dGTP, dCTP and dTTP, or their analogs, and the four fluorescent dye-labeled dideoxynucleotide terminators,
under such conditions that the DNA strand is sequenced.
Further objects and advantages of the invention will become apparent from the description and examples below.