The present invention relates to methods for determining DNA sequences. In particular, the present invention relates to methods for determining DNA sequences utilizing mutant RNA polymerases. The present invention further relates to methods for determining DNA sequences utilizing novel 3′-deoxyribonucleotide derivatives as fluorescence-labeled terminators. The present invention still further relates to methods for determining DNA sequences wherein the nucleic acid transcription reaction is performed in the presence of inorganic pyrophosphatase.
The polymerase chain reaction (PCR) method is an excellent method, and its utilization has expanded year by year [Randall K. Saiki et al. (1988) Science 239, 487-491]. In the PCR method, even a single molecule of a DNA fragment can be amplified. The method for sequencing PCR-amplified products without cloning them (the direct sequencing method) is also useful [Corinne Wong et al. (1988) Nature, 330, 384-386]. This technique requires neither construction nor screening of libraries, and it is a quick method capable of obtaining sequence information from many samples simultaneously.
However, the above direct sequencing method suffers from two major problems.
One is that unincorporated primers and 2′-deoxyribonucleoside 5′-triphosphates (2′-dNTPs) remain in the reaction system, and these residual substances inhibit sequencing reactions. Therefore, in conventional methods, such primers and 2′-dNTPs must be removed from PCR products before sequencing. There are many methods for purifying PCR products; examples include purification by electrophoresis, ethanol precipitation, gel filtration and HPLC purification [see, for example, Dorit R. L. et al. (1991) Current Protocols in Molecular Biology, Vol. 11, John Wiley and Sons, New York, 15.2.1-15.2.11]. However, all of these methods are complicated.
The second problem is quick renaturation of PCR products. When the PCR products are renatured into double-stranded DNA, they are no longer single-stranded templates, and annealing between primers and single-stranded templates is inhibited. Reported methods for minimizing the renaturation include quenching after denaturation, biotinylation of one primer and adsorption of PCR products onto streptavidin-coated articles, use of exonuclease, asymmetric PCR and the like. See, for example, Barbara Bachmann et al., 1990, Nucleic Acids Res., 18, 1309-. However, most of these methods are time-consuming and very laborious.
Therefore, to solve these problems, the present inventors proposed an entirely novel method for determining the nucleotide sequence of DNA, which does not require removal of unreacted primers and 2′-deoxyribonucleoside 5′-triphosphates (2′-dNTPs) remaining in the PCR reaction system, and does not require denaturation at all. This method eliminates the problem of quick renaturation of PCR reaction products [WO96/14434]. It is a direct transcriptional sequencing method utilizing an RNA polymerase such as T7 RNA polymerase and a terminator for the RNA transcription reaction (for example, 3′-deoxyribonucleoside 5′-triphosphates, 3′-dNTPs). According to this method, DNA products amplified by the polymerase chain reaction can be used as they are for sequencing, without removing primers and 2′-dNTPs. In addition, because the method requires no denaturation at all, it avoids the problem of quick renaturation of PCR products, and hence is an extremely excellent method.
However, the present inventors further studied the above method, and found that it has a problem to be solved in order to obtain more accurate nucleotide sequence data.
In the above nucleotide sequence determination method, an RNA polymerase such as T7 RNA polymerase is used for the reaction in a mixture comprising ribonucleoside 5′-triphosphates including ATP, GTP, CTP, UTP and derivatives thereof, and at least one 3′-deoxyribonucleotide such as 3′-dATP, 3′-dGTP, 3′-dCTP, 3′-dUTP and derivatives thereof. In this reaction, polyribonucleotides are synthesized by sequential incorporation of ribonucleotides and deoxyribonucleotides into a ribonucleotide sequence in a manner corresponding to the sequence of templates.
However, it was found that 3′-deoxyribonucleotides and derivatives thereof are less readily incorporated into the sequence than the corresponding ribonucleotides, and that the frequency of incorporation may also vary among the ribonucleotides and among the 3′-deoxyribonucleotides depending on the base group each nucleotide has. Such biased incorporation between ribonucleotides and 3′-deoxyribonucleotides, as well as among ribonucleotides having different base groups and among deoxyribonucleotides having different base groups, may result in short transcription products and fluctuation of signals from labeled ribonucleotides. Therefore, it is difficult to obtain accurate sequence data even though transcription products can be obtained.
Therefore, an object of the present invention is to provide a method for determining DNA sequences which utilizes an RNA polymerase exhibiting incorporation ability with no or little bias resulting from differences in nucleotides, and is capable of producing a transcription product of a long chain and affording more accurate sequence data where fluctuation of signals from labeled deoxyribonucleotides is reduced.
In the description of the present invention, amino acid residues are represented by the conventionally used one-letter codes. For clarification, the codes are listed here only for those amino acids that appear in this text: phenylalanine (F), tyrosine (Y), proline (P), leucine (L), and histidine (H). A numeral accompanying a code is the residue number counted from the N-terminus of the polymerase. For example, "F667" means that the 667th amino acid residue of this polymerase is F, and "F667Y" means that Y was substituted for F of the 667th residue.
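The substitution notation described above can be illustrated with a minimal Python sketch. The names `Substitution`, `parse_substitution` and `apply_substitution`, as well as the five-residue toy sequence, are hypothetical and introduced only for illustration; they are not part of the described method and do not represent any real polymerase sequence.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """A single amino acid substitution written like "F667Y" (illustrative type)."""
    wild_type: str  # one-letter code of the residue in the wild-type enzyme
    position: int   # residue number, counted from the N-terminus (1-based)
    mutant: str     # one-letter code of the replacing residue


# One letter, a residue number, one letter (for example "F667Y").
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code: str) -> Substitution:
    """Split a code such as "F667Y" into its three parts."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def apply_substitution(sequence: str, sub: Substitution) -> str:
    """Return a copy of `sequence` with the substitution applied."""
    index = sub.position - 1  # convert the 1-based residue number to a 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[index] != sub.wild_type:
        raise ValueError("wild-type residue does not match the sequence")
    return sequence[:index] + sub.mutant + sequence[index + 1:]


# Toy example on a made-up five-residue sequence (assumption, not real data).
toy_sequence = "PLFHL"
print(parse_substitution("F667Y"))
print(apply_substitution(toy_sequence, parse_substitution("F3Y")))
```

Checking the stated wild-type residue against the sequence before substituting guards against numbering errors, which is the main practical benefit of writing both residues in the code.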
Incidentally, DNA polymerases are also known to show biased incorporation resulting from differences in the base group each nucleotide has, and mutant DNA polymerases free from such biased incorporation are also known [Japanese Patent Unexamined Publication (KOKAI) No. (Hei) 8-205874/1996; and Proc. Natl. Acad. Sci. USA, 92:6339-6345, (1995)].
The aforementioned literature describes the following. In the sequencing reaction utilizing T7 DNA polymerase, the 526th amino acid of the polymerase contributes to equalizing nucleotide incorporation. Because of the homology between T7 DNA polymerase and other DNA polymerases, the incorporation bias of the other DNA polymerases may be reduced by replacing an amino acid residue in their region homologous to the region of T7 DNA polymerase that includes the 526th amino acid. That is, Y (tyrosine) 526 of T7 DNA polymerase reduces the bias in the efficiency of incorporation of 2′-dNTPs and 2′,3′-ddNTPs. F (phenylalanine) 762 of E. coli DNA polymerase I and F (phenylalanine) 667 of Thermus aquaticus DNA polymerase (generally called Taq DNA polymerase) are the amino acid residues corresponding to Y526 of T7 DNA polymerase, and the bias of these polymerases may be reduced by the substitutions F762Y and F667Y, respectively.
Further, the literature also suggests that modification of the region of T7 RNA polymerase corresponding to the region discussed for DNA polymerases, i.e., the residues 631-640, may change its specificity for dNTPs.
However, RNA polymerases have not been used for sequencing methods so far, and therefore the difference in efficiency of ribonucleotide incorporation has not itself been regarded as a problem. Under such circumstances, no mutant RNA polymerase free from the biased incorporation has been known. In fact, the aforementioned Japanese Patent Unexamined Publication (KOKAI) No. (Hei) 8-205874/1996 does not mention any specific example of modification of T7 RNA polymerase.
The region of T7 RNA polymerase mentioned above is considered to correspond to the region consisting of 9-10 amino acid residues between amino acids K and YG in the motif B mentioned in Protein Engineering, 3:461-467, 1990, a region that is particularly conserved in DNA polymerase α and I and in DNA-dependent RNA polymerases (T7 RNA polymerase is classified among these polymerases). The F (phenylalanine) at amino acid residue 762 in E. coli DNA polymerase I and at amino acid residue 667 in Taq DNA polymerase, previously discussed for DNA polymerases, is observed in many of the DNA polymerases classified as type I. However, it was surprisingly found that T7 RNA polymerase does not have F (phenylalanine) in the residues 631-640 corresponding to the aforementioned region, though T7 RNA polymerase is highly homologous to DNA polymerases. Therefore, the teachings of the aforementioned literature could not be realized as described.
Further, the present inventors attempted modification of amino acids of T7 RNA polymerase in the region corresponding to the helix O of the finger subdomain of E. coli DNA polymerase I, in which F762 of E. coli DNA polymerase I is present. However, F (phenylalanine) was not found in the helix Z of T7 RNA polymerase either; this helix is indicated in the steric structure reported by Sousa et al. (Nature, 364:593-599, 1993) and corresponds to the helix O of E. coli DNA polymerase I.
Under these circumstances, the present inventors independently searched for a novel RNA polymerase in order to provide an RNA polymerase which exhibits little or no bias in incorporating ability depending on the kind of ribonucleotides and 3′-deoxyribonucleotides. As a result, the method for determining DNA sequences of the present invention was completed based on the finding that an RNA polymerase having an increased ability to incorporate 3′-deoxyribonucleotides and derivatives thereof can be obtained by partially modifying amino acids in a wild type RNA polymerase.
As will be apparent from the descriptions hereinafter, the RNA polymerase of the present invention, and in particular the location of its amino acid modification, is neither suggested nor taught at all in Japanese Patent Unexamined Publication (KOKAI) No. (Hei) 8-205874/1996; it was found entirely independently by the present inventors.
Further, as a result of the present inventors' examination, it was found that the bias of the incorporation of ribonucleotides may be eliminated to some extent by using a mutant RNA polymerase, but that the bias of the incorporation would still remain to a certain extent when a fluorescence-labeled 3′-deoxyribonucleotide is used as a terminator for the nucleic acid extension reaction.
Therefore, a further object of the present invention is, from the viewpoint of obtaining a more practically useful method, to provide a method capable of eliminating the bias of the incorporation and more accurately determining nucleotide sequences even when a fluorescence-labeled 3′-deoxyribonucleotide is used.
The present invention relates to a method for determining DNA nucleotide sequences comprising reacting ribonucleoside 5′-triphosphates including ATP, GTP, CTP and UTP or derivatives thereof, and one or more kinds of 3′-deoxyribonucleoside 5′-triphosphates including 3′-dATP, 3′-dGTP, 3′-dCTP, 3′-dUTP and derivatives thereof (referred to as 3′-dNTP derivatives hereinafter) in the presence of an RNA polymerase and a DNA fragment containing a promoter sequence for the RNA polymerase to obtain a nucleic acid transcription product, separating the resulting transcription product, and determining a nucleic acid sequence from the resulting separated fraction, wherein the RNA polymerase is a mutant RNA polymerase consisting of a wild type RNA polymerase provided that at least one of amino acids in the wild type RNA polymerase was modified so as to enhance its ability for incorporating the 3′-dNTP derivatives in comparison with the corresponding wild type RNA polymerase.
The present invention further relates to the aforementioned method for determining DNA nucleotide sequences wherein the 3′-dNTP derivatives are 3′-deoxyribonucleotide derivatives represented by the following general formula [I]:
Q-V-(CH2)n-NH-R  [I]
In the formula, Q represents a 3′-deoxyribonucleotide residue, n represents an integer not less than 1, preferably not less than 4, V represents -C≡C- or -CH=CH-, and R represents a fluorescent group.
The present invention further relates to the aforementioned method for determining DNA nucleotide sequences wherein the nucleic acid transcription reaction is performed in the presence of an inorganic pyrophosphatase.