The present invention relates to a method for the direct, exponential amplification and sequencing of DNA molecules as well as the use of the method. The direct, exponential amplification and sequencing of DNA molecules is referred to as “DEXAS” in the following.
DNA sequence determination as developed by Sanger et al. ((1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467) is usually carried out with a T7 DNA polymerase (Tabor S. and Richardson, C. C. (1989) Proc. Natl. Acad. Sci. USA 86, 4076-4080). This method requires relatively large amounts of a purified, single-stranded DNA template. Recently cycle sequencing has been developed (Murray, V. (1989) Nucleic Acids Res. 17, 8889). This method does not require a single-stranded template and allows the sequence reaction to be initiated with relatively small amounts of template. However, the template DNA has to be purified to almost complete homogeneity and is usually prepared by means of cloning in plasmids (Bolivar, F. et al., (1977) Gene 2, 95-113) and subsequent plasmid purification (Birnboim, H. C. and Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523) or by means of PCR amplification (Mullis, K. B. and Faloona, F. A. (1987) Methods Enzymol. 155, 335-350). Only one primer is used in both of the methods described above.
In one embodiment of cycle sequencing, which is referred to as “coupled amplification and sequencing” or “CAS”, Ruano and Kidd ((1991) Proc. Natl. Acad. Sci. USA 88, 2815-2819; U.S. Pat. No. 5,427,911) have shown that one can use a two-step protocol to generate sequences from DNA templates. In the first step 15 PCR cycles are carried out with Taq DNA polymerase in the absence of dideoxynucleotides in order to prepare an adequate amount of sequencing template. In a second step in which dideoxynucleotides and a labelled primer are added, CAS produces the sequence as well as the additional amplification of the target sequence. Two primers are used in both steps of the method.
Many DNA polymerases that are used in coupled DNA sequencing reactions, including Taq DNA polymerase, strongly discriminate against ddNTPs and preferentially incorporate dNTPs when furnished with a mixture of ddNTPs and dNTPs. In addition they incorporate each ddNTP, i.e. ddATP, ddCTP, ddGTP and ddTTP, with a strongly varying efficiency. Hence the optimization of the CAS process requires careful titration of the dideoxynucleotides.
Furthermore since coupled amplification and sequencing depends on the amount of the initial DNA, the distance between the two primers and the concentrations and ratios of the ddNTPs and dNTPs relative to one another, the optimization of coupled amplification and sequencing (CAS) reactions requires that the reaction conditions are individually optimized for a particular DNA fragment.
All the methods described above require an interruption between the first step of exponential amplification of the template DNA and the second step for the synthesis of truncated DNA molecules and also require the individual optimization of a given DNA fragment which can be tedious and time-consuming and can lead to errors especially when sequencing a large number of different DNA molecules or when processing large amounts of samples in a hospital or laboratory or when sequencing rare samples for forensic or archaeological studies.
For this reason it would be advantageous to have available a method for sequencing nucleic acids which simultaneously potentiates the exponential amplification of molecules of full length and of molecules of truncated length in the reaction which leads to a reduction of the required amount of starting nucleic acid molecules and does not require an interruption of the exponential amplification step and of the sequencing step so that the whole reaction can be carried out more rapidly and with fewer manipulations.
The object of the present invention is to provide an improved, rapid and reliable method for sequencing DNA molecules, preferably genomic DNA.
A further object of the present invention is to provide a direct method for nucleic acid sequencing which simultaneously increases the exponential amplification of molecules of full length as well as of molecules of truncated length in the reaction which leads to a reduction of the initial amount of nucleic acid molecules that are required for the cycling reaction.
A further object of the present invention is to provide an improved, rapid and reliable method for sequencing DNA molecules, preferably genomic DNA that can be carried out in a single step in a single container.
A further object of the present invention is to provide an application according to the invention for sequence determination in medical diagnostics, forensics and population genetics.
Further objects of the invention are obvious to a person skilled in the art from the description.
In contrast to the above-described “CAS” method a DNA polymerase is used as the thermostable DNA polymerase which, compared to wild-type Taq DNA polymerase, has a reduced discrimination against the four ddNTPs in the buffer and under the conditions that are used for the thermocycling. More preferably a DNA polymerase is used which carries a “Tabor-Richardson” mutation or a functional derivative thereof and which also has no 5′-3′ exonuclease activity, such as e.g. AmplitaqFS (Taq DNA polymerase (-exo5′-3′)(F667Y), Tabor and Richardson (1995), loc. cit.), Taquenase (Taq DNA polymerase Δ235 (-exo5′-3′)(F667Y), Tabor and Richardson (1995), loc. cit.) and Thermo Sequenase (Taq DNA polymerase (-exo5′-3′)(F667Y), Tabor and Richardson (1995), loc. cit.). Mixtures thereof, as well as other thermostable DNA polymerases and mixtures thereof, can also be used in the method of the present invention. Surprisingly the use of a DNA polymerase which, in comparison to wild-type Taq DNA polymerase, has a reduced discrimination against the four ddNTPs, enables the simultaneous and exponential synthesis of truncated as well as of full fragments from the start of the cycling reaction. Hence the present invention concerns a method for the direct sequencing of a nucleic acid molecule from a complex mixture of nucleic acids, such as e.g. total genomic human DNA, using a reaction mixture containing a reaction buffer, deoxynucleotides or derivatives thereof and a dideoxynucleotide or another terminating nucleotide and a thermostable polymerase which has a reduced discrimination against ddNTPs in comparison to wild-type Taq DNA polymerase.
Within the sense of the present invention direct sequencing means that the nucleic acid fragment to be sequenced is simultaneously amplified and sequenced in one step without interrupting the reaction and without prior amplification of the nucleic acid fragment to be sequenced by the known methods and in such a manner that an unequivocal sequence ladder is readable.
A further difference between DEXAS and the “CAS” method described above is the principle that the initial and subsequent cycle sequencing reaction is carried out with two primers, a first primer, and a second primer which lies on the strand complementary to the first, which are preferably present in a non-equimolar ratio and serve to simultaneously produce adequate template molecules of full length as well as truncated molecules which contribute to the sequencing of the DNA molecule. Four reactions are prepared, one for the determination of each base, so that each reaction contains two primers preferably in a non-equimolar ratio to one another of which either one is labelled and the other is unlabelled or both are differently labelled. The said non-equimolar ratio between the first primer and the second primer enables the simultaneous and exponential synthesis of the truncated as well as of the full fragments from the start of the cycling reaction. Furthermore each reaction contains from the start the DNA template to be sequenced as well as a buffer solution, thermostable DNA polymerase, thermostable pyrophosphatase (optionally), the four deoxynucleotides or derivatives thereof and a dideoxynucleotide or a terminating nucleotide, e.g. 3′-amino nucleotides or 3′-ester-derivatized nucleotides.
Thereafter cycles for denaturing and extension are carried out so that in each of these cycles two types of extension products are formed from each primer. Each primer functions such that it initiates extension products which are long enough to reach the other primer position. Simultaneously products are initiated by each primer which, due to the incorporation of a dideoxynucleotide, are terminated before the other primer position is reached. The former said products (products of full length) serve in the following cycles as a template for the production of further DNA strands of full length and are also used as templates for extensions that contribute to the sequence reaction, and the latter products (truncated products) accumulate during the cycles and contribute to the sequence ladder that is generated. Hence DEXAS results in the simultaneous exponential production of a sequencing template and a sequence ladder in a single tube without having to interrupt the thermocycling reaction.
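The cycle-by-cycle accumulation described above can be illustrated with a deliberately simplified toy model. The following sketch is not part of the method itself. It assumes that every full-length strand is primed and extended exactly once per cycle and that a hypothetical fraction `p_term` of these extensions is terminated by incorporation of a dideoxynucleotide before the other primer position is reached. The function name, the parameters and the assumption of perfect efficiency are illustrative only, and the model ignores the difference between the two primers.

```python
from typing import List, Tuple


def simulate_dexas(initial_templates: float, cycles: int,
                   p_term: float) -> List[Tuple[float, float]]:
    """Toy model: return (full_length, truncated) counts after each cycle.

    p_term is a hypothetical per-extension probability that a ddNTP
    terminates the strand before the other primer position is reached.
    """
    if not 0.0 <= p_term <= 1.0:
        raise ValueError("p_term must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must not be negative")

    full = float(initial_templates)   # strands usable as templates
    truncated = 0.0                   # strands forming the sequence ladder
    history = []
    for _ in range(cycles):
        # Every full-length strand is copied once (idealized assumption).
        new_truncated = full * p_term
        new_full = full * (1.0 - p_term)
        # Full-length products become templates in later cycles;
        # truncated products only accumulate.
        full += new_full
        truncated += new_truncated
        history.append((full, truncated))
    return history


if __name__ == "__main__":
    for cycle, (full, trunc) in enumerate(simulate_dexas(1.0, 5, 0.5), 1):
        print(cycle, full, trunc)
```

Under these assumptions the full-length pool grows by a factor of (2 - p_term) per cycle, and the truncated pool grows with it because it is fed by the full-length pool in every cycle.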
Therefore the use of the present invention enables the DNA sequence of multicopy and single-copy regions of DNA to be determined in a single step.
Hence the present invention for the first time provides a method which enables the nucleic acid to be sequenced to be simultaneously amplified and sequenced from a complex mixture of nucleic acids, such as e.g. total genomic human DNA, without prior amplification by the known methods, in one step i.e. without interrupting the reaction and such that an unequivocal sequence ladder is readable wherein at least one thermostable DNA polymerase, a nucleic acid molecule, a first primer, a second primer, a reaction buffer, deoxynucleotides or derivatives thereof and at least one dideoxynucleotide or another terminating nucleotide is present in the initial reaction mixture.
Furthermore the aforementioned object and goals of the present invention are achieved by the provision of a method for sequencing DNA molecules in which truncated DNA molecules as well as DNA molecules of full length are simultaneously and exponentially synthesized between two positions on the said DNA molecule in a thermocycling reaction which initially contains a DNA molecule, a first primer, a second primer, a reaction buffer, a thermostable DNA polymerase, thermostable pyrophosphatase (optionally), deoxynucleotides or derivatives thereof, and a dideoxynucleotide or another terminating nucleotide thereof wherein the initial ratio of the said primers in the said thermocycling reaction is not equal to 1.
In a preferred embodiment of the method of the invention the ratio of the said primers to one another is about 2:1 to about 3:1, most preferably 2:1.
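As a small worked illustration of the primer ratio, the sketch below splits a total primer amount between the first and second primer for a chosen ratio and checks whether the ratio falls in the preferred range of about 2:1 to about 3:1. The function names, the use of pmol as the unit and the hard range limits are assumptions made for the example; the text itself only gives approximate bounds.

```python
from typing import Tuple


def split_primers(total_pmol: float, ratio: float = 2.0) -> Tuple[float, float]:
    """Split a total primer amount so that first:second equals ratio:1."""
    if total_pmol <= 0:
        raise ValueError("total_pmol must be positive")
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    first = total_pmol * ratio / (ratio + 1.0)
    second = total_pmol / (ratio + 1.0)
    return first, second


def in_preferred_range(ratio: float) -> bool:
    """True if the ratio lies within the preferred range of about 2:1 to 3:1.

    The limits are treated as exact here, which is an assumption.
    """
    return 2.0 <= ratio <= 3.0


if __name__ == "__main__":
    first, second = split_primers(9.0, 2.0)
    print(first, second, in_preferred_range(2.0))
```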
In a further preferred embodiment of the method of the invention the said primers have such a length that the signal-to-noise ratio between the specific truncated DNA molecules and the unspecific DNA molecules is large enough not to substantially prevent the reading of the sequence. The said primers preferably have a length of at least 25 nucleotides.
Primers can be synthesized by means of methods known in the state of the art. For example primers can be synthesized using known methods which do not significantly change the stability or function of the said primers during the nucleic acid sequencing method of the present invention.
Furthermore the PNA-DNA hybrid oligonucleotides (see Finn, P. J. et al., N.A.R. 24, 3357-3363 (1996), Koch, T. et al., Tetrahedron Letters, 36, 6933-6936 (1995), Stetsenko, D. A, et al., Tetrahedron Letters 37, 3571-3574 (1996), Bergmann, F. et al., Tetrahedron Letters 36, 6823-6826 (1995) and Will, D. W. et al., Tetrahedron 51, 12069-12082 (1995)) are also regarded as primers for the method according to the invention.
In a further preferred embodiment of the invention the said first primer is labelled. Moreover it is preferable that the said first primer and second primer are labelled differently. Any suitable agents or methods known in the state of the art can be used as single or differential labelling agents and methods, provided that they do not significantly change the stability or function of the said primer in the DNA sequencing method of the present invention. For example single and differential labels can be selected from the group which comprises enzymes such as β-galactosidase, alkaline phosphatase and peroxidase, enzyme substrates, coenzymes, dyes, chromophores, fluorescent, chemiluminescent and bioluminescent labels such as FITC, Cy5, Cy5.5, Cy7, Texas-red and IRD40 (Chen et al., (1993), J. Chromatog. A 652: 355-360 and Kambara et al. (1992), Electrophoresis 13: 542-546), ligands or haptens such as e.g. biotin, and radioactive isotopes such as ³H, ³⁵S, ³²P, ¹²⁵I and ¹⁴C.
The method according to the invention can also be carried out as a “hot start” method. In this case it is ensured that the activity of the polymerase or polymerases only starts at an increased temperature in order to suppress a polymerization on unspecifically hybridized primers at lower temperatures. One possibility is that the thermocycling reaction additionally contains a polymerase-inhibiting agent. Polymerase antibodies are for example available commercially which only denature at higher temperatures and thus release enzyme activity of the polymerase. However, polymerases modified by genetic engineering that are present in an inactive form at lower temperatures would also be conceivable. Other polymerase-inhibiting agents are disclosed by EP 0 771 870 A1, the disclosure of which is incorporated by reference. Examples of the polymerase-inhibiting agents include acid anhydrides, such as dicarboxylic acid anhydrides (e.g. citraconic anhydride, cis-aconitic anhydride, phthalic anhydride, succinic anhydride, and maleic anhydride) and dianhydrides (e.g. pyromellitic dianhydride or naphthalenetetracarboxylic dianhydride).
DEXAS is relatively insensitive to various buffers and various deoxynucleotides and dideoxynucleotide concentrations and can be carried out using various thermostable DNA polymerases.
The number of thermocycles can be from about 18 to about 50 cycles depending on the amount of template DNA and its purity.
Buffer components which can be used can include Tris-HCl at a pH of about 9.0 to 9.5 and at a concentration of about 10 to 30 mM, ammonium sulfate at a concentration of about 10 to 20 mM preferably 15 mM, MgCl2 at a concentration of about 3.5 to 5.5 mM, optionally about 0.05 mM mercaptoethanol, about 0.28% Tween20 and/or about 0.02% Nonidet 40. Buffer components, however, are not limited to these.
Deoxynucleotides may be selected from dGTP, dATP, dTTP and dCTP, but are not limited to these. According to the invention, it is additionally also possible to use derivatives of deoxynucleotides which are defined as those deoxynucleotides which are able to be incorporated by a thermostable DNA polymerase into growing DNA molecules that are synthesized in the thermocycling reaction. Such derivatives can include thionucleotides, 7-deaza-2′-dGTP, 7-deaza-2′-dATP, as well as deoxyinosine triphosphate, that can also be used as a substitute deoxynucleotide for dATP, dGTP, dTTP or dCTP, but are not limited to these. The aforementioned deoxynucleotides and derivatives thereof are preferably used at a concentration between about 300 μM and about 2 mM.
Dideoxynucleotides can be selected from ddGTP, ddATP, ddTTP and ddCTP. Dideoxynucleotides, however, are not limited to these. According to the invention, it is also additionally possible to use derivatives of dideoxynucleotides which are defined as those dideoxynucleotides that are able to be incorporated by a thermostable DNA polymerase into growing DNA molecules that are synthesized in a thermocycling reaction. In addition, it is also possible to use other terminating nucleotides, e.g. 3′-amino nucleotides or 3′-ester-derivatized nucleotides. Preferred concentrations of ddNTPs are between about 1 and 5 μM.
In the method according to the invention the preferred ratio of dNTPs to ddNTPs (dNTPs:ddNTPs) is between about 100:1 and 1000:1, preferably between about 300:1 and 600:1. A ratio of dNTPs to ddNTPs between about 600:1 and 1000:1 is preferred for longer nucleic acid fragments, wherein longer nucleic acid fragments means more than 0.2 kb.
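The relationship between the dNTP concentration, the dNTP:ddNTP ratio and the resulting ddNTP concentration is simple arithmetic and can be sketched as follows. The function names are hypothetical, and treating the approximate limits from the text (0.2 kb, 300:1, 600:1, 1000:1) as exact thresholds is an assumption made only for the illustration.

```python
from typing import Tuple


def ddntp_concentration_um(dntp_um: float, dntp_to_ddntp_ratio: float) -> float:
    """Return the ddNTP concentration (in micromolar) implied by a ratio."""
    if dntp_um <= 0 or dntp_to_ddntp_ratio <= 0:
        raise ValueError("concentration and ratio must be positive")
    return dntp_um / dntp_to_ddntp_ratio


def preferred_ratio_range(fragment_length_kb: float) -> Tuple[int, int]:
    """Return the preferred dNTP:ddNTP range for a fragment length.

    Fragments longer than 0.2 kb are treated as 'longer' fragments.
    """
    if fragment_length_kb <= 0:
        raise ValueError("fragment length must be positive")
    if fragment_length_kb > 0.2:
        return (600, 1000)
    return (300, 600)


if __name__ == "__main__":
    # 1 mM dNTP at a ratio of 500:1 gives 2 micromolar ddNTP, which lies
    # inside the preferred ddNTP range of about 1 to 5 micromolar.
    print(ddntp_concentration_um(1000.0, 500.0))
    print(preferred_ratio_range(0.5))
```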
In a further preferred embodiment of the method of the invention the said method is carried out at a temperature at which the signal-to-noise ratio between the specific truncated DNA molecules and the unspecific DNA molecules is large enough not to substantially impede reading of the sequence. It is less important to optimize the annealing temperature. In the case of human single-copy DNA sequences the highest possible annealing temperature drastically reduces the background. In this case the annealing and synthesis steps of the thermocycling reaction are preferably carried out at a minimum temperature of about 62° C., more preferably at about 66° C. and most preferably at a temperature of at least about 68° C.
The template of the DNA molecule to be sequenced is preferably present as a total genomic DNA molecule which does not have to be cloned or purified, but this may be the case. In one embodiment of the invention the genomic DNA has a length of more than or equal to 2 kb. Other forms of DNA that can be used as templates include cloned or uncloned mitochondrial DNA, partially purified or unpurified DNA such as e.g. plasmid DNA of bacterial colonies. DEXAS functions well with about 250 ng template DNA for the determination of mitochondrial DNA sequences and about 1 μg template DNA for determining single-copy DNA sequences such as e.g. total genomic DNA, but it also functions with smaller amounts of mitochondrial or genomic DNA. The method according to the invention can also be used for the direct sequencing of unpurified single-stranded or double-stranded DNA from bacteriophages. DEXAS is in addition relatively independent of the base composition of the template.
The method according to the invention can especially be used for the direct sequencing of nucleic acid molecules in a Complex Mixture. Complex Mixtures are nucleic acid mixtures in which no enriching purification for the target nucleic acid molecule has been performed. However, the nucleic acid may have been isolated from its original source, e.g. cells. In Complex Mixtures, the ratio of the total number of nucleotides in the target nucleic acid molecule and in the background nucleic acid molecules is substantially smaller than one and the ratio of the number of the target nucleic acid molecule to the number of the background nucleic acid molecules is not greatly larger than one, or even smaller than one, and possibly even substantially smaller than one. For instance, the ratio of the number of the target nucleic acid molecule to the number of the background nucleic acid molecules can range from about 0.0001 to about 1. Such Complex Mixtures can be a whole human genomic DNA containing a single copy of a human gene (e.g. CCR-5 gene) as the target DNA molecule for direct sequencing by the method of the invention. Table 1c shows an example of the Complex Mixture.
The method according to the invention can also be used for the direct sequencing of nucleic acid molecules in a Medium Complex Mixture. Medium Complex Mixtures are nucleic acid mixtures in which no enriching purification for the target nucleic acid molecule has been performed. However, the nucleic acid may or may not have been isolated from its original source, e.g. bacterial cells. In Medium Complex Mixtures, the ratio of the total number of nucleotides in the target nucleic acid molecule and in the background nucleic acid molecules is close to or smaller than one and the ratio of the number of the target nucleic acid molecule to the number of the background nucleic acid molecules is larger than one. For instance, the ratio of the number of the target nucleic acid molecule to the number of the background nucleic acid molecules can range from about 1 to about 1,000. Such Medium Complex Mixtures can be DNA from a bacterial colony (containing plasmid DNA as the target DNA molecule), DNA from phage plaques (containing M13 DNA as the target DNA molecule), or partially purified or unpurified mitochondrial DNA. Table 1b shows an example of the Medium Complex Mixture.
The method according to the invention can also be used for the direct sequencing of template nucleic acid molecules in a Non-Complex Mixture. Non-Complex Mixtures are nucleic acid mixtures in which the template nucleic acid has been amplified and/or purified or partially purified. The amplification and purification methods can be cloning with subsequent plasmid purification, gradient centrifugation and purification, or the product of PCR in which the PCR product may or may not be purified (the number of PCR cycles in the absence of terminating nucleotides may range from 1 to 50). In Non-Complex Mixtures, the ratio of the number of the target nucleic acid molecule and the number of background nucleic acid molecules is much larger than one. For instance, the ratio of the number of the target nucleic acid molecule to the number of the background nucleic acid molecules can range from about 1,000 to about 1×10^18. Table 1a shows an example of the Non-Complex Mixture.
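The three mixture categories can be told apart roughly by the ratio of the number of target molecules to the number of background molecules. The sketch below classifies a sample by this copy-number ratio alone. This is a simplification: the definitions above also depend on whether an enriching purification or amplification has been performed, and the example ranges meet at about 1 and about 1,000, so the handling of these boundaries here is an assumption. The function name is hypothetical.

```python
def classify_mixture(target_to_background_ratio: float) -> str:
    """Classify a nucleic acid mixture by its target:background copy ratio.

    Ratios below 1 are treated as Complex Mixtures, ratios from 1 up to
    (but not including) 1,000 as Medium Complex Mixtures, and ratios of
    1,000 or more as Non-Complex Mixtures. The boundary handling is an
    assumption made for this illustration.
    """
    if target_to_background_ratio <= 0:
        raise ValueError("ratio must be positive")
    if target_to_background_ratio < 1:
        return "Complex Mixture"
    if target_to_background_ratio < 1000:
        return "Medium Complex Mixture"
    return "Non-Complex Mixture"


if __name__ == "__main__":
    for ratio in (0.0001, 50, 1e6):
        print(ratio, classify_mixture(ratio))
```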
In a preferred embodiment the method according to the invention is furthermore characterized in that each thermocycling reaction to determine the position of A, G, C and T in the said DNA molecule is carried out in a single step, in a single container, vessel or tube.
The method according to the invention can be used for direct sequencing of RNA, such as human RNA. In this case, the polymerase exhibits, in addition to the reduced ability to discriminate against ddNTPs, reverse transcriptase activity. One example of a polymerase which can be used for direct sequencing of RNA, such as human RNA, is a polymerase obtained from Thermus thermophilus (Tth), which exhibits reverse transcriptase activity (Myers, T. W., Gelfand, D. H. (1991) Biochemistry 30 (31): 7661-7666) and which additionally carries a Tabor-Richardson mutation (F667Y) or a functional derivative thereof.
In a preferred embodiment of the method of the invention, the nucleic acid molecule to be sequenced can be present in the form of RNA. To sequence an RNA with this embodiment of the method of the invention, at least two activities must be present in one polymerase enzyme. Such an enzyme may be a DNA polymerase, for example, containing a Tabor-Richardson (F667Y) mutation, or a functional derivative thereof, which leads to a DNA polymerase enzyme, such as ThermoSequenase, that has a low rate of discrimination against ddNTPs. A second activity must be present in the polymerase enzyme enabling reverse transcription of RNA into DNA. Taq DNA polymerase (Jones et al., Nucl. Acids Res. 17: 8387-8388 (1989)) or Tth DNA polymerase (Myers et al., Biochemistry 30:7666-7672 (1991)) may be used. Tth polymerase reverse transcribes the RNA template into DNA which may then be utilized by the same enzyme as a template for the sequencing reaction. Since Tth is a homologue of Taq, the F667Y mutation may be incorporated into the enzyme, leading to a low discrimination against ddNTPs, and thus all activities required for the above reaction, namely DNA polymerase activity, the ability to incorporate ddNTPs well and reverse transcriptase activity, can be present in the same enzyme.
Suitable buffers include those reported in Myers et al. (1991) Biochemistry 30:7666-7672. The following buffer can be used for the reaction ensuring the function of the activities in case the enzyme requires the presence of Mn ions: 10 mM Tris-HCl (pH 8.3), 40 mM KCl, 1 mM MnCl2. The reaction buffer may initially optionally contain MgCl2 at about 1 to 5 mM for reverse transcription. Reverse transcription is accomplished by performing an incubation step for 15 minutes at 70° C. Subsequently the MgCl2 concentration is adjusted to between 1 mM and 5 mM and the sequencing reaction is performed.
In a more preferred embodiment, the method for the direct exponential amplification and sequencing of a nucleic acid starting from RNA is performed in a single step, in a single container, vessel or tube for each thermocycling reaction to determine the position of each nucleotide in the RNA molecule.
In a preferred embodiment of the invention, a DNA polymerase from Carboxydothermus hydrogenoformans is used. This enzyme, disclosed in EP 0 834 569, has reverse transcriptase activity in the presence of Mg ions without the presence of Mn ions. In a preferred embodiment of the invention the polymerase is mutated as described in Tabor and Richardson (Tabor, S. and Richardson, C. C. (1995) Proc. Natl. Acad. Sci. USA 92, 6339-6343; EP 0 655 506) in order to create an enzyme that does not discriminate against ddNTPs. In this set-up Mg is present from the beginning in a range between about 0.5 mM and 20 mM; no extra Mn is required. A suitable buffer additionally may comprise, but is not limited to, Tris-HCl (pH 6.5 to 11), KCl (2 mM-100 mM), ammonium sulfate (2 mM-100 mM) and additional enzymes, such as thermostable pyrophosphatase (0.1-50 U). In addition, at least one nucleotide must be present. The reactions are cycled as disclosed above. The enzyme thus contains all activities to perform the reactions of reverse transcription, amplification and sequencing.
Suitable sources of nucleic acid molecules in the method according to the invention are body fluids such as sperm, urine, blood or fractions of these, hairs, an individual cell, cells or fractions thereof, hard tissue such as bones or soft tissue or fractions thereof and cell cultures or fractions thereof.
The present invention also serves for the application of the method according to the invention for the determination of a nucleotide sequence of a given nucleic acid molecule e.g. for sequencing Shotgun libraries with two labels for large-scale genome projects and in medical diagnostics, forensics and population genetics. The method of the present invention can be used to detect genetic mutations or polymorphisms, to identify the origin of the sequenced nucleic acid or to detect the presence of foreign or infectious agents in a sample.
The present invention relates to all combinations of all procedures of the above methods.
After preparation the sequencing reactions can be loaded directly onto a sequencing gel, e.g. after addition of a commonly used application buffer (e.g. formamide which contains 20 mM EDTA (pH 7.4) and 6 mg/ml dextran blue) and denaturation (e.g. for 4 minutes at 96° C.). The sequence ladder can be read according to known methods. The method of the invention is well suited for automation. Since the two primers in the reaction are provided with different labels which can for example be detected with two different wavelengths, the method of the present invention enables the simultaneous sequencing of both strands of a template and the detection of both reactions in one or several gel lanes. In general many DEXAS reactions that are carried out using different dyes can be carried out simultaneously in the same tube and applied to a sequencing instrument that is equipped with several lasers or be detected by other methods such as e.g. autoradiography.
A further subject matter of the present invention is a kit for the direct sequencing of a nucleic acid molecule from a complex mixture of nucleic acids, such as e.g. total genomic human DNA, containing a reaction buffer, deoxynucleotides or derivatives thereof and a dideoxynucleotide or a further terminating nucleotide and a thermostable polymerase which has a reduced discrimination against ddNTPs compared to wild-type Taq DNA polymerase. Within the sense of the present invention direct sequencing means that the nucleic acid fragment to be sequenced is simultaneously amplified and sequenced, without prior amplification of the nucleic acid fragment to be sequenced by the known methods, in a single step without interrupting the reaction and such that an unequivocal sequence ladder can be read.
A further subject matter of the present invention is a kit for the direct sequencing of a nucleic acid molecule of a complex mixture of nucleic acids containing a reaction buffer, deoxynucleotides or derivatives thereof and a dideoxynucleotide or another terminating nucleotide, a thermostable polymerase and two primers whose ratio is larger than 1. The kit particularly preferably contains a thermostable polymerase which has a reduced discrimination against ddNTPs in comparison to the wild-type Taq DNA polymerase.