Methods of Characterizing Nucleic Acid Molecules
There are many reasons for characterizing nucleic acid molecules. For example, genes that cause or are related to many human, animal and plant diseases are rapidly being identified and characterized. Even within any particular gene, numerous mutations are being identified that are responsible for particular pathological conditions. Thus, although many methods for detection of both known and unknown mutations have been developed (e.g., see Cotton, 1993), our growing knowledge of human and other genomes makes it increasingly important to develop new, better, and faster methods for characterizing nucleic acids. Besides diagnostic uses, improved methods for rapidly characterizing nucleic acids will also be useful in many other areas, including human forensics, paternity testing, animal and plant breeding, tissue typing, screening for smuggling of endangered species, and biological research.
A variety of methods for characterizing DNA molecules are known in the art. For example, one can characterize DNA molecules by size based on their electrophoretic migration through an agarose or polyacrylamide gel. In these methods, the negatively charged DNA molecules move through a gel in the direction of the positively charged electrode. Provided that the percentage of agarose or polyacrylamide in the gel is appropriate for the size range of the DNA molecules being electrophoresed, smaller DNA molecules move through the pores of the gel more readily than larger DNA molecules. Because DNA molecules move in the gel at size-dependent rates, molecule sizes can be determined by staining and visualizing the DNA and then comparing the migration of the sample DNA molecules in the gel with the migration of marker DNA molecules of known size. Under the appropriate conditions, single-stranded DNA molecules differing in length by even a single nucleotide can be distinguished by denaturing polyacrylamide gel electrophoresis.
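The comparison of sample bands with marker DNA molecules of known size can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: it assumes that, within the resolving range of the gel, migration distance is approximately linear in the logarithm of fragment size, and the marker ladder, distances and function names (fit_log_size_vs_distance, estimate_size) are hypothetical.

```python
import math


def fit_log_size_vs_distance(markers):
    """Least-squares fit of log10(size_bp) = intercept + slope * distance_mm.

    `markers` is a list of (size_bp, distance_mm) pairs for marker DNA
    molecules of known size run on the same gel as the sample.
    """
    xs = [distance for _, distance in markers]
    ys = [math.log10(size) for size, _ in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size(distance_mm, intercept, slope):
    """Estimate the size in base pairs of a band from its migration distance."""
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical, idealized marker ladder: (size in bp, migration in mm).
markers = [(1000, 20.0), (500, 30.0), (250, 40.0), (125, 50.0)]
intercept, slope = fit_log_size_vs_distance(markers)

# A sample band that migrated 35 mm lies between the 500 bp and 250 bp markers.
sample_size = estimate_size(35.0, intercept, slope)
```

Smaller molecules migrate farther, so the fitted slope is negative. Real gels deviate from this log-linear relationship at the extremes of the size range, which is why the percentage of agarose or polyacrylamide must suit the molecules being sized.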
Another way to characterize DNA molecules is to treat each DNA molecule with one or more restriction endonucleases and then to determine the sizes of the various DNA fragments resulting from this treatment by agarose gel electrophoresis. Restriction endonucleases are enzymes that recognize specific sequences of bases in DNA (often 4, 5, 6 or sometimes 8 bases on each DNA strand) and then cut the phosphodiester bonds of the polynucleotide chains of DNA within or near the recognition sequence. Because many restriction endonucleases with different recognition sequences are available, one can obtain a restriction map of an entire DNA molecule, showing the locations of restriction enzyme recognition sites and the distances between them, by determining which other restriction enzymes will cut each DNA fragment generated by any given restriction enzyme and the sizes of all of the resulting fragments. Such a restriction map is characteristic for a particular DNA molecule and can be used to obtain a rough identification of a particular sequence. Additionally, changes such as those caused by mutations in DNA may result in a loss or gain of a restriction site, producing a so-called "restriction fragment length polymorphism" (RFLP) (Kazazian, et al., 1989). An example of a diagnostically significant RFLP is the single base mutation in the beta-hemoglobin gene (a change from A to T) that results in sickle cell anemia and eliminates a Dde I restriction site (Kazazian, et al., 1989).
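The way in which loss of a restriction site changes the fragment pattern can be illustrated with a short Python sketch. It assumes that Dde I recognizes the sequence CTNAG and cuts after the first base; the two 24-base sequences are hypothetical toy sequences, not the beta-hemoglobin gene, and the function name digest is likewise hypothetical. Because CTNAG reads the same on both strands, scanning one strand is sufficient here.

```python
import re

# Assumed Dde I recognition sequence CTNAG (N = any base), cut after the C.
DDE_I_SITE = "CT[ACGT]AG"
DDE_I_CUT_OFFSET = 1


def digest(seq, site_regex, cut_offset):
    """Return the fragment lengths from cutting a linear sequence at every site.

    A lookahead is used so that overlapping recognition sites are all found.
    """
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?=(?:{site_regex}))", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical toy sequences that differ by a single A-to-T change.
normal = "AAAA" + "CTGAG" + "AAAAAA" + "CTCAG" + "AAAA"
mutant = "AAAA" + "CTGTG" + "AAAAAA" + "CTCAG" + "AAAA"

normal_fragments = digest(normal, DDE_I_SITE, DDE_I_CUT_OFFSET)  # [5, 11, 8]
mutant_fragments = digest(mutant, DDE_I_SITE, DDE_I_CUT_OFFSET)  # [16, 8]
```

In the toy mutant the 5-base and 11-base fragments are replaced by a single 16-base fragment, which is the kind of band shift that an RFLP analysis detects on a gel.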
One of the most informative ways to characterize a DNA molecule is to determine its nucleotide sequence. One method for sequencing DNA (Maxam and Gilbert, 1977) is accomplished by treating each of four aliquots of one strand of a 5'- or 3'-end-labelled DNA molecule to be sequenced with one of four different chemical reagents. One chemical specifically modifies only the guanine base in the DNA, another modifies only cytosine, another modifies either guanine or adenine bases, and the last chemical modifies either thymine or cytosine bases. The chemical treatments are carried out under conditions so that only a small proportion of the total susceptible bases will actually be modified.
It is important that the chemical reactions are limited in order to generate a nested set of fragments differing by one base of a specific type. If all G residues, for example, were modified, every one of them would be susceptible to phosphodiester bond cleavage, and each end-labelled molecule would be cut at the G nearest the label, so that only the shortest labelled fragment would be detected. Therefore, a collection of partially modified nucleic acids is required for sequencing. Subsequent treatment with piperidine results in cleavage of the phosphodiester bonds of the DNA molecule at the abasic sites, generating a mixture of all sizes of DNA molecules that are possible following chemical modification and loss of each one of the corresponding susceptible bases. The DNA molecules in each of the four reactions are then resolved by electrophoresis in adjacent lanes of a polyacrylamide gel and the pattern of bands is revealed by exposing the gel to X-ray film if the DNA molecules are labelled with a radioisotope. The sequence of the DNA is revealed by analyzing the exposed X-ray film. Alternatively, if the DNA molecules are labelled with a fluorescent, chemiluminescent or some other non-radioactive moiety, the sequence is revealed by an appropriate method known in the art.
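The logic of reading a sequence from the four chemical cleavage lanes (G, A+G, C+T and C) can be sketched in Python. This is an idealized illustration only: each band is represented by the 1-based position of the modified base rather than by a measured fragment length, every susceptible position is assumed to give a clean band, and the function names are hypothetical.

```python
# Bases that are susceptible in each of the four chemical reactions.
SUSCEPTIBLE = {
    "G": "G",
    "A+G": "AG",
    "C+T": "CT",
    "C": "C",
}


def cleavage_lanes(seq):
    """Return, for each reaction, the set of positions that would give a band."""
    return {
        lane: {pos for pos, base in enumerate(seq, start=1) if base in bases}
        for lane, bases in SUSCEPTIBLE.items()
    }


def read_lanes(lanes, length):
    """Infer the base at each position from the lanes that show a band there."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")   # band in both G and A+G lanes
        elif pos in lanes["A+G"]:
            calls.append("A")   # band in A+G lane only
        elif pos in lanes["C"]:
            calls.append("C")   # band in both C and C+T lanes
        elif pos in lanes["C+T"]:
            calls.append("T")   # band in C+T lane only
        else:
            calls.append("N")   # no band: base cannot be called
    return "".join(calls)


lanes = cleavage_lanes("GATTACA")
recovered = read_lanes(lanes, 7)
```

The two mixed-specificity reactions are resolved by subtraction: a band that appears in the A+G lane but not in the G lane is read as A, and a band in the C+T lane but not in the C lane is read as T.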
The most commonly used method for sequencing DNA at this time (Sanger, et al., 1977) uses a DNA polymerase to produce differently sized fragments depending on the positions (sequence) of the four canonical bases (A=Adenine; C=Cytosine; G=Guanine; and T=Thymine) within the DNA to be sequenced. In this method, the DNA to be sequenced is used as a template for in vitro DNA synthesis. In addition to all four of the canonical deoxynucleotides (dATP, dCTP, dGTP and dTTP), a 2',3'-dideoxynucleotide is also included in each in vitro DNA synthesis reaction at a concentration that will result in random substitution of a small percentage of a canonical nucleotide by the corresponding dideoxynucleotide. Thus, each DNA synthesis reaction yields a mixture of DNA fragments of different lengths corresponding to chain termination wherever the dideoxynucleotide was incorporated in place of the normal deoxynucleotide. The DNA fragments are labelled, either radioactively or non-radioactively, by one of several methods and the label(s) may be incorporated into the DNA by extension of a labelled primer, or by incorporation of a labelled deoxynucleotide or dideoxynucleotide. By carrying out DNA synthesis reactions for each of the four dideoxynucleotides (ddATP, ddCTP, ddGTP or ddTTP), then separating the products of each reaction in adjacent lanes of a denaturing polyacrylamide gel, and detecting those products by one of several methods, the sequence of the DNA template can be read directly.
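The reason the dideoxynucleotide must substitute for only a small percentage of the canonical nucleotide can be seen from a simple model, sketched here in Python. The model is an assumption made for illustration: each susceptible position is taken to terminate the growing chain independently with the same probability p, so that the fraction of chains ending at the k-th susceptible position is p(1-p)^(k-1). The function name and the values of p are hypothetical.

```python
def termination_fractions(n_sites, p):
    """Fractions of chains terminating at each susceptible position.

    Returns (fractions, read_through), where fractions[k-1] is the fraction
    of chains that terminate at the k-th susceptible position and
    read_through is the fraction that extends past all n_sites positions.
    """
    fractions = [p * (1 - p) ** (k - 1) for k in range(1, n_sites + 1)]
    read_through = (1 - p) ** n_sites
    return fractions, read_through


# With a low termination probability, distant positions still give a signal.
low_p, _ = termination_fractions(100, 0.01)

# With a high termination probability, almost all chains stop early.
high_p, _ = termination_fractions(100, 0.5)

signal_ratio = low_p[-1] / high_p[-1]  # very large: the 100th band is lost at high p
```

Under this model a high substitution rate concentrates the signal in the shortest fragments and leaves essentially nothing at positions far from the primer, whereas a low rate spreads the signal over many fragment lengths.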
Cycle sequencing is a variation of Sanger sequencing that achieves a linear amplification of the sequencing signal by using a thermostable DNA polymerase and repeating chain terminating DNA synthesis during each of multiple rounds of denaturation of a template DNA (e.g., at 95° C.), annealing of a single primer oligonucleotide (e.g., at 55° C.), and extension of the primer (e.g., at 70° C.).
Nucleic acid sequencing provides the highest degree of certainty as to the identity of a particular nucleic acid. Also, nucleic acid sequencing permits one to detect mutations in a gene even if the site of the mutation is unknown. Sequencing data may even provide enough information to permit an estimation of the clinical significance of a particular mutation or of a variation in the sequence.
In order to characterize a nucleic acid by sequencing, the nucleic acid must be isolated in sufficient quantity to be used for the particular method. Although it may be possible to obtain sufficient quantities of a nucleic acid for sequencing by first cloning it into a plasmid or other vector, this procedure is time-consuming and is often not practical for routine analysis of samples for clinical diagnostics or other purposes. When the amount of nucleic acid in a sample is less than optimal for a given method, it may be advantageous to use one of several methods which have been developed for amplifying parts of nucleic acid molecules. The polymerase chain reaction (PCR), nucleic acid sequence-based amplification (NASBA), self-sustained sequence replication (3SR), transcription-mediated amplification (TMA) and strand displacement amplification (SDA) are examples of some of the methods which have been developed for amplifying nucleic acid molecules in vitro.
By way of example, a specific portion of a DNA molecule may be amplified using PCR by temperature cycling of a sample DNA in a buffer containing two primers (one primer complementary to each of the DNA strands and which, together, flank the DNA sequence of interest), a thermostable DNA polymerase, and all four canonical 2'-deoxynucleoside-5'-triphosphates (dATP, dCTP, dGTP and dTTP). The specific nucleic acid sequence is geometrically amplified during each of about 30 cycles of denaturation (e.g., at 95° C.), annealing of the two primers (e.g., at 55° C.), and extension of the primers by the DNA polymerase (e.g., at 70° C.), so that up to about a billion copies of the nucleic acid sequence are obtained. RNA may be similarly amplified using one of several protocols for RT-PCR, such as, for example, by carrying out the reaction using a thermostable DNA polymerase which also has reverse transcriptase activity (Myers and Gelfand, 1991).
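The figure of about a billion copies follows from doubling the target in each of about 30 cycles, since 2 to the 30th power is 1,073,741,824. The Python sketch below works through this arithmetic and contrasts it with the linear accumulation obtained when only a single primer is extended in each cycle, as in cycle sequencing. The function names and the per-cycle efficiency parameter are hypothetical, and real reactions fall short of perfect doubling, particularly in later cycles.

```python
def geometric_copies(start_copies, cycles, efficiency=1.0):
    """Copies after two-primer amplification.

    efficiency is the assumed fraction of templates copied in each cycle;
    1.0 corresponds to perfect doubling.
    """
    return start_copies * (1 + efficiency) ** cycles


def linear_copies(start_templates, cycles):
    """Extension products after single-primer cycling.

    Each cycle copies only the original templates, so the products
    accumulate linearly with the number of cycles.
    """
    return start_templates * cycles


ideal = geometric_copies(1, 30)          # 2**30, about a billion
reduced = geometric_copies(1, 30, 0.8)   # assumed 80% efficiency per cycle
single_primer = linear_copies(1, 30)     # 30 products per template
```

The difference between the two modes arises because, with two flanking primers, each newly synthesized strand serves as a template for the opposite primer in the next cycle.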
The polymerase chain reaction (PCR), as discussed above, is the subject of numerous publications, including Mullis, KB, et al., U.S. Pat. No. 4,683,202 and U.S. Pat. No. 4,683,195; Mullis, KB, EP 201,184; Ehrlich, H., EP 50,424, EP 84,796, EP 258,017, & EP 237,362; Ehrlich, H., U.S. Pat. No. 4,582,788; Saiki, R., et al., U.S. Pat. No. 4,683,202; Mullis, KB, et al. (1986) in Cold Spring Harbor Symp. Quant. Biol. 51:263; Saiki, R., et al. (1985) Science 230:1350; Saiki, et al. (1985) Science 231:487; and Loh, EY, et al. (1988) Nature 335:141.
By way of a second example, all or a specific portion of an RNA molecule may be amplified using NASBA (Fahy, et al., 1991) by isothermal incubation of a sample RNA in a buffer containing two primers (a first primer complementary to the RNA molecule and encoding a promoter sequence for an RNA polymerase and a second primer complementary to the 3'-end of the first cDNA strand resulting from reverse transcription of the RNA molecule), an RNA- and DNA-dependent DNA polymerase which also has RNase H activity (or a separate RNase H enzyme), all four canonical 2'-deoxynucleoside-5'-triphosphates (dATP, dCTP, dGTP and dTTP), an RNA polymerase that recognizes the promoter sequence of the first primer, and all four ribonucleoside-5'-triphosphates (rATP, rCTP, rGTP and rUTP).
A first cDNA strand is synthesized by extension of the first primer by reverse transcription. Then, the RNase H digests the RNA of the resulting DNA:RNA hybrid, and the second primer primes synthesis of the second cDNA strand. The RNA polymerase then transcribes the resultant double-stranded DNA (ds-DNA) molecule from the RNA polymerase promoter sequence, making many more copies of RNA, which, in turn, are reverse-transcribed into cDNA, and the process begins all over again. This series of reactions, from ds-DNA through RNA intermediates to more ds-DNA, continues in a self-sustained way until reaction components are exhausted or the enzymes are inactivated. DNA samples can also be amplified by other variations of NASBA or 3SR.
Strand Displacement Amplification (SDA) is another isothermal nucleic acid amplification technique (Walker, 1994). SDA is a method of nucleic acid amplification in which extension of primers, displacement of single-stranded extension products, annealing of primers to the extension products (or the original target sequence) and subsequent extension of the primers occur concurrently in the reaction mix. This is in contrast to the PCR, in which the steps of the reaction occur in discrete phases or cycles as a result of the temperature constraints of the reaction. SDA is based upon 1) the ability of a restriction endonuclease to nick the unmodified strand of a hemiphosphorothioate form of its double-stranded recognition site and 2) the ability of certain polymerases to initiate replication at the nick and displace the downstream non-template strand. After an initial incubation at increased temperature (about 95° C.) to denature double-stranded target sequences for annealing of the primers, subsequent polymerization and displacement of newly synthesized strands take place at a constant temperature (usually about 37° C.). Production of each new copy of the target sequence consists of five steps: 1) binding of amplification primers to an original target sequence or a displaced single-stranded extension product previously polymerized, 2) extension of the primers by exonuclease-deficient (exo-) Klenow polymerase incorporating an alpha-thio deoxynucleoside triphosphate, 3) nicking of a hemiphosphorothioate double-stranded restriction site, 4) dissociation of the restriction enzyme from the nick site, and 5) extension from the 3'-end of the nick by exo- Klenow polymerase with displacement of the downstream non-template strand. Nicking, polymerization and displacement occur concurrently and continuously at a constant temperature because extension from the nick regenerates another nickable restriction site.
When primers which hybridize to both strands of a double-stranded target sequence are used, amplification is exponential, as the sense and antisense strands serve as templates for the opposite primer in subsequent rounds of amplification.
PCR, NASBA and the other methods of nucleic acid amplification can be very useful for obtaining greater quantities of a nucleic acid for additional characterization. However, in general, the amplified nucleic acid molecules must be purified away from primers, nucleotides, incomplete amplification products and other impurities prior to being used for sequencing. Otherwise, for example, the PCR primers may compete with labelled sequencing primers and the PCR nucleotides may compete with the sequencing nucleotide mixes that are used for Sanger dideoxy sequencing. Also, Sanger sequencing clearly cannot be done at the same time as PCR or another amplification method, at least not efficiently, because the dideoxynucleotides used for sequencing will result in termination of the DNA amplification reactions as well as the sequencing reactions.
One group has developed a method that attempts to decrease the number of steps required for sequencing nucleic acids (Shaw and Porter, PCT WO 95/06752). According to this method, 5'-alpha-borano-deoxynucleoside triphosphates, which were found to be resistant to exonuclease III (exo III) digestion, were incorporated into DNA during in vitro DNA synthesis in lieu of one of the canonical nucleotides (dATP, dCTP, dGTP, dTTP) in one of four primer extension reactions. Treatment with exo III will digest the synthesized DNA up to the point of alpha-borano deoxynucleoside incorporation. After digestion with exo III and resolution of the labeled fragments on a polyacrylamide gel, the sequence of the nucleic acid can be determined.
An advantage of the alpha-borano/exo III method is that it can be integrated into PCR amplification. However, there are also disadvantages. A key disadvantage is a lower degree of accuracy of the sequence data compared to Maxam-Gilbert or Sanger sequencing because the alpha-borano/exo III method gives both extra bands and missing bands on sequencing gels. Although the mechanism by which these sequencing artifacts are generated is still uncertain, the extra bands may be due to incomplete digestion of non-boronated regions of DNA by exo III, while missing bands may be due to preferential digestion by exo III through some sequences containing alpha-borano-nucleotides. Another serious disadvantage of the alpha-borano/exo III method is related to the substrate requirements of exo III. Because exo III digests only double-stranded DNA, beginning at the 3'-end of each strand, the sequence can only be determined for the 3'-half of each strand of a PCR product, so it is not possible to obtain the sequence near to the primer. Also, because exo III digestion yields only fragments that are between 50% and 100% of the length of the full-size PCR product, the size range of the DNA which can be sequenced by the method is somewhat limited. For example, if a PCR product of 1000 base pairs in length is sequenced according to the alpha-borano/exo III method, the fragments to be electrophoresed would be approximately 500-1000 nucleotides long. Fragments of such a length are more difficult to resolve in DNA sequencing gels.
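The 1000 base pair example can be generalized with a small piece of arithmetic, sketched here in Python. The sketch simply encodes the statement that exo III digestion yields fragments between 50% and 100% of the full product length; the function name is hypothetical and the result is an approximation, not a model of the enzyme.

```python
def exo_iii_fragment_range(product_length):
    """Approximate (shortest, longest) fragment lengths in nucleotides.

    Assumes, as stated in the text, that fragments remaining after exo III
    digestion are between 50% and 100% of the full-size PCR product.
    """
    shortest = product_length // 2
    longest = product_length
    return shortest, longest


def unreadable_from_primer(product_length):
    """Approximate number of primer-proximal nucleotides that cannot be read."""
    shortest, _ = exo_iii_fragment_range(product_length)
    return shortest


fragment_range = exo_iii_fragment_range(1000)   # (500, 1000)
lost_near_primer = unreadable_from_primer(1000)  # about 500 nucleotides
```

Both limitations scale with product length: a longer product pushes every fragment into a size range that is harder to resolve and leaves a larger primer-proximal region unsequenced.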
Uracil N-Glycosylase
"Uracil N-glycosylase" or "uracil-DNA glycosylase" (UNG or UDG) is an enzyme that catalyzes the cleavage of the N-glycosidic bond between the base uracil and the sugar deoxyribose in DNA into which the non-canonical nucleotide 2'-deoxyuridine-5'-triphosphate (dUTP) has been incorporated in place of the canonical nucleotide dTTP (Lindahl, 1979). UDG does not catalyze cleavage of uracil from free dUTP, free deoxyuridine or RNA (Duncan, 1981).
Hartley, U.S. Pat. No. 5,035,996, describes a process in which UDG is used for controlling contamination of nucleic acid amplification reactions.
The purpose of that invention was to pretreat, with UDG, reaction mixtures containing a new sample on which an amplification is to be carried out, in order to assure that any uracil-containing DNA from a prior amplification of another sample had been destroyed and would not contaminate the second amplification reaction. Digesting the second amplification mixture with UDG prior to carrying out the second amplification reaction prevents any residual products of the first amplification from serving as a template for further amplifications.
Somewhat similarly, U.S. Pat. No. 5,418,149 discloses the use of glycosylases to reduce non-specific amplification of nucleic acids.
A method for introducing site-specific mutations into DNA has also been described that relies upon replacement of thymine with uracil in DNA and subsequent treatment with uracil-DNA glycosylase. See U.S. Pat. No. 4,873,192 and Kunkel, 1985. Also, uracil-containing phage were suggested as a part of a biological containment system that would transfer genetic information only to uracil N-glycosylase-deficient cells and not to naturally occurring bacteria. See Warner, et al., 1979.
Another use for UDG in molecular biology has been described by Nisson, et al., 1991. In Nisson, et al., UDG is used to facilitate directional cloning of PCR products. Primers to be used for PCR amplification are made to contain a specific 12-base 5' sequence that contains dUMP in place of dTMP. Then, after the PCR amplification (without dUTP in the reaction mixture), since the amplification products contain dUMP residues at each 5' terminus, treatment of these PCR products with UDG removes the uracil residues from the 5' termini. Subsequent treatment with heat results in cleavage of the phosphodiester bonds at the abasic sites where uracil has been lost and thereby generates 12-base cohesive termini which can be easily cloned into vectors with complementary termini.
What is needed in the art is a method for characterizing nucleic acids that is as accurate and as specific as Maxam-Gilbert or Sanger sequencing in detecting and identifying nucleic acids and differences between nucleic acids, but that is easier, faster, more sensitive and/or requires less sample DNA. The new method should also be capable of being used for relatively impure nucleic acid samples, such as amplification products, without needing to purify the sample away from primers, nucleotides or other impurities. The method also should be capable of being integrated into an amplification method such as PCR in a way similar to the alpha-borano/exo III method, but without having the disadvantages of the latter method.