Methods of Characterizing Nucleic Acid Molecules
A variety of methods for characterizing DNA molecules are known in the art. For example, one can characterize DNA molecules by size based on their electrophoretic migration through an agarose or polyacrylamide gel. Another way to characterize DNA molecules is to treat each DNA molecule with one or more restriction endonucleases and then to determine the sizes of the various DNA fragments resulting from this treatment by agarose gel electrophoresis. Additionally, changes in DNA such as those caused by mutations may result in the loss or gain of a restriction site, giving rise to a so-called "restriction fragment length polymorphism" (RFLP). An example of a diagnostically significant RFLP is a single base mutation in the beta-hemoglobin gene, a change from A to T, which eliminates a DdeI restriction site and results in sickle cell anemia.
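The effect of a single base change on a restriction digest can be illustrated with a minimal Python sketch. It assumes the DdeI recognition sequence C^TNAG (cut after the first C), and it uses short hypothetical test fragments in which only the central bases (CCTGAGGAG versus CCTGTGGAG) are meant to resemble the beta-hemoglobin locus; the poly-A and poly-T flanks and the function names are illustrative only.

```python
import re

# Assumed DdeI recognition sequence C^TNAG (N = any base); the enzyme is
# taken to cut one base into the site, i.e. after the first C.
DDEI_SITE = "CTNAG"
DDEI_CUT_OFFSET = 1


def site_regex(site):
    """Compile a recognition sequence, treating N as any of A, C, G, T."""
    # The lookahead lets overlapping sites be found as well.
    return re.compile("(?=(" + site.replace("N", "[ACGT]") + "))")


def fragment_lengths(seq, site=DDEI_SITE, cut_offset=DDEI_CUT_OFFSET):
    """Return the lengths of the fragments of one strand after digestion."""
    cuts = [m.start() + cut_offset for m in site_regex(site).finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical fragments: artificial flanks around the bases of interest.
normal = "A" * 10 + "CCTGAGGAG" + "T" * 10   # contains CTGAG, a DdeI site
mutant = "A" * 10 + "CCTGTGGAG" + "T" * 10   # A-to-T change destroys the site

print(fragment_lengths(normal))  # two fragments
print(fragment_lengths(mutant))  # one uncut fragment
```

In this idealized digest the normal fragment is cut once and the mutant fragment is not cut at all, which is the size difference that gel electrophoresis would reveal as an RFLP.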
One of the most informative ways to characterize a DNA molecule is to determine its nucleotide sequence. One method for sequencing DNA (Maxam and Gilbert, 1977) is accomplished by treating each of four aliquots of one strand of a 5'- or 3'-end labelled DNA molecule to be sequenced with one of four different chemical reagents. The most commonly used method for sequencing DNA at this time (Sanger, et al., 1977) uses a DNA polymerase to produce differently sized fragments depending on the positions (i.e., sequence) of the four canonical bases (A=Adenine; C=Cytosine; G=Guanine; T=Thymine) within the DNA to be sequenced. Cycle sequencing is a variation of Sanger sequencing that achieves a linear amplification of the sequencing signal by using a thermostable DNA polymerase and repeating chain-terminating DNA synthesis during each of multiple rounds of denaturation of a template DNA (e.g., at 95° C.), annealing of a single primer oligonucleotide (e.g., at 55° C.), and extension of the primer (e.g., at 70° C.).
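The principle that fragment sizes encode base positions can be sketched in a few lines of Python. This is an idealized illustration, not a model of the actual chemistry: it assumes that chain termination occurs at every position of the newly synthesized strand, that fragment length is measured from the first base added to the primer, and that all fragments resolve perfectly. The function names and the example sequence are hypothetical.

```python
def sanger_ladders(synthesized, primer_len=0):
    """Map each base to the lengths of chain-terminated fragments ending in it.

    Each of the four lists corresponds to one of the four termination
    reactions (one lane on an idealized sequencing gel).
    """
    ladders = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        ladders[base].append(primer_len + position)
    return ladders


def read_sequence(ladders):
    """Read the sequence back by ordering all fragments from short to long."""
    by_length = sorted(
        (length, base)
        for base, lengths in ladders.items()
        for length in lengths
    )
    return "".join(base for _, base in by_length)


ladders = sanger_ladders("GATTACA")
print(ladders)
print(read_sequence(ladders))
```

Reading the four ladders together from the shortest fragment to the longest recovers the order of the bases, which is what is done when a sequencing gel is read from bottom to top.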
In order to characterize a nucleic acid by sequencing, the nucleic acid must be isolated in sufficient quantity to be used for the particular method. Although it may be possible to obtain sufficient quantities of a nucleic acid for sequencing by first cloning it into a plasmid or other vector, this procedure is time-consuming and is often not practical for routine analysis of samples for clinical diagnostics or other purposes. When the amount of nucleic acid in a sample is less than optimal for a given method, it may be advantageous to use one of several methods which have been developed for amplifying parts of nucleic acid molecules. The polymerase chain reaction (PCR) as described in U.S. Pat. Nos. 4,683,195 and 4,683,202, incorporated herein by reference, nucleic acid sequence-based amplification (NASBA) as described in U.S. Pat. No. 5,234,809 incorporated herein by reference, self-sustained sequence replication (3SR), transcription-mediated amplification (TMA) as described in U.S. Pat. No. 5,399,491 incorporated herein by reference, and strand displacement amplification (SDA) are examples of some of the methods which have been developed for amplifying nucleic acid molecules in vitro. RNA may also be amplified using one of several protocols for RT-PCR as described in U.S. Pat. No. 5,310,652 incorporated herein by reference, such as, for example, by carrying out the reaction using a thermostable DNA polymerase which also has reverse transcriptase activity (Myers and Gelfand, 1991).
Although PCR, NASBA and other nucleic acid amplification methods are useful for obtaining greater quantities of a nucleic acid for additional characterization, such amplification methods pose difficulties when used in conjunction with sequencing. In general, the amplified nucleic acid molecules must be purified away from primers, nucleotides, incomplete amplification products and other impurities prior to being used for sequencing. Otherwise, for example, the PCR primers may compete with labelled sequencing primers and the PCR nucleotides may compete with the sequencing nucleotide mixes that are used for Sanger dideoxy sequencing. Also, Sanger sequencing normally cannot be done at the same time as PCR or another amplification method, at least not efficiently, because the dideoxynucleotides used for sequencing will result in termination of the DNA amplification reactions as well as the sequencing reactions.
One group has developed a method that attempts to decrease the number of steps required for sequencing nucleic acids (Shaw and Porter, PCT WO 95/06752). According to this method, 5'-alpha-borano-deoxynucleoside triphosphates, which were found to be resistant to exonuclease III (exo III) digestion, were incorporated into DNA during in vitro DNA synthesis in lieu of a portion of one of the canonical nucleotides (dATP, dCTP, dGTP, dTTP) in one of four primer extension reactions. Treatment with exo III will digest the synthesized DNA up to the point of alpha-borano deoxynucleoside incorporation. After digestion with exo III and resolution of the labelled fragments on a polyacrylamide gel, the sequence of the nucleic acid can be determined.
An advantage of the alpha-borano/exo III method is that it can be integrated into PCR amplification. However, there are also disadvantages. A key disadvantage is a lower degree of accuracy of the sequence data compared to Maxam-Gilbert or Sanger sequencing because the alpha-borano/exo III method gives both extra bands and missing bands on sequencing gels. Another serious disadvantage of the alpha-borano/exo III method is related to the substrate requirements of exo III. Because exo III digests only double-stranded DNA, beginning at the 3'-end of each strand, the sequence can only be determined for the 3'-half of each strand of a PCR product, so it is not possible to obtain the sequence near to the primer. Also, because exo III digestion yields only fragments that are between 50% and 100% of the length of the full-size PCR product, the size range of the DNA which can be sequenced by the method is somewhat limited. For example, if a PCR product of 1000 base pairs in length is sequenced according to the alpha-borano/exo III method, the fragments to be electrophoresed would be approximately 500-1000 nucleotides long. Fragments of such a length are more difficult to resolve in DNA sequencing gels.
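The size limitation described above amounts to simple arithmetic, shown here as a short Python sketch. It assumes, as stated in the text, that exo III digestion leaves fragments between about 50% and 100% of the full product length; the function name is hypothetical and the rounding for odd lengths is an arbitrary choice.

```python
def exo3_fragment_range(product_len):
    """Approximate (shortest, longest) fragment length after exo III digestion.

    Assumes fragments span roughly 50% to 100% of the full-size product.
    """
    shortest = (product_len + 1) // 2  # about half of the product, rounded up
    return shortest, product_len


for length in (200, 1000):
    shortest, longest = exo3_fragment_range(length)
    print(f"{length} bp product: fragments of about {shortest}-{longest} nt")
```

For a 1000 base pair product the sketch gives the approximate range of 500-1000 nucleotides cited in the example above.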
Single-strand conformation polymorphism (SSCP) is another technique that compares the electrophoretic mobility of PCR products. SSCP and the closely related heteroduplex analysis methods have come into use for screening for single-base mutations (Orita, et al., 1989; Keen, et al., 1991). In these methods, the mobility of PCR-amplified mutated DNA is compared with the mobility of DNA amplified from normal or wild-type DNA by direct electrophoresis of samples in adjacent lanes of native polyacrylamide or other types of matrix gels. Single-base mutations often alter the secondary structure of the molecule sufficiently to cause slight mobility differences between the normal and mutant PCR products after prolonged electrophoresis.
Unfortunately, SSCP has several major drawbacks. The most important is that not all mutations result in detectable shifts in mobility. For instance, it has been shown that of 20 mutations detected by direct sequencing, only 35% were detected by SSCP (Sarkar, et al., 1992). Other studies have reported higher detection efficiencies, but it is well known in the art that SSCP has a major problem in missing point mutations. The chances of detecting mobility differences can be increased by running parallel gels under different conditions, for example at 4° C. and 30° C., with and without 5% glycerol (Hayashi, 1991), but this significantly increases the cost and labor associated with analysis. Since mobility differences are generally quite small, analysis of genes in the heterozygous state is compromised. Another drawback of SSCP and related techniques is that they provide no information on the position of the mutation within the DNA fragment being analyzed. Finally, there seems to be an upper size limit for analysis by SSCP of approximately 300 bases, and increased fragment length has been associated with decreased efficiency of mutation detection (Hayashi, 1991).
The technique of restriction endonuclease fingerprinting (REF) (Liu and Sommer, 1995) combines two mutation detection methods. The change of electrophoretic mobility under SSCP conditions of a DNA fragment containing a mutation is combined with the possible alteration of a restriction endonuclease site. The method requires a labeling step following the restriction endonuclease cleavage and suffers from the same drawback as SSCP, namely the need for extensive optimization of gel conditions. Another modification of SSCP, dideoxy fingerprinting (ddF) (Sarkar et al., 1992), requires transcription of a PCR product into an RNA molecule, a DNA sequencing reaction with a single dideoxy terminator, and a non-denaturing, high resolution gel. The ddF method is limited in the size of DNA fragments that can be analyzed, and a mutation can be localized only to within ten bases of the mutation locus (Blaszyk, et al., 1995).
U.S. Ser. No. 08/534,799 discloses a method of characterizing a nucleic acid in which non-canonical deoxynucleotide triphosphates are incorporated during the process of nucleic acid amplification.
FPG Protein
DNA repair is a fundamental biological process that ensures the stability and integrity of the genome. DNA repair enzymes are often classified according to the manner in which they promote removal of DNA damage. DNA glycosylases, known generally as base excision repair enzymes, are a specific class of DNA repair enzymes that catalyze the hydrolysis of the N-glycosidic bonds linking particular types of chemically modified bases or incorrectly inserted bases to the deoxyribose-phosphodiester backbone.
One of the best-known DNA N-glycosylases, uracil N-glycosylase (UNG), is used in vitro to prevent accidental carryover of PCR products into other reactions (see for example U.S. Pat. No. 5,035,996). In such a scheme, the PCR reaction contains dUTP that is incorporated in place of dTTP into the PCR product. The PCR product may then be degraded by the action of UNG and heat, which results in breakage of the DNA at the dU residues. Another method using UNG is the use of dU-containing PCR primers for cloning the product (Watson and Bennett, 1997). Following PCR with the dU-containing primers, the product is digested with UNG, which in effect cleaves off the 5'-ends, or a part of the ends of the PCR product. The vector contains complementary 5' overhanging termini which may be used to anneal and ligate the PCR product to the vector. DNA N-glycosylases have also been used to measure the amount of base damage to DNA in organisms subjected to different conditions. The cellular DNA is extracted, cleaved with a glycosylase, and the bases released by the glycosylase are analyzed or the presence of abasic sites in the DNA is determined.
UNG has also been used to "footprint" the binding sites of proteins on DNA molecules (Devchand, et al., 1993). dUTP is incorporated randomly into an end-labeled DNA, which is reacted with a protein or protein mixture. The specific binding of proteins is detected by the protection of a region of the DNA from UNG degradation.
UNG has also been used to detect mutations by a method involving incorporation of only dUTP in place of TTP (Vaughn and McCarthy, 1998; Vaughn and McCarthy, International application PCT/IE95/00067, 1997).
Formamidopyrimidine DNA N-glycosylase (FPG protein) is a base excision repair enzyme that recognizes chemically modified bases and catalyzes the cleavage of the N-glycosyl linkage between a modified base and the deoxyribose-phosphodiester backbone in DNA. In addition, FPG protein also possesses an apyrimidinic/apurinic (AP) lyase activity.
The N-glycosylase activity of FPG protein releases damaged bases from DNA, generating an AP site. The AP-lyase activity of the enzyme catalyzes β,δ-elimination reactions, leaving a single nucleotide gap in the DNA (Bailly, et al., 1989a). The resulting products arising from a damaged base or an AP site include a monomeric five-carbon fragment derived from deoxyribose (Bhagwat and Gerlt, 1996) and a one-base gapped DNA terminated by 3' and 5' phosphates (Bailly, et al., 1989b).
FPG protein recognizes diverse but structurally related DNA base modifications including, but not limited to, 8-hydroxyguanine (also known as 7-hydro-8-oxoguanine or 8-oxoguanine, referring to the favored 6,8-diketo tautomer at physiological pH) (Tchou, et al., 1991), imidazole ring-opened derivatives of adenine or guanine, designated 4,6-diamino-5-formamidopyrimidine and 2,6-diamino-4-hydroxy-5-formamidopyrimidine, respectively (Chetsanga, et al., 1981; Breimer, 1984), N7-methylformamidopyrimidines, 5-hydroxyuracil and 5-hydroxycytosine (Hatahet, et al., 1994).
As used herein, "FPG protein" includes and is also known in the art as 8-hydroxyguanine DNA glycosylase, which recognizes 8-hydroxyguanine residues in DNA. Indeed, FPG protein and 8-hydroxyguanine DNA glycosylase have been shown to be identical (Chung, et al., 1991).
Photoactive Dye Mutagenesis
Methylene blue is a thiazine dye which has been used for veterinary, pharmaceutical and other biological purposes (Vennerstrom, et al., 1995; Deutsch, et al., 1997). It has been shown that exposure of DNA to methylene blue plus light causes guanine-specific modifications and that subsequent piperidine treatment of the modified DNA leads to chain cleavage at guanine residues (Friedmann and Brown, 1978). Using the method of DNA sequencing described by Maxam and Gilbert (Maxam and Gilbert, 1977), including piperidine treatment to induce chain cleavage, Friedmann and Brown were able to determine the nucleotide sequence in DNA.
The photosensitizer methylene blue plus visible light produces DNA base damage that is recognized and subsequently removed by FPG protein. The DNA damage from methylene blue (MB) plus light has been reported to be caused by the production of singlet oxygen (Epe, et al., 1993), and the great majority of the damage is specific for guanine residues (Floyd, et al., 1989). The most prevalent type of damaged base, 8-hydroxyguanine, is mutagenic, which leads to guanine (G) to thymine (T) transversion mutations in DNA (Cheng, et al., 1992).
Rose bengal, an anionic xanthene dye, in the presence of ultraviolet radiation induces guanine-specific modifications similar to those generated by methylene blue plus light (Friedmann and Brown, 1978).
Needed in the art of molecular biology and diagnostics is a method for characterizing nucleic acids that is as accurate and as specific as DNA sequencing for detection or identification of nucleic acids, but that is simpler, faster, and requires less template DNA than dideoxy sequencing reactions. The method should also be useful for relatively impure nucleic acid samples, such as amplification products, eliminating the need to purify the sample from primers, nucleotides, other enzymes, or other impurities.