The detection and characterization of specific nucleic acid sequences and sequence changes have been utilized to detect the presence of viral or bacterial nucleic acid sequences indicative of an infection, the presence of variants or alleles of mammalian genes associated with disease and cancers, and the identification of the source of nucleic acids found in forensic samples, as well as in paternity determinations. As nucleic acid sequence data for genes from humans and pathogenic organisms accumulates, the demand for fast, cost-effective, and easy-to-use tests for as yet unknown, as well as known, mutations within specific sequences is rapidly increasing.
A handful of methods have been devised to scan nucleic acid segments for mutations. One option is to determine the entire gene sequence of each test sample (e.g., a clinical sample suspected of containing a bacterial strain). For sequences under approximately 600 nucleotides, this may be accomplished using amplified material (e.g., PCR reaction products). This avoids the time and expense associated with cloning the segment of interest. However, specialized equipment and highly trained personnel are required for DNA sequencing, and the method is too labor-intensive and expensive to be practical and effective in the clinical setting.
In view of the difficulties associated with sequencing, a given segment of nucleic acid may be characterized on several other levels. At the lowest resolution, the size of the molecule can be determined by electrophoresis by comparison to a known standard run on the same gel. A more detailed picture of the molecule may be achieved by cleavage with combinations of restriction enzymes prior to electrophoresis, to allow construction of an ordered map. The presence of specific sequences within the fragment can be detected by hybridization of a labeled probe, or, as noted above, the precise nucleotide sequence can be determined by partial chemical degradation or by primer extension in the presence of chain-terminating nucleotide analogs.
For detection of single-base differences between like sequences (e.g., the wild type and a mutant form of a gene), the requirements of the analysis are often at the highest level of resolution. For cases in which the position of the nucleotide in question is known in advance, several methods have been developed for examining single base changes without direct sequencing. For example, if a mutation of interest happens to fall within a restriction recognition sequence, a change in the pattern of digestion can be used as a diagnostic tool (e.g., restriction fragment length polymorphism [RFLP] analysis). In this way, single point mutations can be detected by the creation or destruction of RFLPs.
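The RFLP principle described above can be illustrated with a minimal Python sketch. The sequences are invented for illustration, and the `digest` helper is hypothetical; the only external fact assumed is that EcoRI recognizes GAATTC and cuts after the first G. A single base change inside the recognition sequence destroys the site, so the mutant yields one uncut fragment instead of two.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position within the recognition site at which the
    strand is cut (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    start = 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"
ECORI_CUT = 1

# Invented flanking sequences (assumption, for illustration only).
LEFT = "ATGCCGTA"
RIGHT = "GGCTAAGCTTGACC"

wild_type = LEFT + "GAATTC" + RIGHT   # intact EcoRI site
mutant = LEFT + "GAGTTC" + RIGHT      # A->G change inside the site

wt_fragments = digest(wild_type, ECORI_SITE, ECORI_CUT)
mut_fragments = digest(mutant, ECORI_SITE, ECORI_CUT)
print("wild type:", wt_fragments)
print("mutant:   ", mut_fragments)
```

On a gel, the two patterns would appear as two bands versus one, which is the diagnostic difference. A mutation outside any recognition sequence would leave both patterns identical, which is the limitation of the approach.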
Single-base mutations have also been identified by cleavage of RNA-RNA or RNA-DNA heteroduplexes using RNaseA (Myers et al., Science 230:1242 [1985] and Winter et al., Proc. Natl. Acad. Sci. USA 82:7575 [1985]). Mutations are detected and localized by the presence and size of the RNA fragments generated by cleavage at the mismatches. Single nucleotide mismatches in DNA heteroduplexes are also recognized and cleaved by some chemicals, providing an alternative strategy to detect single base substitutions, generically named the “Mismatch Chemical Cleavage” (MCC) (Gogos et al., Nucl. Acids Res., 18:6807-6817 [1990]). However, this method requires the use of osmium tetroxide and piperidine, two highly noxious chemicals that are not suited for use in a clinical laboratory. Enzymes such as the bacteriophage T4 endonuclease VII have been used in Enzymatic Mismatch Cleavage (EMC) (Youil et al., Genomics 32:431 [1996]). However, all of the mismatch cleavage methods lack sensitivity to some mismatch pairs, and all are prone to background cleavage at sites removed from the mismatch. Furthermore, the generation of purified fragments to be used in heteroduplex formation is both labor intensive and time consuming.
RFLP analysis suffers from low sensitivity and requires a large amount of sample. When RFLP analysis is used for the detection of point mutations, it is, by its nature, limited to the detection of only those single base changes which fall within a restriction sequence of a known restriction endonuclease. Moreover, the majority of the available enzymes have 4 to 6 base-pair recognition sequences, and cleave too frequently for many large-scale DNA manipulations (Eckstein and Lilley (eds.), Nucleic Acids and Molecular Biology, vol. 2, Springer-Verlag, Heidelberg [1988]). Thus, it is applicable only in a small fraction of cases, as most mutations do not fall within such sites.
A handful of rare-cutting restriction enzymes with 8 base-pair specificities have been isolated and these are widely used in genetic mapping, but these enzymes are few in number, are limited to the recognition of G+C-rich sequences, and cleave at sites that tend to be highly clustered (Barlow and Lehrach, Trends Genet., 3:167 [1987]). Recently, endonucleases encoded by group I introns have been discovered that might have greater than 12 base-pair specificity (Perlman and Butow, Science 246:1106 [1989]), but again, these are few in number.
If the change is not in a restriction enzyme recognition sequence, then allele-specific oligonucleotides (ASOs) can be designed to hybridize in proximity to the unknown nucleotide, such that a primer extension or ligation event can be used as the indicator of a match or a mismatch. Hybridization with radioactively labeled ASOs also has been applied to the detection of specific point mutations (Conner, Proc. Natl. Acad. Sci., 80:278 [1983]). The method is based on the differences in the melting temperature of short DNA fragments differing by a single nucleotide (Wallace et al., Nucl. Acids Res., 6:3543 [1979]). Similarly, hybridization with large arrays of short oligonucleotides is now used as a method for DNA sequencing (Bains and Smith, J. Theor. Biol., 135:303 [1988]; Drmanac et al., Genomics 4:114 [1989]). To perform either method it is necessary to work under conditions in which the formation of mismatched duplexes is eliminated or reduced while perfect duplexes still remain stable. Such conditions are termed “high stringency” conditions. The stringency of hybridization conditions can be altered in a number of ways known in the art. In general, changes in conditions that enhance the formation of nucleic acid duplexes, such as increases in the concentration of salt, or reduction in the temperature of the solution, are considered to reduce the stringency of the hybridization conditions. Conversely, reduction of salt and elevation of temperature are considered to increase the stringency of the conditions. Because it is easy to change and control, variation of the temperature is commonly used to control the stringency of nucleic acid hybridization reactions.
Discrimination of hybridization based solely on the presence of a mismatch imposes a limit on probe length because the effect of a single mismatch on the stability of a duplex is smaller for longer duplexes. For oligonucleotides designed to detect mutations in genomes of high complexity, such as human DNA, it has been shown that the optimal length for hybridization is between 16 and 22 nucleotides, and the temperature window within which the hybridization stringency will allow single base discrimination can be as large as 10° C. (Wallace [1979], supra). Usually, however, it is much narrower, and for some mismatches, such as G-T, it may be as small as 1 to 2° C. These windows may be even smaller if any other reaction conditions, such as temperature, pH, concentration of salt and the presence of destabilizing agents (e.g., urea, formamide, dimethylsulfoxide) alter the stringency. Thus, for successful detection of mutations using such high stringency hybridization methods, a tight control of all parameters affecting duplex stability is critical.
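The length dependence noted above can be illustrated with a rough Python sketch. It assumes a widely quoted rule of thumb for short oligonucleotides, often called the Wallace rule (about 2° C. per A or T and 4° C. per G or C), which is not stated in the text above. It also makes the crude, purely illustrative assumption that a mismatched position simply loses its own contribution. The function names and probe sequences are hypothetical, and real mismatch effects depend on sequence context and reaction conditions.

```python
def wallace_tm(oligo: str) -> int:
    """Rule-of-thumb melting temperature in degrees C for a short oligo."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def mismatch_fraction(oligo: str, pos: int) -> float:
    """Crude estimate: share of the rule-of-thumb Tm contributed by one base.

    Assumption (illustrative only): a mismatch at pos removes exactly that
    base's contribution and nothing else.
    """
    base = oligo[pos].upper()
    contribution = 4 if base in "GC" else 2
    return contribution / wallace_tm(oligo)


short_probe = "ACGTACGTACGTACGT"      # 16 nt, invented sequence
long_probe = short_probe * 2          # 32 nt, same composition

for name, probe in (("16-mer", short_probe), ("32-mer", long_probe)):
    frac = mismatch_fraction(probe, 1)
    print(f"{name}: Tm ~ {wallace_tm(probe)} C, one-base share ~ {frac:.3f}")
```

Under these assumptions, doubling the probe length halves the relative contribution of any single position, which is one way to see why longer probes discriminate single mismatches less well.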
In addition to the degree of homology between the oligonucleotide probe and the target nucleic acid, efficiency of hybridization also depends on the secondary structure of the target molecule. Indeed, if the region of the target molecule that is complementary to the probe is involved in the formation of intramolecular structures with other regions of the target, this will reduce the binding efficiency of the probe. Interference with hybridization by such secondary structure is another reason why high stringency conditions are so important for sequence analysis by hybridization. High stringency conditions reduce the probability of secondary structure formation (Gamper et al., J. Mol. Biol., 197:349 [1987]). Another way of reducing the probability of secondary structure formation is to decrease the length of target molecules, so that fewer intrastrand interactions can occur. This can be done by a number of methods, including enzymatic, chemical or thermal cleavage or degradation. Currently, it is standard practice to perform such a step in commonly used methods of sequence analysis by hybridization to fragment the target nucleic acid into short oligonucleotides (Fodor et al., Nature 364:555 [1993]).
ASOs have also been adapted to the PCR method. In this, or in any primer extension-based assay, the nucleotide to be investigated is positioned opposite the 3′ end of a primer oligonucleotide. If the bases are complementary, then a DNA polymerase can extend the primer with ease; if the bases are mismatched, the extension may be blocked. Blocking of PCR by this method has had some degree of success, but not all mismatches are able to block extension. In fact, a “T” residue on the 3′ end of a primer can be extended with reasonable efficiency when mis-paired with any of the non-complementary nucleotides when Taq DNA polymerase, a common PCR enzyme, is used (Kwok et al., Nucl. Acids. Res. 18:999 [1990]). Further, if any of the enzymes having 3′-5′ exonuclease “proofreading” activity (e.g., Vent DNA polymerase, New England Biolabs, Beverly Mass.) are used, the mismatch is first removed, then filled in with a matched nucleotide before further extension. This dramatically limits the scope of application of PCR in this type of direct mutation identification.
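The primer geometry described above can be sketched in Python as follows. The template sequences, the primer, and the helper `three_prime_paired` are hypothetical and chosen only for illustration. The sketch checks base pairing at the 3′ terminal position only; it does not model whether a given polymerase would actually extend a mismatched primer, which, as noted, is not guaranteed to be blocked.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def three_prime_paired(primer: str, template: str) -> bool:
    """Report whether the primer's 3' terminal base pairs with the template.

    Both sequences are written 5'->3'. The primer body (all but its last
    base) must anneal perfectly; the template base opposite the primer's
    3' end is the one immediately 5' of the annealing site.
    """
    body_site = reverse_complement(primer[:-1])
    idx = template.find(body_site)
    if idx <= 0:
        raise ValueError("primer body does not anneal with room for the 3' base")
    return COMPLEMENT[primer[-1]] == template[idx - 1]


# Invented templates differing at a single position (index 5).
wt_template = "TTACG" + "G" + "CATGGTCAAGT"
mut_template = "TTACG" + "A" + "CATGGTCAAGT"

# Allele-specific primer designed against the wild-type base.
asp_primer = reverse_complement("G" + "CATGGTCAAGT")

wt_match = three_prime_paired(asp_primer, wt_template)
mut_match = three_prime_paired(asp_primer, mut_template)
print("wild type 3' end paired:", wt_match)
print("mutant 3' end paired:   ", mut_match)
```

In an assay, a paired 3′ end would be expected to extend efficiently, while an unpaired one may or may not be blocked, depending on the mismatch and the enzyme.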
Two other methods of mutation detection rely on detecting changes in electrophoretic mobility in response to minor sequence changes. One of these methods, termed “Denaturing Gradient Gel Electrophoresis” (DGGE) is based on the observation that slightly different sequences will display different patterns of local melting when electrophoretically resolved on a gradient gel. In this manner, variants can be distinguished, as differences in the melting properties of homoduplexes versus heteroduplexes differing in a single nucleotide can be used to detect the presence of mutations in the target sequences because of the corresponding changes in the electrophoretic mobilities of the hetero- and homoduplexes. The fragments to be analyzed, usually PCR products, are “clamped” at one end by a long stretch of G-C base pairs (30-80) to allow complete denaturation of the sequence of interest without complete dissociation of the strands. The attachment of a GC “clamp” to the DNA fragments increases the fraction of mutations that can be recognized by DGGE (Abrams et al., Genomics 7:463 [1990]). Attaching a GC clamp to one primer is critical to ensure that the amplified sequence has a low dissociation temperature (Sheffield et al., Proc. Natl. Acad. Sci., 86:232 [1989]; and Lerman and Silverstein, Meth. Enzymol., 155:482 [1987]). Modifications of the technique have been developed, using temperature gradient gels (Wartell et al., Nucl. Acids Res., 18:2699-2701 [1990]), and the method can be also applied to RNA:RNA duplexes (Smith et al., Genomics 3:217 [1988]).
Limitations on the utility of DGGE include the requirement that the denaturing conditions must be optimized for each specific nucleic acid sequence to be tested. Furthermore, the method requires specialized equipment to prepare the gels and maintain the high temperatures required during electrophoresis. The expense associated with the synthesis of the clamping tail on one oligonucleotide for each sequence to be tested is also a major consideration. In addition, long running times are required for DGGE. The long running time of DGGE was shortened in a modification of DGGE called constant denaturant gel electrophoresis (CDGE) (Borrensen et al., Proc. Natl. Acad. Sci. USA 88:8405 [1991]). CDGE requires that gels be performed under different denaturant conditions in order to reach high efficiency for the detection of unknown mutations. Both DGGE and CDGE are unsuitable for use in clinical laboratories.
A technique analogous to DGGE, termed temperature gradient gel electrophoresis (TGGE), uses a thermal gradient rather than a chemical denaturant gradient (Scholz et al., Hum. Mol. Genet., 2:2155 [1993]). TGGE requires the use of specialized equipment that can generate a temperature gradient perpendicularly oriented relative to the electrical field. TGGE can detect mutations in relatively small fragments of DNA; therefore, scanning of large gene segments requires the use of multiple PCR products prior to running the gel.
Another common method, called “Single-Strand Conformation Polymorphism” (SSCP) was developed by Hayashi, Sekya and colleagues (reviewed by Hayashi, PCR Meth. Appl., 1:34-38, [1991]) and is based on the observation that single strands of nucleic acid can take on characteristic conformations under non-denaturing conditions, and these conformations influence electrophoretic mobility. The complementary strands assume sufficiently different structures that the two strands may be resolved from one another. Changes in the sequence of a given fragment will also change the conformation, consequently altering the mobility and allowing this to be used as an assay for sequence variations (Orita, et al., Genomics 5:874 [1989]).
The SSCP process involves denaturing a DNA segment (e.g., a PCR product) that is usually labeled on both strands, followed by slow electrophoretic separation on a non-denaturing polyacrylamide gel, so that intra-molecular interactions can form and not be disturbed during the run. This technique is extremely sensitive to variations in gel composition and temperature. A serious limitation of this method is the relative difficulty encountered in comparing data generated in different laboratories, under apparently similar conditions.
The dideoxy fingerprinting (ddF) technique is another technique developed to scan genes for the presence of unknown mutations (Liu and Sommer, PCR Methods Applic, 4:97 [1994]). The ddF technique combines components of Sanger dideoxy sequencing with SSCP. A dideoxy sequencing reaction is performed using one dideoxy terminator and then the reaction products are electrophoresed on nondenaturing polyacrylamide gels to detect alterations in mobility of the termination segments as in SSCP analysis. While ddF is an improvement over SSCP in terms of increased sensitivity, ddF requires the use of expensive dideoxynucleotides and this technique is still limited to the analysis of fragments of the size suitable for SSCP (i.e., fragments of 200-300 bases for optimal detection of mutations).
In addition to the above limitations, all of these methods are limited as to the size of the nucleic acid fragment that can be analyzed. For the direct sequencing approach, sequences of greater than 600 base pairs require cloning, with the consequent delays and expense of either deletion sub-cloning or primer walking, in order to cover the entire fragment. SSCP and DGGE have even more severe size limitations. Because of reduced sensitivity to sequence changes, these methods are not considered suitable for larger fragments. Although SSCP is reportedly able to detect 90% of single-base substitutions within a 200 base-pair fragment, the detection drops to less than 50% for 400 base pair fragments. Similarly, the sensitivity of DGGE decreases as the length of the fragment reaches 500 base-pairs. The ddF technique, as a combination of direct sequencing and SSCP, is also limited by the relatively small size of the DNA that can be screened.
Another method of detecting sequence polymorphisms based on the conformation assumed by strands of nucleic acid is the CLEAVASE Fragment Length Polymorphism (CFLP) method (Brow et al., J. Clin. Microbiol., 34:3129 [1996]; PCT Publication WO 96/15267; U.S. Pat. Nos. 5,843,654; and co-pending application Ser. No. 08/520,946, herein incorporated by reference in their entireties). This method uses the actions of a structure specific nuclease to cleave the folded structures, thus creating a set of product fragments that can be resolved by size (e.g., by electrophoresis). This method is much less sensitive to size so that entire genes, rather than gene fragments, may be analyzed.
In many situations (e.g., in many clinical laboratories), electrophoretic separation and analysis may not be technically feasible, or may not be able to accommodate the processing of a large number of samples in a cost-effective manner. There is a clear need for a method of analyzing the characteristic conformations of nucleic acids without the need for either electrophoretic separation of conformations or fragments or for elaborate and expensive methods of visualizing gels (e.g., darkroom supplies, blotting equipment or fluorescence imagers).
In addition to the apparently fortuitous folded conformations that may be assumed by any nucleic acid segment, as noted above, the folded structures assumed by some nucleic acids are linked in a variety of ways to the function of that nucleic acid. For example, tRNA structure is critical to its proper function in protein assembly, ribosomal RNA (rRNA) structures are essential to the correct function of the ribosome, and correct folding is essential to the catalytic function of Group I self-splicing introns (See e.g., the chapters by Woese and Pace (p. 91), Noller (p. 137), and Cech (p. 239) in Gesteland and Atkins (eds.), The RNA World, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. [1993]). Folded structures in viral RNAs have been linked to infectivity (Proutski et al., J Gen Virol., 78(Pt 7):1543-1549 [1997]), altered splicing (Ward, et al., Virus Genes 10:91 [1995]), translational frameshifting (Bidou et al., RNA 3:1153 [1997]), packaging (Miller, et al. J. Virol., 71:7648 [1997]), and other functions. In both prokaryotes and eukaryotes, RNA structures are linked to post-transcriptional control of gene expression through mechanisms including attenuation of translation (Girelli et al., Blood 90:2084 [1997]), alternative splicing (Howe and Ares, Proc. Natl. Acad. Sci. USA 94:12467 [1997]) and signaling for RNA degradation (Veyrune et al, Oncogene 11:2127 [1995]). Messenger RNA secondary structure has also been associated with localization of that RNA within cells (Serano and Cohen, Develop., 121:3809-3818 [1995]). In DNA, it has been shown that cruciform structures have also been tied to control of gene expression (Hanke et al., J. Mol. Biol., 246:63 [1995]). It can be seen from these few examples that the use of folded structures as signals within organisms is not uncommon, nor is it limited to non-protein-encoding RNAs, such as rRNAs, or to non-protein-encoding regions of genomes or messenger RNAs.
Some mutations and polymorphisms associated with altered phenotype act by altering structures assumed by nucleic acids. Any of the functions and pathways cited above may be altered, e.g., decreased or increased in efficacy, by such a structural alteration. Such alterations in function may be associated with medically relevant effects, including but not limited to tumor growth or morphology (Thompson et al., Oncogene 14:1715 [1997]), drug resistance, or virulence (Mangada and Igarishi, Virus Genes 14:5 [1997], Ward et al., supra) in pathogens. For example, the iron availability in blood is controlled by the protein ferritin, an iron storage protein. Ferritin levels are controlled post-transcriptionally by binding of iron-regulatory proteins to a structure (an iron-responsive element, or IRE) on the 5′ untranslated region of the ferritin mRNA, thereby blocking translation when iron levels are low. Hereditary hyperferritinemia, an iron storage disorder linked to cataract formation, has been found in some cases to be caused by mutations in the IRE that alter or delete the structure, preventing translational regulation.
It can easily be appreciated from these few examples that the ability to rapidly analyze nucleic acid structure would be a useful tool for both basic and clinical research and for diagnostics. Further, accurate identification of nucleic acid structures would facilitate the design and application of therapeutic agents targeted directly at nucleic acids, such as antisense oligonucleotides, aptamers and peptide nucleic acid agents.
Targeting mRNA with sequence-specific deoxyoligonucleotides has recently gained attention for purposes of antisense research, oligomer hybridization for various gene expression assays such as the INVADER assay (Lyamichev et al., Nature Biotechnology 17:292 [1999]), and primer selection for reverse transcription and extension experiments. One of the major problems associated with such experiments is the ability to define regions of the RNA that can be efficiently targeted for oligonucleotide hybridization. Simply using randomly selected complementary oligonucleotides for a given RNA target, without prior knowledge of regions of the RNA that allow efficient hybridization, has proven to be an ineffective approach. It is estimated that targeting RNA with antisense oligonucleotides based on random design results in one out of 18-20 tested oligonucleotides showing significant inhibition of gene expression (Sczakiel, Frontiers in Biosciences 5:194 [2000]; Patzel et al., Nucleic Acids Res., 27:4328 [1999]; Peyman et al., Biol. Chem. Hoppe-Seyler 367:195 [1995]; Monia et al, Nature Med., 2:668 [1996]). Secondary and tertiary structures of RNA are thought to be the major factors that influence the ability of an oligonucleotide to bind targeted regions of the RNA (Vickers et al., Nucleic Acids Res., 28:1340 [2000]; Lima et al., Biochemistry 31:12055 [1992]; Uhlenbeck, J. Mol. Biol., 65:25 [1972]; Freier and Tinoco, Biochemistry 14:3310 [1975]). This is due to the hybridization kinetics and thermodynamics of destroying any structural motifs of the RNA and, in return, hybridizing the complementary DNA oligonucleotide (Patzel et al., Nucleic Acids Res., 27:4328 [1999]; Mathews et al., RNA 5:1458 [1999]). Thus, the ability to identify regions of RNA that are “accessible” for hybridization is of crucial importance for design and selection of effective oligonucleotides.
To date, there are few experimental and theoretical methods available for identifying accessible regions in RNA. These include the use of RNase-H footprinting (Ho et al., Nature Biotechnology 16:59 [1998]; Mateeva et al., Nucleic Acids Res., 25:5010 [1997]; Mateeva et al., Nature Biotechnology 16:1374 [1998]), complementary arrays of oligonucleotide libraries (Southern et al., Nucleic Acids Res., 22:1368 [1994]; Mir and Southern, Nature Biotechnology 17:788 [1999]), ribozyme libraries with random hexamer internal guide sequences (Campbell and Cech, RNA 1:598 [1995]; Lan et al., Science 280:1593 [1998]), and RNA and DNA structure prediction computer programs (Sczakiel, Frontiers in Biosciences 5:194 [2000]; Patzel et al., Nucleic Acids Res., 27:4328 [1999]; Zuker, Science 244:48 [1989]; Walton et al., Biotechnol. Bioeng., 65:1 [1999]). Thus, the art is in need of reliable and efficient methods for identifying and characterizing accessible regions of RNA.