1. Field of the Invention
The present invention is in the fields of molecular and cellular biology. The invention is related generally to compounds, compositions and methods useful in enhancing synthesis of nucleic acid molecules, especially from GC-rich nucleic acid templates. Specifically, the invention provides compositions comprising one or more compounds having a formula selected from the group consisting of formula I and formula II. Preferably used in accordance with the invention are 4-methylmorpholine N-oxide, betaine (carboxymethyltrimethylammonium), any amino acid (or derivative thereof), and/or an N-alkylimidazole such as 1-methylimidazole or 4-methylimidazole. In a preferred aspect, two or more, three or more, four or more, etc. of the compounds of the invention are combined to facilitate nucleic acid synthesis.
The invention also relates to compositions comprising one or more compounds of the invention and one or more additional components selected from the group consisting of (i) one or more nucleic acid molecules (including nucleic acid templates), (ii) one or more nucleotides, (iii) one or more polymerases or reverse transcriptases, and (iv) one or more buffering salts.
These compounds and compositions of the invention may be used in methods for enhanced, high-fidelity synthesis of nucleic acid molecules, including via amplification (particularly PCR), reverse transcription, and sequencing methods. The invention also relates to nucleic acid molecules produced by these methods, to fragments or derivatives thereof, and to vectors and host cells comprising such nucleic acid molecules, fragments, or derivatives. The invention also relates to the use of such nucleic acid molecules to produce desired polypeptides. The invention also concerns kits comprising the compositions or compounds of the invention.
2. Related Art
Genomic DNA
In examining the structure and physiology of an organism, tissue or cell, it is often desirable to determine its genetic content. The genetic framework (i.e., the genome) of an organism is encoded in the double-stranded sequence of nucleotide bases in the deoxyribonucleic acid (DNA) which is contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is only manifested upon production of the protein which the gene ultimately encodes. In order to produce a protein, an RNA copy complementary to one strand of the DNA double helix (the template, or “antisense,” strand) is produced by RNA polymerase enzymes, resulting in a specific sequence of messenger ribonucleic acid (mRNA) that corresponds to the opposite (“sense”) strand. This mRNA is then translated by the protein synthesis machinery of the cell, resulting in the production of the particular protein encoded by the gene. There are additional sequences in the genome that do not encode a protein (i.e., “noncoding” regions) which may serve a structural, regulatory, or unknown function. Thus, the genome of an organism or cell is the complete collection of protein-encoding genes together with intervening noncoding DNA sequences. Importantly, each somatic cell of a multicellular organism contains the full complement of genomic DNA of the organism, except in cases of focal infections or cancers, where one or more xenogeneic DNA sequences may be inserted into the genomic DNA of specific cells and not into other, non-infected, cells in the organism. As noted below, however, the expression of the genes making up the genomic DNA may vary between individual cells.
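The relationship between the template strand, the sense strand and the mRNA can be illustrated with a short sequence-level sketch. It is a simplified model only: the coding sequence is hypothetical, the function names are illustrative, and promoters, introns and RNA processing are ignored.

```python
# Sequence-level sketch of transcription (illustrative; ignores promoters,
# introns and RNA processing). All sequences are written 5'->3'.

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(dna.upper()))


def transcribe(template: str) -> str:
    """Return the mRNA made from a template (antisense) strand.

    RNA polymerase reads the template 3'->5' and builds the RNA 5'->3',
    so the product is the reverse complement of the template, with U for T.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(template.upper()))


sense = "ATGGCCTTAGCA"            # hypothetical sense (coding) strand
template = reverse_complement(sense)
mrna = transcribe(template)
print(template)  # TGCTAAGGCCAT
print(mrna)      # AUGGCCUUAGCA (same as the sense strand, with U for T)
```

The mRNA produced in this way matches the sense strand rather than being complementary to it.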
cDNA and cDNA Libraries
Within a given cell, tissue or organism, there exist myriad mRNA species, each encoding a separate and specific protein. This fact provides a powerful tool to investigators interested in studying gene expression in a tissue or cell: mRNA molecules may be isolated and further manipulated by various molecular biological techniques, thereby allowing the elucidation of the full functional genetic content of a cell, tissue or organism.
One common approach to the study of gene expression is the production of complementary DNA (cDNA) clones. In this technique, the mRNA molecules from an organism are isolated from an extract of the cells or tissues of the organism. This isolation often employs solid chromatography matrices, such as cellulose or hydroxyapatite, to which oligomers of deoxythymidine (dT) have been complexed. Since the 3′ termini of most eukaryotic mRNA molecules carry a string of adenosine (A) residues (the poly(A) tail), and since A base-pairs with dT, the mRNA molecules can be rapidly purified from other molecules and substances in the tissue or cell extract. From these purified mRNA molecules, cDNA copies may be made using the enzyme reverse transcriptase, which results in the production of single-stranded cDNA molecules. The single-stranded cDNAs may then be converted into a complete double-stranded DNA copy of the original mRNA (and thus of the original double-stranded DNA sequence, encoding this mRNA, contained in the genome of the organism) by the action of a DNA polymerase. The protein-specific double-stranded cDNAs can then be inserted into a plasmid, which is then introduced into a host bacterial cell. The bacterial cells are then grown in culture media, resulting in a population of bacterial cells containing (or in many cases, expressing) the gene of interest.
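The cDNA steps described above (oligo(dT) capture, first-strand synthesis by reverse transcriptase, second-strand synthesis by DNA polymerase) can be sketched at the sequence level. This is an illustrative model, not a laboratory protocol: the capture rule (a 3′ run of at least 12 A residues) and the example RNAs are hypothetical, and enzymology such as priming and RNA removal is not modeled.

```python
# Sequence-level sketch of cDNA synthesis (illustrative only).
# All sequences are written 5'->3'.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def has_poly_a_tail(mrna: str, min_len: int = 12) -> bool:
    """Hypothetical capture rule: a 3' run of at least min_len A residues
    is assumed sufficient to hybridize to immobilized oligo(dT)."""
    return mrna.upper().endswith("A" * min_len)


def first_strand(mrna: str) -> str:
    """Reverse transcription: single-stranded cDNA complementary to the mRNA."""
    return "".join(RNA_TO_DNA_COMPLEMENT[b] for b in reversed(mrna.upper()))


def second_strand(cdna: str) -> str:
    """DNA polymerase copies the first strand; the result has the mRNA
    sequence with T in place of U."""
    return "".join(DNA_COMPLEMENT[b] for b in reversed(cdna))


# Hypothetical cell extract: one polyadenylated mRNA, one RNA without a tail.
extract = ["AUGGCCUUAGCA" + "A" * 12, "AUGCCCGGGUAG"]
captured = [m for m in extract if has_poly_a_tail(m)]
double_stranded = [(first_strand(m), second_strand(first_strand(m))) for m in captured]
for minus, plus in double_stranded:
    print(minus)
    print(plus)
```

Only the polyadenylated RNA is carried through, and the second strand reproduces the mRNA sequence in DNA form.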
This entire process, from isolation of mRNA to insertion of the cDNA into a plasmid to growth of bacterial populations containing the isolated gene, is termed “cDNA cloning.” If cDNAs are prepared from a number of different mRNAs, the resulting set of cDNAs is called a “cDNA library,” representing the different functional (i.e., expressed) genes present in the source cell, tissue or organism. Genotypic analysis of these cDNA libraries can yield much information on the structure and function of the organisms from which they were derived.
DNA Amplification
In order to increase the copy number of, or “amplify,” specific sequences of DNA in a sample, investigators have relied on a number of amplification techniques. A commonly used amplification technique is the Polymerase Chain Reaction (“PCR”) method described by Mullis and colleagues (U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,800,159). This method uses “primer” sequences which are complementary to opposing regions on the DNA sequence to be amplified. These primers are added to the DNA target sample, along with a molar excess of deoxynucleoside triphosphates (dNTPs) and a DNA polymerase (e.g., Taq polymerase), and the primers bind to their target via base-specific binding interactions (i.e., adenine binds to thymine, cytosine to guanine). By repeatedly passing the reaction mixture through cycles of increasing and decreasing temperatures (to allow dissociation of the two strands of the target DNA, annealing of the primers to their complementary sequences on each strand, and synthesis of complementary copies of each strand by the polymerase), the copy number of a particular sequence of DNA may be rapidly increased.
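Because each cycle can at most double the number of target copies, the "rapid increase" is exponential. The sketch below illustrates this with the idealized relation N = N0 × (1 + E)^n, where E is the per-cycle efficiency. The three-step temperature profile uses typical Taq values that are assumptions for illustration, not a validated protocol, and the model ignores the plateau that real reactions reach.

```python
# Idealized PCR arithmetic (illustrative). Each cycle copies a fraction
# `efficiency` of the existing templates, so after n cycles:
#     N_n = N_0 * (1 + efficiency) ** n
# Real reactions plateau as primers, dNTPs and polymerase activity become
# limiting; this model ignores that.

from dataclasses import dataclass


@dataclass(frozen=True)
class CycleStep:
    name: str
    temp_c: float
    seconds: int


# Hypothetical three-step profile; typical Taq values, not a validated protocol.
PROFILE = (
    CycleStep("denature", 94.0, 30),
    CycleStep("anneal primers", 55.0, 30),
    CycleStep("extend", 72.0, 60),
)


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Return the idealized number of target copies after `cycles` cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


for step in PROFILE:
    print(f"{step.name}: {step.temp_c} C for {step.seconds} s")
print(pcr_copies(10, 30))        # 10737418240.0 (perfect doubling)
print(pcr_copies(10, 30, 0.9))   # 90% efficiency per cycle
```

Even a modest drop in per-cycle efficiency compounds over 30 cycles, which is one reason premature termination at difficult template regions matters for yield.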
Other techniques for amplification of target nucleic acid sequences have also been developed. For example, Walker et al. (U.S. Pat. No. 5,455,166; EP 0 684 315) described a method called Strand Displacement Amplification (SDA), which differs from PCR in that it operates at a single temperature and uses a polymerase/endonuclease combination of enzymes to generate single-stranded fragments of the target DNA sequence, which then serve as templates for the production of complementary DNA (cDNA) strands. An alternative amplification procedure, termed Nucleic Acid Sequence-Based Amplification (NASBA), was disclosed by Davey et al. (U.S. Pat. No. 5,409,818; EP 0 329 822). Similar to SDA, NASBA employs an isothermal reaction, but it generates RNA amplification products by transcription with an RNA polymerase rather than relying solely on DNA synthesis as in PCR or SDA. Another known amplification procedure is Promoter Ligation Activated Transcription (LAT), described by Berninger et al. (U.S. Pat. No. 5,194,370).
PCR-based DNA Fingerprinting
Despite the availability of a variety of amplification techniques, most DNA fingerprinting methods rely on PCR for amplification, taking advantage of the well-characterized protocols and automation available for this technique. Examples of these PCR-based fingerprinting techniques include Random Amplified Polymorphic DNA (RAPD) analysis (Williams, J. G. K. et al., Nucl. Acids Res. 18(22):6531-6535 (1990)), Arbitrarily Primed PCR (AP-PCR; Welsh, J., and McClelland, M., Nucl. Acids Res. 18(24):7213-7218 (1990)), DNA Amplification Fingerprinting (DAF; Caetano-Anollés et al., Bio/Technology 9:553-557 (1991)), and microsatellite PCR or Directed Amplification of Minisatellite-region DNA (DAMD; Heath, D. D. et al., Nucl. Acids Res. 21(24):5782-5785 (1993)). All of these methods are based on the amplification of random DNA fragments by PCR, using arbitrarily chosen primers.
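The principle shared by these arbitrary-primer methods can be sketched for the RAPD case. With a single primer, a product is expected wherever the primer anneals to one strand and, within amplifiable distance, to the opposite strand. A sequence change in a primer site removes the band, which is the basis of the polymorphism. This is a simplified model: it assumes perfect matches only (real arbitrary-primer PCR tolerates some mismatches), and the primer, template and 3,000 bp size limit are hypothetical.

```python
# RAPD-style sketch (illustrative): one arbitrary primer, exact matches only.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def find_sites(seq: str, probe: str) -> list[int]:
    """0-based start positions of exact matches of probe in seq."""
    return [i for i in range(len(seq) - len(probe) + 1) if seq[i:i + len(probe)] == probe]


def predicted_amplicons(template: str, primer: str, max_len: int = 3000) -> list[int]:
    """Sizes (bp) of products expected from a single primer on a template."""
    template, primer = template.upper(), primer.upper()
    # Primer sequence on the top strand: it anneals to the bottom strand, extends rightward.
    forward = find_sites(template, primer)
    # Reverse complement on the top strand: primer anneals to the top strand, extends leftward.
    reverse = find_sites(template, reverse_complement(primer))
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + len(primer) - f
            if r >= f + len(primer) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)


primer = "ACGGTCAGTG"                                   # hypothetical 10-mer
template = "TT" + primer + "CCCCCAAAAA" + reverse_complement(primer) + "GG"
mutant = template[:5] + "A" + template[6:]              # hypothetical point mutation in a site
print(predicted_amplicons(template, primer))  # [30]
print(predicted_amplicons(mutant, primer))    # []
```

Comparing the band sets from two genomes with the same primer gives the presence/absence pattern used as a fingerprint.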
DNA Sequencing
In general, two techniques have been traditionally used to sequence nucleic acids. In the first method, termed “Maxam and Gilbert sequencing” after its co-developers (Maxam, A. M. and Gilbert, W., Proc. Natl. Acad. Sci. USA 74:560-564, 1977), DNA is radiolabeled, divided into four samples and treated with chemicals that selectively destroy specific nucleotide bases in the DNA and cleave the molecule at the sites of damage. By separating the resultant fragments into discrete bands by gel electrophoresis and exposing the gel to X-ray film, the sequence of the original DNA molecule can be read from the film. This technique has been used to determine the sequences of certain complex DNA molecules, including the primate virus SV40 (Fiers, W., et al., Nature 273:113-120, 1978; Reddy, V. B., et al., Science 200:494-502, 1978) and the bacterial plasmid pBR322 (Sutcliffe, G., Cold Spring Harbor Symp. Quant. Biol. 43:444-448, 1975).
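How a sequence is read from the four lanes can be sketched as follows. The sketch assumes the commonly used reaction set G, G+A, C+T and C, and an idealized gel in which each band position is treated as the position of the cleaved base counted from the labeled end (real fragments end one base short of the cleavage site). The function names are illustrative.

```python
# Sketch of reading a Maxam-Gilbert gel (illustrative, idealized).
# Assumes the four-reaction scheme G, G+A, C+T and C. Positions are counted
# from the radiolabeled end, so the shortest band is read first.

LANES = ("G", "G+A", "C+T", "C")


def simulate_lanes(seq: str) -> dict[str, set[int]]:
    """Hypothetical ideal gel: 1-based positions cleaved in each reaction."""
    hits = {"G": "G", "G+A": "GA", "C+T": "CT", "C": "C"}
    return {lane: {i + 1 for i, b in enumerate(seq) if b in bases}
            for lane, bases in hits.items()}


def call_bases(lanes: dict[str, set[int]], length: int) -> str:
    """Infer the base at each position from which lanes show a band."""
    calls = []
    for pos in range(1, length + 1):
        g, ga, ct, c = (pos in lanes[lane] for lane in LANES)
        if g and ga:
            calls.append("G")
        elif ga:
            calls.append("A")      # purine band without a G band
        elif ct and c:
            calls.append("C")
        elif ct:
            calls.append("T")      # pyrimidine band without a C band
        else:
            calls.append("N")      # no band: position unresolved
    return "".join(calls)


read = call_bases(simulate_lanes("GATCCTAG"), 8)
print(read)  # GATCCTAG
```

The paired lanes are what make four reactions sufficient: each base is identified by the combination of lanes in which its band appears.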
An alternative technique for sequencing, named “Sanger sequencing” after its developer (Sanger, F., and Coulson, A. R., J. Mol. Biol. 94:444,448, 1975), is more commonly employed. This method uses the DNA-synthesizing activity of DNA polymerases which, when combined with mixtures of reaction-terminating dideoxynucleoside triphosphates (Sanger, F., et al., Proc. Natl. Acad. Sci. USA 74:5463-5467, 1977) and a short primer (either of which may be detectably labeled), gives rise to a series of newly synthesized DNA fragments specifically terminated at one of the four dideoxy bases. These fragments are then resolved by gel electrophoresis and the sequence determined as described for Maxam and Gilbert sequencing above. By carrying out four separate reactions (one with each ddNTP), the sequences of even fairly complex DNA molecules may rapidly be determined (Sanger, F., et al., Nature 265:678-695, 1977; Barnes, W., Meth. Enzymol. 152:538-556, 1987). While Sanger sequencing usually employs E. coli or T7 DNA polymerase (U.S. Pat. No. 4,795,699), more recent modifications of this technique using T7 polymerase mutants allow sequencing to be accomplished using a single sequencing reaction containing all four chain-terminating ddNTPs at different concentrations (U.S. Pat. Nos. 4,962,020 and 5,173,411). Further modifications to the technique, to reduce or eliminate the buildup of reaction-poisoning pyrophosphate in the reaction mixtures, have also been described (U.S. Pat. No. 5,498,523). Other variations for sequencing nucleic acid molecules have also been described (see Murray, Nucl. Acids Res. 17:8889, 1989; and Craxton, Methods: A Companion to Methods in Enzymology, 3:20-25, 1991).
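The four-reaction chain-termination scheme can be modeled in a few lines. In the reaction containing a given ddNTP, synthesis stops wherever that analog is incorporated, so the lane holds one fragment length for every occurrence of that base in the new strand. Ordering all bands by length reads the new strand, whose reverse complement is the template. This is an idealized sketch: it assumes termination at every possible position, ignores polymerase pausing and band intensity, and the template region and function names are hypothetical.

```python
# Sketch of Sanger chain termination (illustrative, idealized).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def ddntp_lanes(template: str) -> dict[str, list[int]]:
    """template: the region read from the template strand (5'->3'), lying
    immediately 5' of the primer-binding site. Returns, per ddNTP reaction,
    the lengths (nt added beyond the primer) of chain-terminated fragments."""
    new_strand = reverse_complement(template.upper())
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        lanes[base].append(i + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Order all bands by size (shortest first) and report each band's lane."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


template = "GGCTAACTGA"      # hypothetical template region
new_strand = read_gel(ddntp_lanes(template))
print(new_strand)                       # TCAGTTAGCC
print(reverse_complement(new_strand))   # GGCTAACTGA, the template
```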
Limitations
As noted above, high-fidelity copying of a template nucleic acid molecule is an essential step in the synthesis of a nucleic acid molecule in amplification, reverse transcription, and sequencing protocols. However, standard compositions and protocols for accomplishing this synthesis are often inefficient, because they tend to terminate nucleic acid synthesis prematurely at certain secondary structural (Gerard, G. F., et al., FOCUS 11(4):60 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)) and sequence (Messer, L. I., et al., Virol. 146:146 (1985); Abbotts, J., et al., J. Biol. Chem. 268:10312-10323 (1993)) barriers in nucleic acid templates. This is particularly true for template sequences that have a high guanine/cytosine content (i.e., “GC-rich” templates) and for those that are fairly large (i.e., templates longer than about 3-5 kb). These secondary structural and sequence barriers in template nucleic acid molecules occur frequently at homopolymer stretches (Messer, L. I., et al., Virol. 146:146 (1985); Huber, H. E., et al., J. Biol. Chem. 264:4669-4678 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)) and are more often sequence barriers than secondary structural barriers (Abbotts, J., et al., J. Biol. Chem. 268:10312-10323 (1993)). If these barriers could be overcome, the yield of total and full-length nucleic acid products in synthesis reactions could be increased.
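The two template features singled out here, GC-rich regions and homopolymer stretches, are easy to flag computationally before a synthesis reaction is set up. The sketch below is one simple way to do so. The 60% GC threshold, 50-nt window, 25-nt step and 6-nt minimum run length are assumptions chosen for illustration, not values taken from this document.

```python
# Sketch for flagging GC-rich regions and homopolymer stretches (illustrative).
# Thresholds and window sizes are assumptions, not values from this document.

from itertools import groupby


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_windows(seq: str, window: int = 50, step: int = 25,
                    threshold: float = 0.6) -> list[tuple[int, float]]:
    """(start, GC fraction) for each window at or above the threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = gc_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


def homopolymer_runs(seq: str, min_len: int = 6) -> list[tuple[int, str, int]]:
    """(start, base, length) for each run of one base at least min_len long."""
    runs, pos = [], 0
    for base, group in groupby(seq.upper()):
        n = len(list(group))
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


# Hypothetical 100-nt template: an AT-rich half followed by a GC-rich half.
template = "AT" * 25 + "GC" * 25
print(gc_fraction(template))                   # 0.5
print(gc_rich_windows(template, step=50))      # [(50, 1.0)]
print(homopolymer_runs("AT" + "G" * 7 + "CA"))  # [(2, 'G', 7)]
```

Note that the whole-template GC fraction (0.5) hides the GC-rich half; a windowed scan is what reveals it.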
Some reports have indicated that modulation of the ionic strength or osmolality of the reaction mixtures, particularly of the concentration of Na+ and K+ ions, may influence the secondary structure and condensation of nucleic acids in vitro much as they do in vivo (Le Rudulier, D., et al., Science 224:1064 (1984); Buche, A., et al., J. Biomolec. Struct. Dyn. 8(3):601 (1990); Marquet, R., and Houssier, C., J. Biomolec. Struct. Dyn. 9(1):159 (1991); Buche, A., et al., J. Biomolec. Struct. Dyn. 11(1):95 (1993); Woodford, K., et al., Nucl. Acids Res. 23(3):539 (1995); Flock, S., et al., Biophys. J. 70:1456 (1996); Flock, S., et al., Biophys. J. 71:1519 (1996); EP 0 821 059 A2). In some of these studies, in vitro nucleic acid conformation and stability were found to be improved in buffer solutions containing any of a number of natural and synthetic osmoprotectant compounds, including sugars such as the disaccharide trehalose (Carninci, P., et al., Proc. Natl. Acad. Sci. USA 95:520-524 (1998)); certain co-solvents such as glycerol and dimethylsulfoxide (Varadaraj, K., and Skinner, D. M., Gene 140:1 (1994)); glycine and derivatives thereof (Buche, A., et al., FEBS Lett. 247(2):367 (1989); Flock, S., et al., J. Biomolec. Struct. Dyn. 13(1):87 (1995); Houssier, C., et al., Comp. Biochem. Physiol. 117A(3):313 (1997)); low molecular weight amines such as beta-alanine, asparagine and cystamine (Kondakova, N. V., et al., Mol. Biol. (Moscow) 9(5):742 (1975); Aslanian, V. M., et al., Biofizika 29(4):564 (1984)); and other nitrogen-containing compounds and amino acids such as proline, betaine and ectoine (Rees, W. A., et al., Biochemistry 32:137-144 (1993); WO 95/20682; DE 44 11 588 C1; DE 44 11 594 C1; Mytelka, D. S., et al., Nucl. Acids Res. 24(14):2774 (1996); Baskaran, N., et al., Genome Res. 6:633 (1996); Weissensteiner, T., and Lanchbury, J. S., BioTechniques 21(6):1102 (1996); Rajendrakumar, C. S. V., et al., FEBS Letts. 410:201-205 (1997); Henke, W., et al., Nucl. Acids Res. 25(19):3957 (1997); Hengen, P. N., TIBS 22:225 (1997)). Betaine and ectoine are natural osmoprotectants in a variety of bacterial and animal cells (Chambers, S. T., et al., J. Bacteriol. 169(10):4845 (1987); Randall, K., et al., Biochim. Biophys. Acta 1291(3):189 (1996); Randall, K., et al., Biochem. Cell Biol. 74(2):283 (1996); Malin, G., and Lapidot, A., J. Bacteriol. 178(2):385 (1996); Gouesbet, G., et al., J. Bacteriol. 178(2):447 (1996); Cánovas, D., et al., J. Bacteriol. 178(24):7221 (1996); Cánovas, D., et al., J. Biol. Chem. 272(41):25794-25801 (1997)).
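One way to see why both cation concentration and GC content bear on duplex stability is a commonly cited empirical approximation for DNA melting temperature, Tm ≈ 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N, where [Na+] is the molar monovalent-cation concentration and N is the duplex length. The sketch below assumes this particular variant (published versions differ in the length term), takes a single monovalent-cation concentration as input, and does not model the osmoprotectants discussed above. It is meant as an illustration, not as a primer-design tool.

```python
# Illustrative salt-adjusted melting-temperature estimate (an empirical
# approximation, assumed here; variants use different length terms):
#     Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 675/N
# It ignores Mg2+, sequence context, and additives such as betaine.

import math


def salt_adjusted_tm(seq: str, na_molar: float) -> float:
    """Approximate Tm (degrees C) of a DNA duplex of sequence `seq`."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    if na_molar <= 0:
        raise ValueError("Na+ concentration must be positive")
    seq = seq.upper()
    percent_gc = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc - 675.0 / len(seq)


probe = "ATGC" * 5                      # hypothetical 20-mer, 50% GC
for na in (0.05, 0.5):
    # prints roughly 46.7 C and 63.3 C
    print(f"[Na+] = {na} M: Tm ~ {salt_adjusted_tm(probe, na):.1f} C")
```

Raising the cation concentration or the GC content both raise the estimated Tm, consistent with GC-rich templates being harder to keep single-stranded during synthesis.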
There remains a need in the art, however, for compounds, compositions and methods that are useful in enhancing synthesis of nucleic acid molecules, particularly those that are GC-rich and/or those that are relatively large.