1. Field of the Invention
The present invention is in the fields of molecular and cellular biology. The invention is related generally to compounds, compositions and methods useful in enhancing synthesis of nucleic acid molecules, especially from GC-rich nucleic acid templates. Specifically, the invention provides compositions comprising one or more compounds having a formula selected from the group consisting of formula I and formula II. Preferably used in accordance with the invention are 4-methylmorpholine N-oxide, betaine (carboxymethyltrimethyl ammonium), any amino acid (or derivative thereof), and/or an N-alkylimidazole such as 1-methylimidazole or 4-methylimidazole. In a preferred aspect, two or more, three or more, four or more, etc. of the compounds of the invention are combined to facilitate nucleic acid synthesis.
The invention also relates to compositions comprising one or more compounds of the invention and one or more additional components selected from the group consisting of (i) one or more nucleic acid molecules (including nucleic acid templates), (ii) one or more nucleotides, (iii) one or more polymerases or reverse transcriptases, and (iv) one or more buffering salts.
These compounds and compositions of the invention may be used in methods for enhanced, high-fidelity synthesis of nucleic acid molecules, including via amplification (particularly PCR), reverse transcription, and sequencing methods. The invention also relates to nucleic acid molecules produced by these methods, to fragments or derivatives thereof, and to vectors and host cells comprising such nucleic acid molecules, fragments, or derivatives. The invention also relates to the use of such nucleic acid molecules to produce desired polypeptides. The invention also concerns kits comprising the compositions or compounds of the invention.
2. Related Art
Genomic DNA
In examining the structure and physiology of an organism, tissue or cell, it is often desirable to determine its genetic content. The genetic framework (i.e., the genome) of an organism is encoded in the double-stranded sequence of nucleotide bases in the deoxyribonucleic acid (DNA) which is contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is only manifested upon production of the protein which the gene ultimately encodes. In order to produce a protein, a complementary copy of one strand of the DNA double helix (the "sense" strand) is produced by polymerase enzymes, resulting in a specific sequence of messenger ribonucleic acid (mRNA). This mRNA is then translated by the protein synthesis machinery of the cell, resulting in the production of the particular protein encoded by the gene. There are additional sequences in the genome that do not encode a protein (i.e., "noncoding" regions) which may serve a structural, regulatory, or unknown function. Thus, the genome of an organism or cell is the complete collection of protein-encoding genes together with intervening noncoding DNA sequences. Importantly, each somatic cell of a multicellular organism contains the full complement of genomic DNA of the organism, except in cases of focal infections or cancers, where one or more xenogeneic DNA sequences may be inserted into the genomic DNA of specific cells and not into other, non-infected, cells in the organism. As noted below, however, the expression of the genes making up the genomic DNA may vary between individual cells.
cDNA and cDNA Libraries
Within a given cell, tissue or organism, there exist myriad mRNA species, each encoding a separate and specific protein. This fact provides a powerful tool to investigators interested in studying genetic expression in a tissue or cell: mRNA molecules may be isolated and further manipulated by various molecular biological techniques, thereby allowing the elucidation of the full functional genetic content of a cell, tissue or organism.
One common approach to the study of gene expression is the production of complementary DNA (cDNA) clones. In this technique, the mRNA molecules from an organism are isolated from an extract of the cells or tissues of the organism. This isolation often employs solid chromatography matrices, such as cellulose or hydroxyapatite, to which oligomers of deoxythymidine (dT) have been complexed. Since the 3′ termini on all eukaryotic mRNA molecules contain a string of deoxyadenosine (dA) bases, and since dA binds to dT, the mRNA molecules can be rapidly purified from other molecules and substances in the tissue or cell extract. From these purified mRNA molecules, cDNA copies may be made using the enzyme reverse transcriptase, which results in the production of single-stranded cDNA molecules. The single-stranded cDNAs may then be converted into a complete double-stranded DNA copy of the original mRNA (and thus of the original double-stranded DNA sequence, encoding this mRNA, contained in the genome of the organism) by the action of a DNA polymerase. The protein-specific double-stranded cDNAs can then be inserted into a plasmid, which is then introduced into a host bacterial cell. The bacterial cells are then grown in culture media, resulting in a population of bacterial cells containing (or in many cases, expressing) the gene of interest.
This entire process, from isolation of mRNA to insertion of the cDNA into a plasmid to growth of bacterial populations containing the isolated gene, is termed "cDNA cloning." If cDNAs are prepared from a number of different mRNAs, the resulting set of cDNAs is called a "cDNA library," representing the different functional (i.e., expressed) genes present in the source cell, tissue or organism. Genotypic analysis of these cDNA libraries can yield much information on the structure and function of the organisms from which they were derived.
DNA Amplification
In order to increase the copy number of, or "amplify," specific sequences of DNA in a sample, investigators have relied on a number of amplification techniques. A commonly used amplification technique is the Polymerase Chain Reaction ("PCR") method described by Mullis and colleagues (U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,800,159). This method uses "primer" sequences which are complementary to opposing regions on the DNA sequence to be amplified. These primers are added to the DNA target sample, along with a molar excess of nucleotide bases and a DNA polymerase (e.g., Taq polymerase), and the primers bind to their target via base-specific binding interactions (i.e., adenine binds to thymine, cytosine to guanine). By repeatedly passing the reaction mixture through cycles of increasing and decreasing temperatures (to allow dissociation of the two DNA strands on the target sequence, synthesis of complementary copies of each strand by the polymerase, and re-annealing of the new complementary strands), the copy number of a particular sequence of DNA may be rapidly increased.
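By way of illustration only, the base-specific primer binding and the cycle-by-cycle increase in copy number described above may be sketched in Python. The function names, the per-cycle efficiency parameter, and the example sequences are assumptions introduced for this sketch; they do not appear in the cited patents, and the growth formula is the idealized N0 x (1 + E)^n model rather than a description of any particular reaction.

```python
# Illustrative sketch only. All names here are hypothetical, and the
# efficiency parameter is an assumption of the idealized growth model.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that base-pairs with seq (A with T, C with G), read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_binding_site(strand: str, primer: str) -> int:
    """Return the 0-based index on strand where primer anneals, or -1 if it does not."""
    # A primer anneals where the strand contains the primer's reverse complement.
    return strand.upper().find(reverse_complement(primer))


def copies_after_cycles(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a number of thermal cycles.

    efficiency = 1.0 means every template is copied in every cycle (perfect doubling).
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency must be in [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(primer_binding_site("TTACGGATCA", "TCCG"))  # index of the annealing site
    print(copies_after_cycles(1, 30))                 # copies from one molecule, ideal doubling
```

Under the perfect-doubling assumption, thirty cycles give roughly a billion copies per starting molecule; actual reactions fall short of this because efficiency drops as reagents are consumed.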
Other techniques for amplification of target nucleic acid sequences have also been developed. For example, Walker et al. (U.S. Pat. No. 5,455,166; EP 0 684 315) described a method called Strand Displacement Amplification (SDA), which differs from PCR in that it operates at a single temperature and uses a polymerase/endonuclease combination of enzymes to generate single-stranded fragments of the target DNA sequence, which then serve as templates for the production of complementary DNA (cDNA) strands. An alternative amplification procedure, termed Nucleic Acid Sequence-Based Amplification (NASBA) was disclosed by Davey et al. (U.S. Pat. No. 5,409,818; EP 0 329 822). Similar to SDA, NASBA employs an isothermal reaction, but is based on the use of RNA primers for amplification rather than DNA primers as in PCR or SDA. Another known amplification procedure includes Promoter Ligation Activated Transcriptase (LAT) described by Berninger et al. (U.S. Pat. No. 5,194,370).
PCR-based DNA Fingerprinting
Despite the availability of a variety of amplification techniques, most DNA fingerprinting methods rely on PCR for amplification, taking advantage of the well-characterized protocols and automation available for this technique. Examples of these PCR-based fingerprinting techniques include Random Amplified Polymorphic DNA (RAPD) analysis (Williams, J. G. K. et al., Nucl. Acids Res. 18(22):6531-6535 (1990)), Arbitrarily Primed PCR (AP-PCR; Welsh, J., and McClelland, M., Nucl. Acids Res. 18(24):7213-7218 (1990)), DNA Amplification Fingerprinting (DAF; Caetano-Anollés et al., Bio/Technology 9:553-557 (1991)), and microsatellite PCR or Directed Amplification of Minisatellite-region DNA (DAMD; Heath, D. D. et al., Nucl. Acids Res. 21(24):5782-5785 (1993)). All of these methods are based on the amplification of random DNA fragments by PCR, using arbitrarily chosen primers.
DNA Sequencing
In general, two techniques have been traditionally used to sequence nucleic acids. In the first method, termed "Maxam and Gilbert sequencing" after its co-developers (Maxam, A. M. and Gilbert, W., Proc. Natl. Acad. Sci. USA 74:560-564, 1977), DNA is radiolabeled, divided into four samples and treated with chemicals that selectively destroy specific nucleotide bases in the DNA and cleave the molecule at the sites of damage. By separating the resultant fragments into discrete bands by gel electrophoresis and exposing the gel to X-ray film, the sequence of the original DNA molecule can be read from the film. This technique has been used to determine the sequences of certain complex DNA molecules, including the primate virus SV40 (Fiers, W., et al., Nature 273:113-120, 1978; Reddy, V. B., et al., Science 200:494-502, 1978) and the bacterial plasmid pBR322 (Sutcliffe, G., Cold Spring Harbor Symp. Quant. Biol. 43:444-448, 1975).
An alternative technique for sequencing, named "Sanger sequencing" after its developer (Sanger, F., and Coulson, A. R., J. Mol. Biol. 94:444,448, 1975), is more commonly employed. This method uses the DNA-synthesizing activity of DNA polymerases which, when combined with mixtures of reaction-terminating dideoxynucleoside triphosphates (Sanger, F., et al., Proc. Natl. Acad. Sci. USA 74:5463-5467, 1977) and a short primer (either of which may be detectably labeled), gives rise to a series of newly synthesized DNA fragments specifically terminated at one of the four dideoxy bases. These fragments are then resolved by gel electrophoresis and the sequence determined as described for Maxam and Gilbert sequencing above. By carrying out four separate reactions (once with each ddNTP), the sequences of even fairly complex DNA molecules may rapidly be determined (Sanger, F., et al., Nature 265:678695, 1977; Barnes, W., Meth. Enzymol. 152:538-556, 1987). While Sanger sequencing usually employs E. coli or T7 DNA polymerase (U.S. Pat. No. 4,795,699), recent modifications of this technique using T7 polymerase mutants allow sequencing to be accomplished using a single sequencing reaction containing all four chain-terminating ddNTPs at different concentrations (U.S. Pat. Nos. 4,962,020 and 5,173,411). Further modifications to the technique, to reduce or eliminate the buildup of reaction-poisoning pyrophosphate in the reaction mixtures, have also been described (U.S. Pat. No. 5,498,523). Other variations for sequencing nucleic acid molecules have also been described (see Murray, Nucl. Acids. Res. 17:8889, 1989; and Craxton, Methods: A Comparison to Methods in Enzymology, 3:20-25, 1991).
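The logic of the four-reaction chain-termination readout described above may be sketched, purely for illustration, as follows. The sketch assumes an idealized experiment in which every possible terminated fragment is produced and perfectly resolved by length; the function names and the example sequence are hypothetical and are not drawn from the cited references.

```python
# Illustrative sketch only: an idealized model of a four-lane
# chain-termination readout. All names are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template: str) -> str:
    """Strand made by the polymerase: the reverse complement of the template, 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def termination_lengths(new_strand: str) -> dict:
    """For each of the four ddNTP reactions, list the fragment lengths it yields.

    A fragment of length n (bases added beyond the primer) ends in the
    dideoxy base found at position n of the newly synthesized strand.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    strand = synthesized_strand("TGTAATC")
    print(read_gel(termination_lengths(strand)))
```

Reading the lanes in order of increasing fragment length recovers the newly synthesized strand, which is the complement of the template being sequenced.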
Limitations
As noted above, the faithful and high-fidelity copying of a template nucleic acid molecule is an essential step in the synthesis of a nucleic acid molecule in amplification, reverse transcription, and sequencing protocols. However, the use of standard compositions and protocols to accomplish this synthesis is often inefficient, in that they tend to terminate nucleic acid synthesis prematurely at certain secondary structural (Gerard, G. F., et al., FOCUS 11(4):60 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)) and sequence (Messer, L. I., et al., Virol. 146:146 (1985); Abbotts, J., et al., J. Biol. Chem. 268:10312-10323 (1993)) barriers in nucleic acid templates. This is particularly true for template sequences that have high guanine/cytosine content (i.e., "GC-rich" templates) and those that are fairly large in size (i.e., templates that are larger than about 3-5 kb in length). These secondary structural and sequence barriers in the template nucleic acid molecules occur frequently at homopolymer stretches (Messer, L. I., et al., Virol. 146:146 (1985); Huber, H. E., et al., J. Biol. Chem. 264:4669-4678 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)) and are more often sequence rather than secondary structural barriers (Abbotts, J., et al., J. Biol. Chem. 268:10312-10323 (1993)). If these barriers could be overcome, yield of total and full-length nucleic acid products in synthesis reactions could be increased.
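As a non-limiting illustration, the template features identified above as barriers (high guanine/cytosine content and homopolymer stretches) may be located in a sequence along the following lines. The window size, the 0.65 GC threshold, and the minimum run length of four are arbitrary assumptions chosen for this sketch; the text above does not define "GC-rich" numerically, and the function names are hypothetical.

```python
# Illustrative sketch only. The thresholds below are assumptions made for
# this example and are not definitions taken from the text.
from itertools import groupby


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq: str, window: int = 20, threshold: float = 0.65) -> list:
    """Return (start, gc_fraction) for each sliding window at or above threshold."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        fraction = gc_fraction(seq[start:start + window])
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


def homopolymer_runs(seq: str, min_len: int = 4) -> list:
    """Return (start, base, length) for each run of one base at least min_len long."""
    runs = []
    position = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((position, base, length))
        position += length
    return runs


if __name__ == "__main__":
    example = "ATGGGGGCAAAAT"
    print(gc_fraction(example))
    print(homopolymer_runs(example))
```

A scan of this kind only flags candidate regions; whether a given region actually causes premature termination depends on the polymerase and reaction conditions.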
Some reports have indicated that modulation of the ionic strength or osmolality of the reaction mixtures, particularly of the concentration of Na+ and K+ ions, may influence the secondary structure and condensation of nucleic acids in vitro much as they do in vivo (Le Rudulier, D., et al., Science 224:1064 (1984); Buche, A., et al., J. Biomolec. Struct. Dyn. 8(3):601 (1990); Marquet, R., and Houssier, C., J. Biomolec. Struct. Dyn. 9(1):159 (1991); Buche, A., et al., J. Biomolec. Struct. Dyn. 11(1):95 (1993); Woodford, K., et al., Nucl. Acids Res. 23(3):539 (1995); Flock, S., et al., Biophys. J. 70:1456 (1996); Flock, S., et al., Biophys. J. 71:1519 (1996); EP 0 821 059 A2). In some of these studies, in vitro nucleic acid conformation and stability were found to be improved in buffer solutions containing any of a number of natural and synthetic osmoprotectant compounds, including polysaccharides such as trehalose (Carninci, P., et al., Proc. Natl. Acad. Sci. USA 95:520-524 (1998)); certain co-solvents such as glycerol and dimethylsulfoxide (Varadaraj, K., and Skinner, D. M., Gene 140:1 (1994)); glycine and derivatives thereof (Buche, A., et al., FEBS Lett. 247(2):367 (1989); Flock, S., et al., J. Biomolec. Struct. Dyn. 13(1):87 (1995); Houssier, C., et al., Comp. Biochem. Physiol. 117A(3):313 (1997)); low molecular weight amines such as beta-alanine, asparagine and cystamine (Kondakova, N. V., et al., Mol. Biol. (Moscow) 9(5):742 (1975); Aslanian, V. M., et al., Biofizika 29(4):564 (1984)); and other nitrogen-containing compounds and amino acids such as proline, betaine and ectoine (Rees, W. A., et al., Biochemistry 32:137-144 (1993); WO 95/20682; DE 44 11 588 C1; DE 44 11 594 C1; Mytelka, D. S., et al., Nucl. Acids Res. 24(14):2774 (1996); Baskaran, N., et al., Genome Res. 6:633 (1996); Weissensteiner, T., and Lanchbury, J. S., BioTechniques 21(6):1102 (1996); Rajendrakumar, C. S. V., et al., FEBS Letts. 410:201-205 (1997); Henke, W., et al., Nucl. Acids Res. 25(19):3957 (1997); Hengen, P. N., TIBS 22:225 (1997)). Betaine and ectoine are natural osmoprotectants in a variety of bacterial and animal cells (Chambers, S. T., et al., J. Bacteriol. 169(10):4845 (1987); Randall, K., et al., Biochim. Biophys. Acta 1291(3):189 (1996); Randall, K., et al., Biochem. Cell Biol. 74(2):283 (1996); Malin, G., and Lapidot, A., J. Bacteriol. 178(2):385 (1996); Gouesbet, G., et al., J. Bacteriol. 178(2):447 (1996); Cánovas, D., et al., J. Bacteriol. 178(24):7221 (1996); Cánovas, D., et al., J. Biol. Chem. 272(41):25794-25801 (1997)).
There remains a need in the art, however, for compounds, compositions and methods that are useful in enhancing synthesis of nucleic acid molecules, particularly those that are GC-rich and/or those that are relatively large.
The present invention relates generally to compounds, compositions and methods useful in enhancing synthesis of nucleic acid molecules, especially from GC-rich nucleic acid templates. In one aspect, the invention relates to compounds and compositions for use in synthesizing a nucleic acid molecule, particularly for template mediated synthesis such as in amplification, reverse transcription, and sequencing reactions. The compounds and compositions of the invention comprise one or more compounds having a chemical formula selected from the group consisting of formula I and formula II, and salts and derivatives thereof. In a preferred aspect, the compounds used in the invention include any amino acid, any saccharide (monosaccharide or polysaccharide), any polyalcohol, or salts or derivatives thereof. The compounds or compositions of the invention include compounds having the chemical formula as set forth in formula I or formula II, or salts or derivatives thereof, wherein the aryl group is selected from the group consisting of phenyl, naphthyl, phenanthryl, anthracyl, indenyl, azulenyl, biphenyl, biphenylenyl and fluorenyl groups; wherein the halo group is selected from the group consisting of fluorine, chlorine, bromine and iodine; wherein the alkyl group is selected from the group consisting of methyl, ethyl, propyl, isopropyl, butyl, pentyl, hexyl, heptyl, octyl, nonyl, and decyl, and may be a branched chain alkyl group; wherein the alkenyl group is selected from the group consisting of ethenyl, propenyl, butenyl, pentenyl, hexenyl, heptenyl, octenyl, nonenyl and decenyl, and may be a branched chain alkenyl group; wherein the alkynyl group is selected from the group consisting of ethynyl, propynyl, butynyl, pentynyl, hexynyl, heptynyl, octynyl, nonynyl and decynyl, and may be a branched chain alkynyl group; and wherein the lower alkoxy (ether) group is oxygen substituted by one of the alkyl groups mentioned above.
The invention also relates to salts and derivatives of such compounds. In a particularly preferred aspect of the invention, the compounds are selected from the group consisting of 4-methylmorpholine N-oxide, betaine, carnitine, ectoine, proline, glycine, pipecolic acid, trimethylamine N-oxide, N-alkylimidazole compounds such as 1-methylimidazole or 4-methylimidazole, poly(2-ethyl-2-oxazoline) of average molecular weight about 50,000 to about 500,000 daltons, poly(diallyldimethylammonium chloride) of average molecular weight about 100,000 to about 200,000 daltons, or salts or derivatives thereof. The invention also relates to compositions which comprise the compounds of the invention and one or more additional components selected from the group consisting of (i) one or more enzymes having nucleic acid polymerase activity, which may be thermostable enzymes, (ii) one or more nucleotides, (iii) one or more buffering salts, and (iv) one or more nucleic acid molecules. Preferred such enzymes according to this aspect of the invention may include a DNA polymerase (such as Taq, Tne, Tma, Pfu, VENT™, DEEPVENT™ and Tth DNA polymerases, and mutants, variants and derivatives thereof), an RNA polymerase (such as SP6, T7 or T3 RNA polymerase and mutants, variants and derivatives thereof) and a reverse transcriptase (such as M-MLV reverse transcriptase, RSV reverse transcriptase, AMV reverse transcriptase, RAV reverse transcriptase, MAV reverse transcriptase and HIV reverse transcriptase and mutants, variants and derivatives thereof). Preferably such reverse transcriptases are reduced or substantially reduced in RNase H activity.
The invention also relates to methods for synthesizing a nucleic acid molecule, comprising (a) mixing a nucleic acid template (which may be a DNA molecule such as a cDNA molecule, or an RNA molecule such as a mRNA molecule) with one or more (preferably two or more, three or more, four or more, five or more etc.) of the compounds or compositions of the invention to form a mixture; and (b) incubating the mixture under conditions sufficient to make a first nucleic acid molecule complementary to all or a portion of the template. Such methods of the invention may optionally comprise one or more additional steps, such as incubating the above-described first nucleic acid molecule under conditions sufficient to make a second nucleic acid molecule complementary to all or a portion of the first nucleic acid molecule. The invention also relates to nucleic acid molecules made by these methods, to vectors (which may be expression vectors) comprising these nucleic acid molecules, and to host cells comprising these nucleic acid molecules or vectors. The invention also relates to methods of producing a polypeptide, comprising culturing the above-described host cells under conditions favoring the production of the polypeptide by the host cells, and isolating the polypeptide. The invention also relates to polypeptides produced by such methods.
The invention also relates to methods for amplifying a nucleic acid molecule comprising (a) mixing a nucleic acid template with one or more of the compounds or compositions of the invention to form a mixture and (b) incubating the mixture under conditions sufficient to amplify a nucleic acid molecule complementary to all or a portion of the template. More specifically, the invention relates to a method of amplifying a DNA molecule comprising:
(a) providing a first and second primer, wherein said first primer is complementary to a sequence at or near the 3′-termini of the first strand of said DNA molecule and said second primer is complementary to a sequence at or near the 3′-termini of the second strand of said DNA molecule;
(b) hybridizing said first primer to said first strand and said second primer to said second strand in the presence of one or more compounds or compositions of the invention, under conditions such that a third DNA molecule complementary to said first strand and a fourth DNA molecule complementary to said second strand are synthesized;
(c) denaturing said first and third strands, and said second and fourth strands; and
(d) repeating steps (a) to (c) one or more times.
Such conditions may include incubation in the presence of one or more polymerases, one or more nucleotides and/or one or more buffering salts. The invention also relates to nucleic acid molecules amplified by these methods.
The invention also relates to methods for sequencing a nucleic acid molecule comprising (a) mixing a nucleic acid molecule to be sequenced with one or more primers, one or more of the compounds or compositions of the invention, one or more nucleotides and one or more terminating agents to form a mixture; (b) incubating the mixture under conditions sufficient to synthesize a population of molecules complementary to all or a portion of the molecule to be sequenced; and (c) separating the population to determine the nucleotide sequence of all or a portion of the molecule to be sequenced. The invention more specifically relates to a method of sequencing a DNA molecule, comprising:
(a) hybridizing a primer to a first DNA molecule;
(b) contacting said molecule of step (a) with deoxyribonucleoside triphosphates, one or more compounds or compositions of the invention, and one or more terminator nucleotides;
(c) incubating the mixture of step (b) under conditions sufficient to synthesize a random population of DNA molecules complementary to said first DNA molecule, wherein said synthesized DNA molecules are shorter in length than said first DNA molecule and wherein said synthesized DNA molecules comprise a terminator nucleotide at their 3′ termini; and
(d) separating said synthesized DNA molecules by size so that at least a part of the nucleotide sequence of said first DNA molecule can be determined.
Such terminator nucleotides include ddTTP, ddATP, ddGTP, ddITP or ddCTP. Such conditions may include incubation in the presence of one or more DNA polymerases and/or buffering salts.
The invention also relates to kits for use in synthesis of a nucleic acid molecule, comprising one or more containers containing one or more of the compounds or compositions of the invention. These kits of the invention may optionally comprise one or more additional components selected from the group consisting of one or more nucleotides, one or more polymerases and/or reverse transcriptases, a suitable buffer, one or more primers and one or more terminating agents (such as one or more dideoxynucleotides).
Other preferred embodiments of the present invention will be apparent to one of ordinary skill in light of the following drawings and description of the invention, and of the claims.