The present invention is generally directed toward improving the sequence fidelity of synthetic double-stranded oligonucleotides. It is more particularly related to the removal of synthetic failures (including side products and truncated products) created in the synthesis of oligonucleotides, such as double-stranded DNA.
Much of the discovery research in pharmaceutical companies is focused on genes, either as targets for drug development or as therapeutics in the form of their protein expression products. These companies have access to a majority of the human genes. Pharmaceutical companies are overwhelmed with potential opportunities, acutely aware that their competitors are looking at the same set of possibilities, and currently unable to work on more than a fraction of the genes that have been identified. One of the major bottlenecks in this research is the time and effort required to prepare genes for detailed analysis.
Gene synthesis, the production of cloned genes partially or entirely from chemically synthesized DNA, is one method of overcoming this bottleneck. In principle, gene synthesis can provide rapid access to any gene for which the sequence is known and to any variation on a gene. Reliable, cost-effective automated gene synthesis would have a revolutionary effect on the process of biomedical research by speeding up the manipulation and analysis of new genes.
One principal factor limiting the automation of gene synthesis is the low sequence fidelity of the process: gene clones created from chemically synthesized DNA often contain sequence errors. These errors can be introduced at many stages of the process: during chemical synthesis of the component oligonucleotides, during enzymatic assembly of the double-stranded oligonucleotides, and by chemical damage occurring during the manipulation and isolation of the DNA or during the cloning process.
Four types of base modifications are commonly produced when an oligonucleotide is synthesized using the phosphoramidite method: (1) transamination of the O6-oxygen of deoxyguanosine to form a 2,6-diaminopurine residue; (2) deamination of the N4-amine of deoxycytidine to form a uridine residue (Eadie, J. S. and Davidson, D. S., Nucleic Acids Res. 15:8333, 1987); (3) depurination of N6-benzoyldeoxyadenosine yielding an apurinic site (Shaller, H. and Khorana, H. G., J. Am. Chem. Soc. 85:3828, 1963; Matteucci, M. D. and Caruthers, M. H., J. Am. Chem. Soc. 103:3185, 1981); and (4) incomplete removal of the N2-isobutyrylamide protecting group on deoxyguanosine. Each of these side products (byproducts) can contribute to sequence errors in cloned synthetic DNA.
Another synthetic failure of oligonucleotide synthesis is the formation of truncated products that are less than the full length of the desired oligonucleotide. The solid phase approach to oligonucleotide synthesis involves building an oligomer chain that is anchored to a solid support through its 3′-hydroxyl group, and is elongated by coupling to its 5′-hydroxyl group. The yield of each coupling step in a given chain-elongation cycle will generally be less than 100%. For an oligonucleotide of length ‘n’, there are n−1 linkages and the maximum yield of the desired full-length product will be (coupling efficiency)^(n−1). For a 25-mer, assuming a coupling efficiency of 98%, the calculated yield of full-length product will be 61%. The other 39% consists of all possible shorter length oligonucleotides (truncated products) resulting from inefficient monomer coupling. The desired oligonucleotide can be partially purified from this mixture by purification steps using ion exchange or reverse phase chromatography. These purification procedures are not 100% effective and do not completely eliminate these populations. The final product therefore contains n−1 and to some extent n−2 and n−3 failure sequences. This type of undesired product of the oligonucleotide synthesis process can also contribute to sequence errors in synthetic genes.
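The yield relationship described above can be checked numerically. The following is a minimal sketch, assuming a single constant coupling efficiency for every cycle; the function name full_length_yield and the 50-mer and 100-mer lengths are illustrative choices and do not come from the specification.

```python
def full_length_yield(length: int, coupling_efficiency: float) -> float:
    """Maximum fraction of chains reaching full length.

    An oligonucleotide of `length` bases has (length - 1) linkages, and each
    coupling is assumed to succeed with the same probability.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # 25-mer at 98% coupling efficiency, as in the text; longer oligos are
    # included only to show how quickly the full-length fraction falls.
    for n in (25, 50, 100):
        y = full_length_yield(n, 0.98)
        print(f"{n}-mer: {y:.1%} full length, {1 - y:.1%} truncated")
```

For the 25-mer this evaluates to roughly 0.62, in line with the 61% figure stated above; the remainder is the truncated population.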
Another class of synthetic failures is the formation of “n+” products that are longer than the full length of the desired oligonucleotide (User Bulletin 13, 1987, Applied Biosystems). The primary source of these products is branching of the growing oligonucleotide, in which a phosphoramidite monomer reacts through the bases, especially the N-6 of adenosine and the O-6 of guanosine. Another source of n+ products is the initiation and propagation from unwanted reactive sites on the solid support. Finally, these products also form if the 5′-trityl protecting group is inadvertently deprotected during the coupling step. This premature exposure of the 5′-hydroxyl allows for a double addition of a phosphoramidite. This type of synthetic failure of the oligonucleotide synthesis process can also contribute to sequence errors in synthetic genes.
Another process common to the preparation of synthetic genes is the ligation of synthetic double-stranded oligonucleotides to other synthetic double-stranded oligonucleotides to form larger synthetic double-stranded oligonucleotides. In vitro experiments have shown that T4 DNA ligase exhibits poor fidelity, sealing nicks with 3′ and 5′ A/A or T/T mismatches (Wu, D. Y., and Wallace, R. B., Gene 76:245-54, 1989), 5′ G/T mismatches (Harada, K. and Orgel, L. Nucleic Acids Res. 21:2287-91, 1993) or 3′ C/A, C/T, T/G, T/T, T/C, A/C, G/G or G/T mismatches (Landegren, U., Kaiser, R., Sanders, J., and Hood, L., Science 241:1077-80, 1988). These types of mismatches may occur during ligation of double-stranded nucleic acids into larger double-stranded nucleic acids.
Due to the difficulties in the current approaches to the preparation of oligonucleotides, such as genes, there is a need in the art for methods for improving the sequence fidelity of synthetic oligonucleotides. The present invention fills this need, and further provides other related advantages.
Briefly stated, the present invention provides a variety of methods for improving the sequence fidelity of synthetic double-stranded oligonucleotides. The methods comprise subjecting synthetic double-stranded oligonucleotides to preparative column chromatography or preparative gel chromatography under denaturing conditions sufficient to separate the synthetic double-stranded oligonucleotides into two populations, wherein one population is enriched for synthetic failures and the other population is depleted of synthetic failures. In one embodiment, the column chromatography is HPLC. A preferred embodiment is DHPLC. In another embodiment, the gel chromatography is gradient gel chromatography. In any of the embodiments, the oligonucleotides may comprise synthetic double-stranded DNA. Preferred synthetic double-stranded DNA comprises one or more fragments of a larger DNA molecule.
These and other aspects of the present invention will become evident upon reference to the following detailed description. In addition, various references are set forth herein. Each of these references is incorporated herein by reference in its entirety as if each was individually noted for incorporation.
Prior to setting forth the invention, it may be helpful to an understanding thereof to set forth definitions of certain terms to be used hereinafter.
Natural bases of DNA: adenine (A), guanine (G), cytosine (C) and thymine (T). In RNA, thymine is replaced by uracil (U).
Synthetic double-stranded oligonucleotides: substantially double-stranded DNA composed of single strands of oligonucleotides produced by chemical synthesis or by the ligation of synthetic double-stranded oligonucleotides to other synthetic double-stranded oligonucleotides to form larger synthetic double-stranded oligonucleotides.
Synthetic failures: undesired products of oligonucleotide synthesis, such as side products, truncated products or products from incorrect ligation.
Side products: chemical byproducts of oligonucleotide synthesis.
Truncated products: all possible oligonucleotides shorter than the desired length, e.g., resulting from inefficient monomer coupling during synthesis of oligonucleotides.
TE: an aqueous solution of 10 mM Tris and 1 mM EDTA, at a pH of 8.0.
Homoduplex oligonucleotides: double-stranded oligonucleotides wherein the bases are fully matched; e.g., for DNA, each A is paired with a T, and each C is paired with a G.
Heteroduplex oligonucleotides: double-stranded oligonucleotides wherein the bases are mispaired, i.e., there are one or more mismatched bases; e.g., for DNA, an A is paired with a C, G or A, or a C is paired with a C, T or A, etc.
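The homoduplex and heteroduplex definitions above can be illustrated with a short sketch. It assumes two equal-length DNA strands, each written 5' to 3', containing only the natural bases A, C, G and T; the function names and example sequences are hypothetical and serve only to make the definitions concrete.

```python
from typing import List

# Watson-Crick partners for the natural DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_positions(top: str, bottom: str) -> List[int]:
    """Return 0-based positions (along `top`) where the bases are mispaired.

    Both strands are written 5'->3', so the bottom strand is reversed before
    comparison to place each base opposite its partner.
    """
    top, bottom = top.upper(), bottom.upper()
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    facing = bottom[::-1]
    return [i for i, (a, b) in enumerate(zip(top, facing))
            if WATSON_CRICK.get(a) != b]


def classify_duplex(top: str, bottom: str) -> str:
    """Label a duplex as 'homoduplex' (fully matched) or 'heteroduplex'."""
    return "heteroduplex" if mismatch_positions(top, bottom) else "homoduplex"


if __name__ == "__main__":
    print(classify_duplex("ATGC", "GCAT"))  # fully matched strands
    print(classify_duplex("ATGC", "GCAC"))  # A opposite C at position 0
```

A base outside A, C, G and T (for example a placeholder for a side product) has no entry in the pairing table and is therefore reported as a mismatch, which mirrors the destabilizing role the specification attributes to unnatural bases.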
The present invention is directed toward methods that provide for double-stranded oligonucleotides with a reduced sequence error rate from a mixture of synthetic oligonucleotides. The methods are based on the use of techniques in a preparative mode under conditions sufficient to separate double-stranded oligonucleotides which contain synthetic failures (including side products and truncated products) from the desired length double-stranded oligonucleotides that contain completely matched natural bases.
More specifically, the disclosure of the present invention shows surprisingly that a population of synthetic double-stranded oligonucleotides can be separated into two populations by separation methodologies utilized in a preparative mode under denaturing conditions. One population is enriched for oligonucleotides containing synthetic failures (e.g., side products, products from incorrect ligation and/or truncated products). A second population is depleted of oligonucleotides containing synthetic failures and is enriched for synthetic double-stranded oligonucleotides of a desired length which contain only matched natural bases. Depletion of synthetic failures from the desired double-stranded oligonucleotides refers generally to at least about a two-fold depletion relative to the total population prior to separation. Typically, the depletion will be a change of about two-fold to three-fold from the original state. The particular fold depletion may be the result of a single separation or the cumulative result of a plurality of separations. The second population is useful, for example, where the oligonucleotides are double-stranded DNA which corresponds to a gene or fragments of a gene.
As disclosed herein, synthetic molecules containing natural bases can be separated from those containing synthetic failures, e.g., unnatural bases or truncated sequences. Unnatural bases in double-stranded oligonucleotides, like mismatched bases of heteroduplexed oligonucleotides, destabilize the double-stranded oligonucleotides. Double-stranded oligonucleotides (such as double-stranded DNA) containing unnatural bases or being less than full length, melt at a lower temperature than sequences of full length containing only natural bases in a homoduplex. By adjusting the temperature, double-stranded synthetic oligonucleotide failures will melt or partially melt, and migrate differently on chromatography than synthetic homoduplex oligonucleotides of full length. Thus, various methodologies, such as column chromatography or gel chromatography, can be used in a preparative manner under denaturing conditions to separate synthetic failures from the desired synthetic double-stranded oligonucleotides.
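The effect of length on duplex stability can be illustrated with a rough calculation. The sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rule of thumb for short oligonucleotides that is not part of this specification; it does not model unnatural bases or mismatches, and it is not meaningful for long fragments. The 16-mer sequence is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is a rule-of-thumb estimate for
    short oligonucleotides only.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    full_length = "ATGCGTACCGTTAGCA"  # hypothetical 16-mer
    truncated = full_length[1:]       # n-1 product lacking the 5' base
    print("full length:", wallace_tm(full_length))
    print("n-1 product:", wallace_tm(truncated))
```

Under this approximation the truncated strand always has an estimate equal to or lower than that of the full-length strand, which is the qualitative behavior the separation relies on. Actual run temperatures would be chosen with melt-prediction software and empirical adjustment rather than with this rule.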
Oligonucleotide synthesis (e.g., chemical synthesis) can generate a variety of side products. For example, side products include an abasic residue (e.g., an apurinic or apyrimidinic residue), diaminopurine, an incompletely deprotected G, and uridine. For purposes of the present invention, the common feature of the side products is that these unnatural bases destabilize the double-stranded oligonucleotides in which they are incorporated, such that these synthetic failures melt at a lower temperature than synthetic double-stranded oligonucleotides containing only natural bases.
Denaturing conditions can be applied to a variety of methodologies used or adapted for preparative (rather than analytical) purposes, including chromatography. Column chromatography and gel chromatography are examples of suitable methodologies within the present invention. In one embodiment, the column chromatography is high performance liquid chromatography (“HPLC”). In another embodiment, the column chromatography uses a monolithic matrix as described by Hatch in U.S. Pat. No. 6,238,565. In another embodiment, the column chromatography is “Denaturing Anion-Exchange HPLC” (DEAHPLC) as described by Taylor in WO 01/27331 A2. In another embodiment, the column chromatography is Isocratic HPLC as described by Gjerde in U.S. Pat. No. 6,024,878. In another embodiment, the column chromatography is “Fully Denaturing HPLC” (FDHPLC). A preferred embodiment is use of a technique termed “denaturing HPLC” (“DHPLC”). In another embodiment, the chromatography is gradient gel chromatography. As used herein, denaturing conditions refer to both partially denaturing conditions under which oligonucleotides are partially denatured, and fully denaturing conditions under which oligonucleotides are fully denatured. Partially denaturing refers to the separation of a mismatched base pair in a double-stranded oligonucleotide while a portion or all of the remainder of the double strand remains intact. This occurs because a double strand will denature more easily (e.g., at a lower temperature) at the site of a base pair mismatch than is required to denature the remainder of the strand.
Oligonucleotides suitable for use in the present invention are any double-stranded sequence. Preferred oligonucleotides are double-stranded DNA. Double-stranded DNA includes full length genes and fragments of full length genes. For example, the DNA fragments may be portions of a gene that when joined form a larger portion of the gene or the entire gene.
The separation by DHPLC of synthetic double-stranded DNA fragments containing only natural bases from synthesis side products is described as a representative example of the present invention. DHPLC is an analytical technique that has been used to detect mutations that occur in DNA isolated from natural sources. The technique detects polymorphisms in genomic DNA after PCR amplification. The technique is performed as follows. A test sample is formed by PCR amplifying the region of interest in the genomic DNA. This test sample is mixed with an amplified control sample obtained from DNA without a polymorphism. This mixture of the test and control samples is denatured and renatured to form duplexes composed of amplified strands from both samples. This test mixture is then analyzed by DHPLC. Oefner and his colleagues have described two variations of DHPLC: the first in which the separation is done under partially denaturing conditions (Oefner, P. J., Underhill, P. A. (1998) Detection of Nucleic Acid Heteroduplex Molecules by Denaturing High-Performance Liquid Chromatography and Methods for Comparative Sequencing, U.S. Pat. No. 5,795,976, and Oefner, P. J., Underhill, P. A. (1998) DNA mutation detection using denaturing high-performance liquid chromatography, Current Protocols in Human Genetics, Wiley and Sons, N.Y., Supplement 19, 7.10.1-7.10.12) and a second version in which the DNA molecules are fully denatured (Oefner, J. Chromatogr. B. Biomed. Sci. Appl. 739(2):345-355, 2000). In the present invention, it was discovered that DHPLC can be used as a preparative technique to enrich a population of synthetic DNA fragments for molecules which do not contain synthetic side products. Double-stranded DNA fragments in the 15 base pair to 10,000 base pair range are typically produced during chemical synthesis of large DNA fragments.
Within the present invention, these intermediates are subjected to preparative DHPLC (using an automated system such as the ProStar Helix HPLC system from Varian Inc., Walnut Creek, Calif.) under conditions sufficient to isolate a population of high purity fragments of synthetic DNA and thus reduce the sequence error rate.
Each fragment is analyzed using software (e.g., DHPLC Melt Program, Stanford University, Palo Alto, Calif.; WAVEMAKER™ Utility Software, Transgenomic, Inc., Omaha, Nebr.; computer method described by Altshuler, U.S. Pat. No. 6,197,516) to calculate a specific run condition (e.g., temperature and gradient conditions) sufficient for depleting or initiating depletion of synthetic failures from the desired double-stranded oligonucleotide population. The fragments are injected onto the HPLC and run under the specified conditions. It will be evident to those of ordinary skill in the art that adjustments (e.g., a change of a few degrees of temperature) may be made to optimize the conditions for a particular fragment. The major peak is collected and dried down to remove solvents, then used to continue the assembly of the gene. Synthetic side products, for example, will fail to base pair with the intended complementary natural bases. DNA sequences containing side products will thus have a lowered melting point and show altered mobility under these conditions. The DNA molecules in the major peak all have the same melting profile and are less likely to carry synthetic side products.
DHPLC can be readily automated and can provide a high-throughput method of physically reducing synthetic side products from a chemically synthesized DNA sample. For example, synthetic DNA fragments of less than 1000 bp in length are injected onto the column under conditions that partially denature the DNA, the major peak collected and the remainder of the HPLC flow-through discarded. The peak contains the DNA fragment; most of the molecules in the original population which carry synthetic side-products in place of natural bases show altered mobility and thus will be discarded. Alternatively, synthetic DNA fragments of less than 100 bp in length are injected into the column under conditions that fully denature the DNA strands. The two major peaks are collected and the remainder of the HPLC flow-through discarded. Each of the two peaks contains one strand of the synthetic DNA; most of the molecules in the original population which carry synthetic side products instead of natural bases show altered mobility and thus will be discarded. The two peaks are combined and hybridized together to form an intermediate fragment for gene synthesis which is less likely to carry synthetic side products and is thus more likely to yield the desired sequence when it is cloned.
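The collection of the major peak under partially denaturing conditions can be sketched in code. The following is illustrative only: it assumes the detector trace is available as (retention time, signal) pairs, and it defines the collection window as the contiguous region around the tallest point in which the signal stays at or above a chosen fraction of the apex. The function name, the 0.5 threshold and the sample trace are hypothetical and are not taken from the specification or from any instrument software.

```python
from typing import List, Tuple


def major_peak_window(trace: List[Tuple[float, float]],
                      threshold_fraction: float = 0.5) -> Tuple[float, float]:
    """Return (start_time, end_time) bracketing the tallest peak.

    The window grows outward from the apex while neighboring points remain at
    or above threshold_fraction * apex signal. Everything outside the window
    would be treated as flow-through and discarded.
    """
    if not trace:
        raise ValueError("trace is empty")
    signals = [s for _, s in trace]
    apex = max(range(len(trace)), key=lambda i: signals[i])
    cutoff = threshold_fraction * signals[apex]
    lo = apex
    while lo > 0 and signals[lo - 1] >= cutoff:
        lo -= 1
    hi = apex
    while hi < len(trace) - 1 and signals[hi + 1] >= cutoff:
        hi += 1
    return trace[lo][0], trace[hi][0]


if __name__ == "__main__":
    # Hypothetical trace: small early-eluting peaks, then the major peak.
    trace = [(4.0, 1), (4.2, 3), (4.4, 12), (4.6, 2), (4.8, 5), (5.0, 40),
             (5.2, 95), (5.4, 100), (5.6, 60), (5.8, 8), (6.0, 1)]
    print(major_peak_window(trace))
```

For the fully denaturing case, in which two major peaks (one per strand) are collected, the same idea would be applied once per peak; that extension is not shown here.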
As mentioned above, the chromatography is performed under conditions appropriate to separate, and thereby deplete, the synthetic failures from the desired double-stranded DNA. In one embodiment, the thermal and gradient conditions are adjusted to permit separation by DHPLC. The thermal and gradient conditions may be calculated using a DHPLC Melt Program available from Stanford University, Palo Alto, Calif. (http://insertion.stanford.edu/melt.html). Each double-stranded DNA denatures at a temperature that is a function of the strength of the duplex structure. A fully natural base paired DNA sequence forms the most stable duplex and denatures under the most stringent conditions. DNA sequences with base modifications form less stable duplexes, denature at a lower temperature and thus show increased mobility at a given temperature and gradient profile.
Gel-based techniques such as double-stranded conformational analysis (DSCA) and capillary-based conformation-sensitive gel electrophoresis (capillary CSGE) can also be used to enrich the abundance of correct sequence in a population of nucleic acid sequences. Like DHPLC, these gel-based methods are analytical techniques that have been used to detect mutations based upon the conformational change in the double strand caused by non-matching base pairs. These techniques rely on the differing electrophoretic mobility of a heteroduplex relative to the homoduplex. Several other mutation detection techniques based upon slab gels [e.g., constant gradient gel electrophoresis (CGGE), denaturing gradient gel electrophoresis (DGGE), and temperature gradient gel electrophoresis (TGGE)] are based on the subtle differences in melting points of DNA fragments dependent on base pair composition and the resultant difference in mobility of the mutant fragment in gels. The separated populations of double-stranded nucleic acids can be isolated by excision of bands from the gel.
Capillary CSGE is based upon capillary electrophoresis (Rozycka M, Collins N, Stratton M R, Wooster R., Genomics 70(1):34-40, 2000). Like DSCA, this technique relies on conformational differences between heteroduplex and homoduplex nucleic acids. For CSGE, fractions containing size or shape fractionated DNA fragments can be collected on moving affinity membranes or into sample chambers. The exact timing of the collection steps is achieved by determining the velocity of each individual zone measured between two detection points near the end of the capillary.
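The timing of fraction collection from two detection points can be sketched as a simple calculation. The example assumes each zone moves at constant velocity over the final stretch of the capillary; the function name, the detector positions and the times are hypothetical values chosen for illustration.

```python
def outlet_arrival_time(t1: float, t2: float,
                        d1: float, d2: float, outlet: float) -> float:
    """Predict when a zone reaches the capillary outlet.

    t1, t2 : times (s) at which the zone passes the first and second detector
    d1, d2 : detector positions (cm from the inlet), with d1 < d2 <= outlet
    outlet : position of the capillary end (cm from the inlet)

    The zone velocity is taken as (d2 - d1) / (t2 - t1) and is assumed to stay
    constant between the second detector and the outlet.
    """
    if not d1 < d2 <= outlet:
        raise ValueError("positions must satisfy d1 < d2 <= outlet")
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    velocity = (d2 - d1) / (t2 - t1)
    return t2 + (outlet - d2) / velocity


if __name__ == "__main__":
    # Zone seen at 40 cm after 600 s and at 45 cm after 630 s; outlet at 50 cm.
    print(outlet_arrival_time(600.0, 630.0, 40.0, 45.0, 50.0))
```

Because each zone has its own measured velocity, the collection step can be scheduled separately for every zone, which is the point of placing the two detectors near the end of the capillary.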
A preferred use of the present invention is for chemical gene synthesis by enriching fractions for double-stranded DNA fragments which contain only natural bases. Such fragments are joined (e.g., ligated) to form the complete gene.
The following examples are offered by way of illustration and not by way of limitation.