1. Field of the Invention
The present application is generally related to the synthesis of DNA molecules, and more particularly, to the synthesis of a synthetic gene or other DNA sequence.
2. Description of the Related Art
Proteins are an important class of biological molecules that have a wide range of valuable medical, pharmaceutical, industrial, and biological applications. A gene encodes the information necessary to produce a protein according to the genetic code, using three nucleotides (one codon, chosen from a set of one or more synonymous codons) for each amino acid in the protein. An expression vector contains DNA sequences that allow transcription of the gene into mRNA for translation into a protein.
It is often desirable to obtain a synthetic DNA that encodes the protein of interest. DNA can be synthesized accurately only in short pieces, for example, 50 to 80 nucleotides or fewer. Pieces substantially longer than this become problematic because errors accumulate over the course of the synthesis process. Most genes are appreciably longer than 50 to 80 nucleotides, usually by hundreds or thousands of nucleotides. Consequently, direct synthesis is not a convenient method for producing large genes. Currently, large synthetic genes with a desired DNA sequence are manufactured by any one of several methods:
1. If the gene does not contain introns, it can be amplified by the polymerase chain reaction (PCR) directly from genomic DNA. This is feasible for genes of bacteria, lower eukaryotes, and many viruses, but nearly all genes of higher organisms contain introns.
2. A related alternative is to amplify the gene by PCR from a full-length cDNA clone. However, isolating and characterizing a full-length clone is time-consuming and tedious, and full-length cDNA clones are available for only a very small fraction of the genes of any higher organism.
3. Based on the gene sequence inferred from the genomic DNA sequence, short DNA segments of both strands of a gene can be synthesized with overlapping ends. These segments are allowed to anneal and are joined together with DNA ligase. Annealing efficiency and accuracy at the segment junctions are often poor, resulting in low yields.
4. One approach to reducing the poor annealing at segment junctions is to build the gene up in subsections in a step-wise manner. This remains time-consuming, expensive, tedious, and inefficient, because many reactions must be performed.
5. Based on the gene sequence inferred from the genomic DNA sequence, short overlapping duplex DNA segments of the gene can be synthesized that contain compatible end-proximal restriction endonuclease sites. Each fragment can be cut with the appropriate enzyme, annealed, and joined with DNA ligase. In addition to the limitations above, both strands of the gene sequence must be synthesized, and this method depends on appropriate restriction sites being evenly spaced throughout the gene sequence.
6. Genes have also been assembled by overlap extension of partially overlapping oligonucleotides using DNA-polymerase-catalyzed reactions. The gene is divided into oligonucleotides, each of which partially overlaps and is complementary to the adjacent oligonucleotide(s). The oligonucleotides are allowed to anneal, and the resulting DNA construct is extended to the full-length double-stranded gene. See, for example, W. P. C. Stemmer et al. “Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides” Gene, 1995, 164, 49-53 and D. E. Casimiro et al. “PCR-based gene synthesis and protein NMR spectroscopy” Structure, 1997, 5, 1407-1412, the disclosures of which are incorporated by reference. The design of oligonucleotides for gene synthesis by this approach has recently been automated, as described in D. M. Hoover & J. Lubkowski “DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis” Nucleic Acids Res., 2002, 30:10 e43, the disclosure of which is incorporated by reference. That method optimizes codon usage, optionally removes DNA hairpins, and uses a nearest-neighbor model of DNA melting to achieve homogeneous target melting temperatures. The process of removing local DNA hairpins and dimerization from a single DNA oligonucleotide is also referred to by those skilled in the art as “removing DNA secondary structure.” However, the methods described in these references do not globally optimize a melting temperature gap between correct hybridizations and incorrect hybridizations.
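By way of illustration only, the division of a gene into partially overlapping oligonucleotides on alternating strands, as described for the overlap-extension approach of method 6, might be sketched as follows. The function names, the fixed oligonucleotide length, and the fixed overlap length are assumptions made for this example; the sketch does not reproduce the procedure of any cited reference and performs no codon, hairpin, or melting-temperature optimization.

```python
# Illustrative sketch only. All names and parameters are hypothetical.
# Overlaps are fixed-length here, not tuned to a target melting temperature.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_into_oligos(gene, oligo_len=40, overlap=20):
    """Divide `gene` into oligos that overlap their neighbors by `overlap` bases.

    Oligos alternate between the top strand ("+") and the bottom strand ("-"),
    so each oligo is complementary to its neighbors within the shared region.
    Returns a list of (start, strand, oligo) tuples, where `start` is the
    position of the covered segment on the top strand.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        segment = gene[start:start + oligo_len]
        strand = "+" if index % 2 == 0 else "-"
        oligo = segment if strand == "+" else reverse_complement(segment)
        oligos.append((start, strand, oligo))
        if start + oligo_len >= len(gene):
            break  # this oligo reaches the end of the gene
        start += step
        index += 1
    return oligos


def assemble(oligos):
    """Rebuild the top strand in silico from the output of split_into_oligos."""
    gene = ""
    for start, strand, oligo in oligos:
        segment = oligo if strand == "+" else reverse_complement(oligo)
        # Append only the part of the segment that extends past what is built.
        gene += segment[len(gene) - start:]
    return gene


if __name__ == "__main__":
    example = "ATGGCCAAGCTTGGATCCGAATTCACTAGTGCGGCCGCCTGCAGGTCGACCATATGGGAGAGCTC"
    for start, strand, oligo in split_into_oligos(example, oligo_len=20, overlap=8):
        print(start, strand, oligo)
```

In this sketch the `assemble` function only checks that the oligonucleotides tile the sequence; it stands in for, and does not model, the annealing and polymerase extension steps. A practical design would additionally vary the overlap lengths to control melting temperatures, which the fixed `overlap` parameter here does not attempt.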