The availability of synthetic DNA sequences has fueled major revolutions in genetic engineering and the understanding of human genes, making possible such techniques as site-directed mutagenesis, the polymerase chain reaction (PCR), high-throughput DNA sequencing, gene synthesis, and gene expression analysis using DNA microarrays.
DNA produced from a user-specified sequence is typically synthesized chemically in the form of short oligonucleotides, often ranging in length from 20 to 70 bases. For methods and materials known in the art related to the chemical synthesis of nucleic acids see, e.g., Beaucage, S. L., Caruthers, M. H., The Chemical Synthesis of DNA/RNA, which is hereby incorporated by reference. Syntheses of longer oligonucleotides are possible, but the intrinsic error rate of each coupling step (typically 1-2%) means that preparations of longer oligonucleotides are increasingly likely to be riddled with errors, and the pure desired product is numerically overwhelmed by sequences containing errors. Thus, to produce longer DNA sequences, the molecule is not synthesized as a single long piece. Rather, current methods combine many shorter oligonucleotides to build the larger desired sequence, a process often referred to as “gene synthesis” (though the product need not be confined to a single gene).
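By way of illustration only, the effect of the per-step error rate on oligonucleotide length can be sketched numerically. The sketch below assumes, as a simplification not stated above, that each coupling step fails independently with the same probability and that an oligonucleotide of n bases requires n - 1 couplings; the function name `error_free_fraction` is hypothetical.

```python
def error_free_fraction(length_bases, step_error_rate):
    """Estimate the fraction of synthesized strands free of coupling errors.

    Assumption (illustrative only): every coupling step has the same,
    independent error probability, and a strand of `length_bases` bases
    needs `length_bases - 1` coupling steps.
    """
    if length_bases < 1:
        raise ValueError("length_bases must be at least 1")
    if not 0.0 <= step_error_rate <= 1.0:
        raise ValueError("step_error_rate must be between 0 and 1")
    couplings = length_bases - 1
    return (1.0 - step_error_rate) ** couplings


# Compare the quoted 1-2% error range across several oligonucleotide lengths.
for length in (20, 70, 200):
    for rate in (0.01, 0.02):
        fraction = error_free_fraction(length, rate)
        print(f"{length:>4} bases, {rate:.0%} step error: "
              f"{fraction:.1%} error-free")
```

Under these assumptions, roughly half of all 70-base strands made at a 1% step error rate already contain at least one error, which is consistent with the practical preference for short oligonucleotides.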
Linear synthesis of nucleic acids may be accomplished using biological molecules and protecting groups. The most common linear synthesis techniques are based on solid-phase phosphoramidite chemistry. The 3′-phosphate is affixed to a solid-phase support (typically controlled-pore glass beads, silicon substrates, or glass substrates), and an individual nucleotide of choice is added to a chain growing in the 3′-5′ direction by means of a 5′-protecting group (typically an acid-labile or photo-cleavable protecting group).
In linear syntheses based on phosphoramidite chemistry, there are many well-documented potential sources of sequence error and oligonucleotide damage. Most notably, the removal of the 5′-protecting group usually involves an acidic treatment that can remove the base or, in the case of a photo-labile 5′-protecting group, requires ultraviolet irradiation that can damage the nucleotide. The nucleotide may fail to incorporate into the growing strand because of insufficient reaction time. Nearly all organic and inorganic solvents and reagents employed in the process can chemically damage the growing nucleotide. Such sources of error ultimately limit the fidelity and length of the oligonucleotide and, furthermore, limit the fidelity and length of larger nucleic acids assembled from linearly synthesized strands. For methods and materials known in the art related to phosphoramidite nucleic acid synthesis see, e.g., Sierzchala, A. B., Dellinger, D. J., Betley, J. R., Wyrzykiewicz, Yamada, C. M., Caruthers, M. H., Solid-Phase Oligodeoxynucleotide Synthesis: A Two-Step Cycle Using Peroxy Anion Deprotection, J. AM. CHEM. SOC., 125, 13427-13441 (2003), which is hereby incorporated by reference.
Errors in gene synthesis are typically controlled in two ways: 1) the individual oligonucleotides can each be purified to remove error sequences; 2) the final cloned products are sequenced to determine whether errors are present. In this latter case, the errors are dealt with by sequencing many clones until an error-free sequence is found, by using mutagenesis to specifically fix an error, or by choosing and combining specific error-free sub-sequences to build an error-free full-length sequence.
Synthesizing a single gene has become commonplace enough that many companies exist to perform this task for a researcher. Single genes up to about 1000 base pairs (bp) are typically offered, and larger sequences are feasible, up to about 10,000 bp, for the construction of a single large gene, or a set of genes together. A recent benchmark was the production of the entire poliovirus genome, 7500 bp, capable of producing functional viral particles. These syntheses of long DNA products employ the methods described above, often aided by the large-scale production of oligonucleotides, such as with multiplexed 48-, 96- or 384-column synthesizers, and using sample-handling robots to speed manipulations. For methods and materials known in the art related to gene synthesis, see e.g., Au, L., Yang, W., Lo, S., Kao, C., Gene Synthesis by a LCR-Based Approach High-Level Production of Leptin-L45 Using Synthetic Gene in Escherichia Coli, BIOCHEM. & BIOPHYS. RESEARCH COMM., 248, 200-203 (1998); Baedeker, M., Schulz, G. E., Overexpression of a Designed 2.2 kb Gene of Eukaryotic Phenylalanine Ammonia-Lyase in Escherichia coli, FEBS LETTERS, 475, 57-60 (1999); Casimiro, D. R., Wright, P. E., Dyson, H. J., PCR-based Gene Synthesis and Protein NMR Spectroscopy, STRUCTURE, Vol. 5, No. 11, 1407-1412 (1997); Cello, J., Paul, A. V., Wimmer, E., Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template, SCIENCE, 297, 1016-1018 (2002); Kneidinger, B., Graninger, M., Messner, P., Scaling Up the Ligase Chain Reaction-Based Approach to Gene Synthesis, BIOTECHNIQUES, 30, 249-252 (2001); Dietrich, R., Wirsching, F., Opitz, T., Schwienhorst, A., Gene Assembly Based on Blunt-Ended Double-Stranded DNA-Molecules, BIOTECH. TECHNIQUES, Vol. 12, No. 1, 49-54 (1998); Hoover, D. M., Lubkowski, J., DNAWorks: An Automated Method for Designing Oligonucleotides for PCR-based Gene Synthesis, NUCLEIC ACIDS RESEARCH, Vol. 30, No. 10, 1-7 (2002); Stemmer, W. P. C., Crameri, A., Ha, K. D., Brennan, T. M., Heyneker, H. L., Single-Step Assembly of a Gene and Entire Plasmid from Large Numbers of Oligodeoxyribonucleotides, GENE, 164, 49-53 (1995); Withers-Martinez, C., Carpenter, E. P., Hackett, F., Ely, B., Sajid, M., Grainger, M., Blackman, M. J., PCR-Based Gene Synthesis as an Efficient Approach for Expression of the A+T-Rich Malaria Genome, PROTEIN ENG., Vol. 12, No. 12, 1113-1120 (1999); and Venter Cooks Up a Synthetic Genome in Record Time, SCIENCE, 302, 1307 (2003), all of which are hereby incorporated by reference. For patents and patent applications related to gene synthesis, see e.g., U.S. Pat. Nos. 6,521,453 and 6,521,427, and U.S. Pat. App. Pub. Nos. 20030165946, 20030138782, and 20030087238, all hereby incorporated by reference.
As the goals of genetic engineers become more complex and larger in scale, these methods become prohibitive in terms of the cost, time, and effort involved to produce longer sequences and correct the subsequent errors. For example, a fee may be $5 per bp for a 500 bp sequence, with a waiting time of 2-4 weeks, whereas even the most rapid portion of the poliovirus synthesis required several months and tens of thousands of dollars (the project overall required two years and over $100,000). A technology that makes this process both faster and more affordable would be a tremendous aid to researchers in need of very long DNA molecules.
Some examples of work which would benefit:
1) Vaccine trials (modest DNA length, but many variants): in producing proteins for use in vaccine trials, a large number of variant protein sequences are often examined. The number of options explored is typically limited by the number of variants that can be produced. The lengths of the DNA molecules encoding such proteins might be in the range of about 100 bp to about 2000 bp, or longer, depending on the protein. One of ordinary skill in the art will understand that the length of a DNA molecule may vary greatly depending on the protein product desired.
2) Gene therapy (intermediate DNA length): retroviral vectors used for gene therapy might range from about 20,000 to about 50,000 bp. The process of constructing these vectors also limits the number and complexity of variants which can be tested in the laboratory.
3) Bacterial engineering (greatest DNA length, genomic synthesis): currently, changes made to a bacterial organism are attempted one gene at a time, a painstaking process when several changes are desired. In the case of engineering a bacterium to perform a task, such as waste detoxification or protein production, a large number of intricate changes may be required. If the complete genome of the desired bacterium could be generated easily de novo, a great deal of time and effort could be saved, and new areas of research would be made possible. Bacterial genomes range from several hundred kilobases to many megabases. One of ordinary skill in the art will understand that the size of bacterial genomes varies greatly depending on the bacterium in question.
The fundamental challenges of the current technology:
1) Scaling: as the size of the desired sequence grows, the production time and costs involved grow linearly, or worse. An ideal method would involve smaller amounts of reagents, shorter cycle times for oligonucleotide synthesis, a greatly improved parallelization of the synthesis process used to provide the oligonucleotides, and/or an improved process for the assembly of oligonucleotides into larger molecules.
2) Errors: with the production of larger DNA sequences, expected per-base error rates will essentially guarantee that conventional methods will yield sequences containing errors. These errors will require more effective techniques than the current control procedures described above.
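The error challenge above can be illustrated with a rough calculation. The following sketch assumes, purely for illustration, that errors in an assembled product occur independently at a fixed per-base rate, and that clones are sequenced one at a time until an error-free clone is found. The rate of 1 error per 1000 bases is a hypothetical value chosen for the example, not a figure from the methods described above, and the function names are likewise hypothetical.

```python
import math


def error_free_probability(length_bp, per_base_error_rate):
    """Probability that a product of `length_bp` bases has no errors,
    assuming independent errors at a fixed per-base rate."""
    return (1.0 - per_base_error_rate) ** length_bp


def expected_clones_to_sequence(length_bp, per_base_error_rate):
    """Expected number of clones sequenced before an error-free one is
    found (mean of a geometric distribution with success probability p)."""
    p = error_free_probability(length_bp, per_base_error_rate)
    # For very long products p underflows to 0.0, so report infinity.
    return math.inf if p == 0.0 else 1.0 / p


ASSUMED_ERROR_RATE = 1e-3  # hypothetical: 1 error per 1000 bases

# Lengths spanning the single-gene, vector, and genome scales discussed above.
for length_bp in (500, 2_000, 10_000, 50_000, 1_000_000):
    p = error_free_probability(length_bp, ASSUMED_ERROR_RATE)
    clones = expected_clones_to_sequence(length_bp, ASSUMED_ERROR_RATE)
    print(f"{length_bp:>9} bp: P(error-free) = {p:.3g}, "
          f"expected clones = {clones:.3g}")
```

Under these assumptions the expected number of clones grows exponentially with length, so sequencing clones until an error-free one appears is practical for a single short gene but not for vector- or genome-scale products.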