Proteins act in biological systems in many ways: as catalysts, transport agents, hormones, cell surface receptors, electron carriers, antibodies and markers of individuality. In general, the study of protein function focuses on a protein that occurs naturally, though some minor chemical modifications of the native proteins are possible. The molecular basis of function could be studied in much greater detail if one could vary the protein structure at will in order to test predictions that result from a hypothetical model of its action. These insights would also provide a rational basis for designing specific proteins with novel and useful properties not previously available. For these reasons, the ability to generate a protein with any desired structure represents an important goal.
The amino acid sequence of a protein uniquely determines its three-dimensional structure and function. The amino acid sequence is, in turn, determined by the sequence of bases in the DNA of the structural gene that encodes the protein. In vitro mutagenesis techniques and methods for efficient oligodeoxyribonucleotide synthesis have been recently developed. These advances, along with other techniques of molecular biology, now allow the creation of a protein with any desired amino acid sequence. This process involves preparation of the appropriate gene, either by total synthesis or specific mutation of a naturally occurring gene, followed by expression of this gene in an appropriate microbiological host.
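The dependence of amino acid sequence on base sequence can be illustrated with a minimal Python sketch. It uses the standard genetic code; the function name `translate` and the example sequence are hypothetical and are offered for illustration only, not as part of the method described here.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_strand):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_strand) - 2, 3):
        aa = CODON_TABLE[coding_strand[i:i + 3].upper()]
        if aa == "*":  # stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 12-base example: Met-Ala-Trp followed by a stop codon.
print(translate("ATGGCTTGGTAA"))
```

Changing a single codon in the input changes the corresponding residue in the output, which is the sense in which a gene with a chosen sequence yields a protein with a chosen sequence.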
Various approaches to the synthesis of genes have been proposed. Genes consist of double-stranded DNA molecules, whose chemical structure is basically like that of a ladder. The two strands of DNA molecules adhere to one another because the units, called nucleotides, that make up the strands are mutually attracted to one another by their complementary chemical forms.
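The complementarity of the two strands can be sketched in a few lines of Python. The helper names `reverse_complement` and `strands_pair` are hypothetical; the sketch considers only Watson-Crick pairing of A with T and G with C and ignores everything else about duplex stability.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the strand that pairs with `strand`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def strands_pair(strand_a, strand_b):
    """True if two 5'-to-3' strands are fully complementary to each other."""
    return reverse_complement(strand_a) == strand_b.upper()


print(reverse_complement("ATGC"))
print(strands_pair("ATGC", "GCAT"))
```

Both strands are written 5' to 3', so the partner of a strand is its complement read in reverse.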
Over a period of some 20 years, a series of interactions between genetics, biochemistry and microbiology has led to the development of a new technology. This technology has made possible the transfer of a gene, or a small cluster of genes, on a segment of DNA from almost any organism to one of the standard and easily grown laboratory organisms, the most conspicuous being the bacterium Escherichia coli. A host of supplementary techniques permits the regulation of the expression of the transferred genes so that the proteins they specify may be synthesized very efficiently in the bacterium. This allows the protein to be produced cheaply and abundantly. In order to produce desired proteins using this recombinant DNA technology, one must either isolate or synthesize the gene that encodes that particular protein.
The advantage of total synthesis is the opportunity to engineer desired features into the DNA, such as restriction sites, regulatory signals for transcription or translation, and usage of the most abundant tRNA codons for a given organism. The first gene synthesis was carried out by Khorana and coworkers in the 1960s with the yeast alanine tRNA gene (Khorana, H. et al. (1971) Studies on polynucleotides: total synthesis of the structural gene for an alanine transfer ribonucleic acid from yeast. J. Mol. Biol. 72, 209-217 and accompanying papers). The key concept in Khorana's work is the inherent ability of DNA to base pair. A major factor in his success was also the discovery of DNA joining enzymes. Khorana formulated the following three-step approach (Khorana, H. G. (1979) Total synthesis of a gene. Science 203, 614-625): (a) chemical synthesis of short oligodeoxynucleotides, (b) enzymatic phosphorylation of 5'-OH end groups to monitor joining, and (c) ligase-catalyzed joining of hydrogen-bonded duplexes.
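When restriction sites are engineered into a synthetic gene, it is useful to know where a given recognition sequence occurs in a candidate design. The following Python sketch scans a sequence for three well-known recognition sequences; the function name `find_sites` and the example sequence are hypothetical, and the choice of enzymes is illustrative rather than taken from the work cited above.

```python
# Recognition sequences of three commonly used restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Map each enzyme name to the 0-based positions of its recognition sequence."""
    sequence = sequence.upper()
    positions = {}
    for enzyme, site in sites.items():
        hits = []
        start = sequence.find(site)
        while start != -1:
            hits.append(start)
            # Advance by one base so overlapping occurrences are not missed.
            start = sequence.find(site, start + 1)
        positions[enzyme] = hits
    return positions


# Hypothetical design carrying one EcoRI site and one HindIII site.
print(find_sites("ATGGAATTCAAGCTTTAA"))
```

A design intended to be cut once by a given enzyme would be expected to show exactly one position for that enzyme and none elsewhere in the gene.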
Several genes have since been synthesized using this approach. In 1977, Riggs, Itakura and coworkers synthesized the gene for somatostatin (Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F. and Boyer, H. W. (1977) Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science 198, 1056-1063) and fused this to the gene for β-galactosidase in the plasmid pBR322 (Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, H. L. and Boyer, H. W. (1977) Construction and characterization of new cloning vehicles. II. A multipurpose cloning system. Gene 2, 93-113; Sutcliffe, J. G. (1978) Nucleotide sequence of the ampicillin resistance gene of Escherichia coli plasmid pBR322. Proc. Natl. Acad. Sci. USA 75, 3737-3741; Maniatis, T., Fritsch, E. F. and Sambrook, J. (1982) Molecular cloning: A laboratory manual. Cold Spring Harbor Laboratory, New York). This represented the first recovery of a functional polypeptide product from chemically synthesized DNA. The synthesis required for this project consisted of eight oligodeoxynucleotides with five-base complementary overlaps for efficient oligodeoxynucleotide joining.
Two of the longest genes to be synthesized are human leukocyte interferon α-1 (Edge, M. D., Greene, A. R., Gillian, H. R., Meacock, P. A., Schuch, W., Scanlon, D. B., Atkinson, T. C., Newton, C. R. and Markham, A. F. (1981) Total synthesis of a human leukocyte interferon gene. Nature 292, 756-762), which is 514 base pairs (166 amino acids), and bovine rhodopsin (Ferretti, L., Karnik, S. S., Khorana, H. G., Nassal, M. and Oprian, D. D. (1986) Total synthesis of a gene for bovine rhodopsin. Proc. Natl. Acad. Sci. USA 83, 599-603), which is 1057 base pairs (348 amino acids). These are sections of DNA which include initiation and termination codons and restriction enzyme sites for insertion into a plasmid. The synthesis of human leukocyte interferon α-1 required 67 oligonucleotides with an average length of 15 nucleotide residues. The synthesis of bovine rhodopsin required 72 synthetic oligonucleotides with average lengths of 15-40 nucleotide residues.
An alternate approach was developed by Itakura, Rossi and coworkers for the synthesis of a 132 base pair segment coding for amino acids 126-stop of human leukocyte interferon α-2 (Rossi, J. J., Kierzek, R., Huang, T., Walker, P. A. and Itakura, K. (1982) An alternate method for synthesis of double-stranded DNA segments. J. Biol. Chem. 257, 9226-9229). This method involves synthesis of oligonucleotides which are annealed to form partial duplex structures. These structures are then used as a substrate for DNA polymerase I (Klenow) (McHenry, C. and Kornberg, A. (1977) DNA polymerase III holoenzyme of Escherichia coli: purification and resolution into subunits. J. Biol. Chem. 252, 6478-6484) and the four deoxynucleoside triphosphates. The resulting segments are then digested with appropriate restriction endonucleases for insertion into the plasmid, and the final step is blunt end ligation to close the plasmid.
This approach reduces the number of synthetic oligonucleotides required; however, with the introduction of automated DNA synthesis this is no longer a major concern. Two of the most recent examples of synthetic genes are those which code for the human complement fragment C5a and calmodulin (Roberts, D. M., Crea, R., Malecha, M., Alvarado-Urbina, G., Chiarello, R. H. and Watterson, D. M. (1985) Chemical synthesis and expression of a calmodulin gene designed for site-specific mutagenesis. Biochemistry 24, 5090-5098).
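The partial-duplex fill-in approach of Rossi and coworkers, described above, can be sketched as a string operation. Two oligonucleotides whose 3' ends are complementary anneal over a short overlap, and a polymerase extends each 3' end to give a full duplex. The Python below is a simplified illustration only: the function names, the example sequences and the fixed-length overlap check are assumptions, and no enzyme behavior is modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def fill_in(top_oligo, bottom_oligo, overlap):
    """Return the top strand of the duplex formed after polymerase fill-in.

    Both oligos are written 5' to 3'. Their last `overlap` bases must be
    complementary so that the 3' ends can anneal.
    """
    if top_oligo[-overlap:] != reverse_complement(bottom_oligo[-overlap:]):
        raise ValueError("3' ends are not complementary over the overlap")
    # The bottom oligo, read in top-strand orientation, begins with the
    # shared overlap; the remainder is what the polymerase adds to the top.
    return top_oligo + reverse_complement(bottom_oligo)[overlap:]


# Hypothetical 21-base target built from a 12-mer and a 15-mer (6-base overlap).
top = "ATGGCTAGCAAG"
bottom = "TTATGGAACCTTGCT"
print(fill_in(top, bottom, overlap=6))
```

Here 27 synthesized bases yield a 21 base pair duplex, whereas synthesizing both strands in full would take 42, which illustrates why this approach reduces the amount of chemical synthesis.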
Chemists attempting to build long DNA molecules of hundreds of units have, until now, synthesized short stretches of single-stranded DNA that correspond to pieces of the "ladder rails." Each of these rails was designed so that it complemented an opposing segment, but with an extra piece extending beyond that segment. This extended piece complemented another extended piece of another pair of rails. When all the segments were mixed together, they tended to form into double-stranded DNA with the desired sequence. Once joined, the rails were stitched together using DNA-joining enzymes.
Using this puzzle-piece approach, genetic engineers have been able to join up to 14 such segments at once, before the level of misjoining became too high.
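The puzzle-piece approach, and the reason misjoining grows with the number of segments, can be illustrated with a simplified Python sketch. Each segment is represented only by its top-strand sequence, and adjacent segments are assumed to share a short extended piece of fixed length. The function name `assemble` and the example segments are hypothetical.

```python
def assemble(segments, overlap):
    """Chain segments whose ends share `overlap` bases.

    The first segment is taken as the start of the gene. A ValueError is
    raised when the next join is missing or ambiguous, which stands in for
    the misjoining seen when extended pieces are not unique.
    """
    chain = segments[0]
    remaining = list(segments[1:])
    while remaining:
        tail = chain[-overlap:]
        matches = [seg for seg in remaining if seg[:overlap] == tail]
        if len(matches) != 1:
            raise ValueError(
                f"{len(matches)} segments can join after {tail!r}"
            )
        chain += matches[0][overlap:]
        remaining.remove(matches[0])
    return chain


# Hypothetical segments, supplied out of order, with 4-base extended pieces.
print(assemble(["ATGGCTAG", "GTTCCATAA", "CTAGGTTC"], overlap=4))
```

With a fixed extension length there are only a limited number of distinct extended pieces, so the chance that two segments compete for the same join rises as more segments are mixed at once.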
The present invention represents an enormously flexible, infinitely expandable, completely controllable approach to the design of new genes; it allows even beginners to easily build large DNA segments.
A new method to facilitate construction of long strands of DNA has been developed.
According to our invention, there has been developed a way of reliably building large genes chunk by chunk from the outside in. In this method, a small stretch of the desired gene, comprising the beginning and end pieces, is first joined to a large circular piece of DNA called a vector or plasmid. Between these beginning and end pieces are "restriction sites" where the pieces can be cut apart using enzymes. Vectors are specially built pieces of DNA, widely used in genetic engineering, that can carry attached DNA into a living cell, where the cell can be induced to make many copies of the DNA. After using bacteria to make copies of the plasmid vector carrying the gene segment, the scientists extract the copies and chemically snip the inserted DNA, separating the beginning and end pieces. Between these pieces, they insert another segment representing the next two inward pieces of the desired gene, again with restriction sites between them. Once more the vector is inserted into bacteria to copy the resulting longer stretch of DNA. The process of cutting, inserting, and copying continues until the desired gene has been produced. There is no limit to the size of predetermined gene structure that this synthetic strategy will allow. Accordingly, it is to be anticipated that this invention will find important utilization by those skilled in this art.
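The outside-in strategy described above can be sketched as repeated insertion at a single internal site. The Python below is a simplified illustration under stated assumptions: one placeholder recognition sequence (that of EcoRI) marks the cut point in every round, the site is replaced entirely by the incoming piece, and copying in bacteria is not modeled. The function names and example pieces are hypothetical, and the sketch does not describe the particular enzymes or vectors of the invention.

```python
SITE = "GAATTC"  # placeholder restriction site (EcoRI recognition sequence)


def insert_at_site(construct, piece):
    """Cut the construct at its single internal site and insert `piece` there."""
    if construct.count(SITE) != 1:
        raise ValueError("construct must contain exactly one internal site")
    return construct.replace(SITE, piece)


def build_outside_in(rounds, core):
    """Build a gene from its ends inward.

    `rounds` lists (left, right) piece pairs ordered from the outermost
    pair to the innermost; `core` is the final middle piece.
    """
    outer_left, outer_right = rounds[0]
    construct = outer_left + SITE + outer_right
    for left, right in rounds[1:]:
        # Each incoming segment carries its own internal site for the next round.
        construct = insert_at_site(construct, left + SITE + right)
    return insert_at_site(construct, core)


# Hypothetical pieces: two rounds of paired ends, then a central piece.
gene = build_outside_in(
    rounds=[("ATGGCT", "CATTAA"), ("AGCAAA", "GGTTCC")],
    core="CTGCAG",
)
print(gene)
```

Because each round adds only one segment at one defined position, the number of pieces that must join correctly at any one time stays small regardless of the final length of the gene.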