Genetic information is encoded on double-stranded deoxyribonucleic acid ("DNA" or "genes") according to the order in which the DNA coding strand presents the characteristic bases of its repeating nucleotide components. "Expression" of the encoded information to form polypeptides involves a two-part process. According to the dictates of certain control regions ("regulons") in the gene, RNA polymerase may be caused to move along the coding strand, forming messenger RNA (ribonucleic acid, "mRNA") in a process called "transcription." In a subsequent "translation" step the cell's ribosomes, in conjunction with transfer RNA, convert the mRNA "message" into polypeptides. Included in the information transcribed from DNA into mRNA are signals for the start and termination of ribosomal translation, as well as the identity and sequence of the amino acids which make up the polypeptide. The DNA coding strand comprises long sequences of nucleotide triplets called "codons" because the characteristic bases of the nucleotides in each triplet or codon encode specific bits of information. For example, the three nucleotides read as ATG (adenine-thymine-guanine) result in an mRNA signal interpreted as "start translation," while the termination codons TAG, TAA and TGA are interpreted as "stop translation." Between the start and stop codons lies the so-called structural gene, whose codons define the amino acid sequence ultimately translated. That definition proceeds according to the well-established "genetic code" (e.g., J. D. Watson, Molecular Biology of the Gene, W. A. Benjamin Inc., N. Y., 3rd ed. 1976) which describes the codons for the various amino acids. The genetic code is degenerate in the sense that different codons may yield the same amino acid, but precise in that for each amino acid there are one or more codons for it and no other. Thus, for example, all of the codons TCT, TCC, TCA and TCG, when read as such, code for serine and no other amino acid.
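By way of illustration only, the codon-to-amino-acid mapping and the role of the start and stop codons may be sketched as follows. The sketch assumes the standard genetic code written in coding-strand (DNA) letters; the function names and the test sequence are hypothetical and chosen for this example, and the scan for the first ATG ignores the other signals a real ribosome requires.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids in TCAG codon order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_for(amino_acid):
    """Return every codon that codes for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def translate_structural_gene(dna):
    """Translate from the first ATG up to the first in-frame stop codon.

    Assumes an upper-case string over A, C, G and T (hypothetical helper).
    """
    start = dna.find("ATG")
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # TAA, TAG or TGA ends translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)

# Degeneracy: several codons map to serine, but each codon maps to one residue.
print(codons_for("S"))
# Made-up sequence: ATG GCT GGT TGT AAG TAG flanked by two bases on each side.
print(translate_structural_gene("CCATGGCTGGTTGTAAGTAGCC"))
```

Inverting the table in `codons_for` makes the "degenerate but precise" property visible: the lookup from codon to amino acid is a function, while the reverse lookup returns a set.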
During translation the proper reading phase or reading frame must be maintained. Consider for example what happens when the ribosome reads different bases as the beginning of a codon in the sequence . . . GCTGGTTGTAAG . . . (codons are set off by spaces):
. . . GCT GGT TGT AAG . . . → . . . Ala-Gly-Cys-Lys . . .
. . . G CTG GTT GTA AG . . . → . . . Leu-Val-Val . . .
. . . GC TGG TTG TAA G . . . → . . . Trp-Leu-(STOP).
The polypeptide ultimately produced, then, depends vitally upon the spatial relationship of the structural gene with respect to the regulon.
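The effect of the reading frame on the sequence GCTGGTTGTAAG can be reproduced with a short sketch. It assumes the standard genetic code in coding-strand letters; `read_frame` and the lookup tables are hypothetical names introduced here, not part of the disclosure.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "(STOP)",
}

def read_frame(dna, offset):
    """Read complete codons starting at `offset`, stopping after a stop codon."""
    residues = []
    for i in range(offset, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        residues.append(THREE_LETTER[amino_acid])
        if amino_acid == "*":
            break
    return "-".join(residues)

sequence = "GCTGGTTGTAAG"
for offset in range(3):
    # Leading bases that fall outside the frame are simply skipped.
    print(offset, read_frame(sequence, offset))
```

Shifting the starting base by one or two positions changes every downstream codon, which is why the position of the structural gene relative to the start signal determines the product.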
A clearer understanding of the process of genetic expression will emerge once certain components of genes are defined:
Operon--A gene comprising structural gene(s) for polypeptide expression and the control region ("regulon") which regulates that expression.
Promoter--A gene within the regulon to which RNA polymerase must bind for initiation of transcription.
Operator--A gene to which repressor protein may bind, thus preventing RNA polymerase binding on the adjacent promoter.
Inducer--A substance which deactivates repressor protein, freeing the operator and permitting RNA polymerase to bind to promoter and commence transcription.
Catabolite Activator Protein ("CAP") Binding Site--A gene which binds cyclic adenosine monophosphate ("cAMP")-mediated CAP, also commonly required for initiation of transcription. The CAP binding site may in particular cases be unnecessary. For example, a promoter mutation in the lactose operon of the phage plac UV5 eliminates the requirement for cAMP and CAP expression. J. Beckwith et al, J. Mol. Biol. 69, 155-160 (1972).
Promoter-Operator System--As used herein, an operable control region of an operon, with or without respect to its inclusion of a CAP binding site or capacity to code for repressor protein expression.
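The interplay of the components just defined can be summarized, in deliberately simplified form, as a Boolean rule. This is a sketch only: actual regulation is quantitative rather than all-or-nothing, and the function and parameter names are hypothetical.

```python
def transcription_can_start(repressor_present, inducer_present,
                            cap_camp_bound, promoter_needs_cap=True):
    """Simplified yes/no model of initiation at a promoter-operator system.

    - The operator is free when no repressor is present, or when an inducer
      has deactivated the repressor.
    - CAP-cAMP must be bound unless the promoter does not require it
      (as in the UV5 promoter mutation mentioned above).
    """
    operator_free = (not repressor_present) or inducer_present
    cap_satisfied = cap_camp_bound or not promoter_needs_cap
    return operator_free and cap_satisfied

# Repressor bound, no inducer: RNA polymerase is blocked.
print(transcription_can_start(True, False, True))
# Inducer deactivates the repressor and CAP-cAMP is bound: initiation possible.
print(transcription_can_start(True, True, True))
# CAP-independent promoter: initiation possible without CAP-cAMP.
print(transcription_can_start(False, False, False, promoter_needs_cap=False))
```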
Further by way of definition, and for use in the discussion of recombinant DNA which follows, we define the following:
Cloning Vehicle--Non-chromosomal double stranded DNA comprising an intact "replicon" such that the vehicle is replicated when placed within a unicellular organism ("microbe") by a process of "transformation." An organism so transformed is called a "transformant."
Plasmid--For present purposes, a cloning vehicle derived from viruses or bacteria, the latter being "bacterial plasmids."
Complementarity--A property conferred by the base sequences of single strand DNA which permits the formation of double stranded DNA through hydrogen bonding between complementary bases on the respective strands. Adenine (A) complements thymine (T), while guanine (G) complements cytosine (C).
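Complementarity lends itself to a brief computational sketch. The example below assumes, as is conventional though not stated above, that each strand is written 5' to 3' and that paired strands run antiparallel, so that a partner strand is the reverse complement of the first. The function names are hypothetical.

```python
# A pairs with T, and G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the strand that pairs with `strand`, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]

def can_anneal(strand_a, strand_b):
    """True when two 5'-to-3' strands are complementary over their full length."""
    return strand_b == reverse_complement(strand_a)

coding = "GCTGGTTGTAAG"
partner = reverse_complement(coding)
print(partner)
print(can_anneal(coding, partner))
```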
Advances in biochemistry in recent years have led to the construction of "recombinant" cloning vehicles in which, for example, plasmids are made to contain exogenous DNA. In particular instances the recombinant may include "heterologous" DNA, by which is meant DNA that codes for polypeptides ordinarily not produced by the organism susceptible to transformation by the recombinant vehicle. Thus, plasmids are cleaved to provide linear DNA having ligatable termini. These are bound to an exogenous gene having ligatable termini to provide a biologically functional moiety with an intact replicon and a desired phenotypical property. The recombinant moiety is inserted into a microorganism by transformation and transformants are isolated and cloned, with the object of obtaining large populations capable of expressing the new genetic information. Methods and means of forming recombinant cloning vehicles and transforming organisms with them have been widely reported in the literature. See, e.g., H. L. Heyneker et al, Nature 263, 748-752 (1976); Cohen et al, Proc. Nat. Acad. Sci. U.S.A. 69, 2110 (1972); ibid., 70, 1293 (1973); ibid., 70, 3240 (1973); ibid., 71, 1030 (1974); Morrow et al, Proc. Nat. Acad. Sci. U.S.A. 71, 1743 (1974); Novick, Bacteriological Rev., 33, 210 (1969); Hershfield et al, Proc. Nat. Acad. Sci. U.S.A. 71, 3455 (1974) and Jackson et al, ibid. 69, 2904 (1972). A generalized discussion of the subject appears in S. Cohen, Scientific American 233, 24 (1975). These and other publications alluded to herein are incorporated by reference.
A variety of techniques are available for DNA recombination, according to which adjoining ends of separate DNA fragments are tailored in one way or another to facilitate ligation. The latter term refers to the formation of phosphodiester bonds between adjoining nucleotides, most often through the agency of the enzyme T4 DNA ligase. Thus, blunt ends may be directly ligated. Alternatively, fragments containing complementary single strands at their adjoining ends are advantaged by hydrogen bonding which positions the respective ends for subsequent ligation. Such single strands, referred to as cohesive termini, may be formed by the addition of nucleotides to blunt ends using terminal transferase, and sometimes simply by chewing back one strand of a blunt end with an enzyme such as λ-exonuclease. Again, and most commonly, resort may be had to restriction endonucleases, which cleave phosphodiester bonds in and around unique sequences of nucleotides of about 4-6 base pairs in length. Many restriction endonucleases and their recognition sites are known, the so-called Eco RI endonuclease being most widely employed. Restriction endonucleases which cleave double-stranded DNA at rotationally symmetric "palindromes" leave cohesive termini. Thus, a plasmid or other cloning vehicle may be cleaved, leaving termini each comprising half the restriction endonuclease recognition site. A cleavage product of exogenous DNA obtained with the same restriction endonuclease will have ends complementary to those of the plasmid termini. Alternatively, as disclosed infra, synthetic DNA comprising cohesive termini may be provided for insertion into the cleaved vehicle. To discourage rejoining of the vehicle's cohesive termini pending insertion of exogenous DNA, the termini can be digested with alkaline phosphatase, providing molecular selection for closures incorporating the exogenous fragment.
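Cleavage at a rotationally symmetric recognition site can be sketched in a few lines. The recognition sequence GAATTC and the cut between G and A for Eco RI are supplied here from general knowledge rather than from the text above; the functions, their parameters and the test sequence are hypothetical, and only the top strand of a linear molecule is modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """A site is rotationally symmetric when it equals its reverse complement."""
    return site == reverse_complement(site)

def digest(top_strand, site="GAATTC", cut_offset=1):
    """Cut the top strand `cut_offset` bases into every occurrence of `site`."""
    fragments, start = [], 0
    pos = top_strand.find(site)
    while pos != -1:
        fragments.append(top_strand[start:pos + cut_offset])
        start = pos + cut_offset
        pos = top_strand.find(site, pos + 1)
    fragments.append(top_strand[start:])
    return fragments

def overhang(site="GAATTC", cut_offset=1):
    """Single-stranded 5' extension left when both strands are cut symmetrically.

    Valid for staggered cuts in the first half of a palindromic site.
    """
    return site[cut_offset:len(site) - cut_offset]

print(is_palindromic("GAATTC"))
print(digest("CCGAATTCGGTTGAATTCAA"))  # made-up sequence with two sites
print(overhang())
```

Because the site is palindromic, every end produced by the same enzyme carries the same self-complementary extension, which is what allows a cleaved vehicle and a cleaved exogenous fragment to anneal before ligation.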
Incorporation of a fragment having the proper orientation relative to other aspects of the vehicle may be enhanced when the fragment supplants vehicle DNA excised by two different restriction endonucleases, and itself comprises termini respectively constituting half the recognition sequence of the different endonucleases.
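Why two different endonucleases favor a single orientation can be shown with a toy model in which each end of a fragment is represented only by its single-stranded extension. The AATT extension of Eco RI and the GATC extension of Bam HI are supplied from general knowledge, Bam HI being chosen purely for illustration; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def ends_compatible(overhang_a, overhang_b):
    """Two cohesive termini can anneal when their extensions are complementary."""
    return overhang_a == reverse_complement(overhang_b)

def allowed_orientations(vehicle, insert):
    """List the orientations in which `insert` can close the gap in `vehicle`.

    Each argument is a (left_overhang, right_overhang) pair.
    """
    (v_left, v_right), (i_left, i_right) = vehicle, insert
    allowed = []
    if ends_compatible(v_left, i_left) and ends_compatible(v_right, i_right):
        allowed.append("forward")
    if ends_compatible(v_left, i_right) and ends_compatible(v_right, i_left):
        allowed.append("reverse")
    return allowed

ECO_RI, BAM_HI = "AATT", "GATC"  # assumed 5' extensions, for illustration

# One enzyme: the fragment can enter in either orientation.
print(allowed_orientations((ECO_RI, ECO_RI), (ECO_RI, ECO_RI)))
# Two enzymes: only one orientation matches at both junctions.
print(allowed_orientations((ECO_RI, BAM_HI), (ECO_RI, BAM_HI)))
```

In the same model the two vehicle termini left by different enzymes are not compatible with each other, so the vehicle cannot simply reclose without the fragment.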
Despite wide-ranging work in recent years in recombinant DNA research, few results susceptible to immediate and practical application have emerged. This has proven especially so in the case of failed attempts to express polypeptides and the like coded for by "synthetic DNA", whether constructed nucleotide by nucleotide in the conventional fashion or obtained by reverse transcription from isolated mRNA (complementary or "cDNA"). In this application we describe what appears to represent the first expression of a functional polypeptide product from a synthetic gene, together with related developments which promise widespread application. The product referred to is somatostatin (Guillemin U.S. Pat. No. 3,904,594), an inhibitor of the secretion of growth hormone, insulin and glucagon whose effects suggest its application in the treatment of acromegaly, acute pancreatitis and insulin-dependent diabetes. See R. Guillemin et al, Annual Rev. Med. 27, 379 (1976). The somatostatin model clearly demonstrates the applicability of the new developments described here on numerous and beneficial fronts, as will appear from the accompanying drawings and more clearly from the detailed description which follows.