The invention herein provides means for the production of peptide hormones such as adrenocorticotropin (ACTH), endorphin, .alpha.- and .beta.-melanocyte-stimulating hormone (.alpha.-MSH, .beta.-MSH), .beta.-lipotropin (.beta.-LPH) and corticotropin-like intermediate lobe peptide (CLIP). These peptides have in common the fact that their synthesis in the body is coded by a single gene. Isolation of this gene, or portions thereof corresponding to one or more of the peptides for which it codes, enables the production of the desired peptides by in vitro or microbiological systems. The invention is exemplified by the cloning of a deoxynucleotide sequence coding for endorphin.
Research results from several laboratories have established that the mammalian brain contains specific receptors which are the binding sites of opiate drugs. Recently, it has been shown that the normal brain contains certain peptides which specifically bind to the opiate receptors. These peptides are sometimes termed "endogenous opiates", in recognition of their role in normal brain physiology and of the similarity of their biological activity to that of such opium alkaloids as morphine. The name "endorphin" has been given to this class of peptides.
Various endorphins have been isolated and characterized. The largest is .beta.-endorphin, having thirty-one amino acids in the following sequence: Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-Leu-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-His-Lys-Lys-Gly-Gln. (All peptide sequences herein begin with the N-terminal amino acid on the left and continue to the C-terminal amino acid on the right.) The .alpha.-, .gamma.- and .delta.-endorphins are shorter subsequences of .beta.-endorphin, having, respectively, the first sixteen, seventeen and twenty-seven amino acids, beginning at the amino-terminus of .beta.-endorphin. All peptides in this series have in common the sequence: Tyr-Gly-Gly-Phe-Met, termed Met.sup.5 -enkephalin. Met.sup.5 -enkephalin has been separately isolated and shown to have morphine-like activity, which is naloxone-reversible. Met.sup.5 -enkephalin is the shortest sequence known to have opioid activity. Removal of the carboxy-terminal methionine results in complete loss of activity. A variant, Leu.sup.5 -enkephalin, is also active. The enkephalin moiety is considered to be the primary functional grouping conferring opioid activity on the endorphin molecule, while the additional C-terminal amino acids primarily affect the rate of transport and duration of action of the peptide. For a general review, see Guillemin, R., Science 202, 390 (1978).
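The relationship among the endorphins described above can be sketched in a few lines of Python. This is only an illustration of the stated lengths; the names `BETA_ENDORPHIN`, `PREFIX_LENGTHS` and `endorphin` are hypothetical and are introduced here for convenience.

```python
# Beta-endorphin as given in the text (31 residues, N-terminus first).
BETA_ENDORPHIN = (
    "Tyr-Gly-Gly-Phe-Met-Thr-Ser-Glu-Lys-Ser-Gln-Thr-Pro-Leu-Val-Thr-"
    "Leu-Phe-Lys-Asn-Ala-Ile-Ile-Lys-Asn-Ala-His-Lys-Lys-Gly-Gln"
).split("-")

# Number of N-terminal residues of beta-endorphin in each peptide,
# taken from the lengths stated in the text.
PREFIX_LENGTHS = {
    "met-enkephalin": 5,
    "alpha": 16,
    "gamma": 17,
    "delta": 27,
    "beta": 31,
}


def endorphin(name):
    """Return the named peptide as a list of residues (an N-terminal prefix)."""
    return BETA_ENDORPHIN[:PREFIX_LENGTHS[name]]


if __name__ == "__main__":
    for name in PREFIX_LENGTHS:
        print(name, "-".join(endorphin(name)))
```

Every peptide in the series is an N-terminal prefix of .beta.-endorphin, so each one begins with the enkephalin sequence.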
The .beta.-endorphin amino acid sequence is included within a larger peptide, .beta.-lipotropin (.beta.-LPH), which lacks opioid activity, and has been previously isolated and characterized (Li, C.H. and Chung, D., Proc.Nat.Acad. Sci USA 73, 1145 (1976)). ACTH is well known as a hormone which regulates the activity of the adrenal cortex. The subsequence of ACTH termed CLIP, comprising amino acids 18-39 of ACTH, has been shown experimentally to affect memory retention. The melanocyte-stimulating hormones stimulate pigment formation in the skin. Studies on stressed animals have revealed that .beta.-endorphin and adrenocorticotropin (ACTH) concentrations in blood plasma increase at comparable rates following application of stress. More recently it was shown that both ACTH and .beta.-LPH (containing the .beta.-endorphin sequence), as well as .alpha.- and .beta.-melanocyte-stimulating hormone (MSH) and a sequence of unknown function, are initially synthesized as a single precursor protein having a molecular weight of approximately 28,500. (Roberts, J. L. and Herbert, E., Proc.Nat.Acad.Sci USA 74, 4826 (1977) (hereinafter cited as Roberts, J. L., et al., no. 1); Roberts, J. L. and Herbert, E., Proc.Nat.Acad.Sci USA 74, 5300 (1977) (hereinafter cited as Roberts, J. L., et al., no. 2); Mains, et al., Proc.Nat.Acad.Sci USA 74, 3014 (1977)). The relative positions of the sequences of these peptides in the precursor peptide, hereinafter termed the ACTH/LPH precursor, are shown in FIG. 1.
The ACTH/LPH precursor is a central factor in normal physiological homeostasis. Normal maintenance functions are regulated by the peptide hormones comprising its amino acid sequence, and these hormones contribute to the normal sense of well-being of a healthy individual. The emerging picture also casts the ACTH/LPH precursor protein in the role of a "stress package" comprising segments capable of regulating behavioral, emotional and physiological responses to stress, selectively or in combination, depending upon the specific manner in which the precursor is ultimately cleaved. The ability to make adequate quantities of the entire precursor or its individual components is highly desirable for the therapy of stress-related diseases, for treatment of pain and for the management of psychosomatic illness.
It is clear from previous work that ACTH and endorphins exist only in very small amounts. Although they can be isolated from slaughterhouse material, the amounts are so minute that the purified material would be prohibitively expensive for therapeutic use. Similarly, their length renders chemical synthesis by conventional methods excessively costly. On the other hand, the use of recombinant DNA technology will enable the practical production of ACTH and endorphins in sufficient quantity and at acceptable cost.
Developments in recombinant DNA technology have made it possible to isolate specific genes or portions thereof from higher organisms, such as man and other mammals, and to transfer the genes or fragments to a microorganism species, such as bacteria or yeast. The transferred gene is replicated and propagated as the transformed microorganism replicates. As a result, the transformed microorganism may become endowed with the capacity to make whatever protein the gene or fragment encodes, whether it be an enzyme, a hormone, an antigen or an antibody, or a portion thereof. The microorganism passes on this capability to its progeny, so that in effect, the transfer has resulted in a new strain, having the described capability. See, for example, Ullrich, A. et al., Science 196, 1313 (1977), and Seeburg, P. H., et al., Nature 270, 486 (1977). A basic fact underlying the application of this technology for practical purposes is that DNA of all living organisms, from microbes to man, is chemically similar, being composed of the same four nucleotides. The significant differences lie in the sequences of these nucleotides in the polymeric DNA molecule. The nucleotide sequences are used to specify the amino acid sequences of proteins that comprise the organism. Although most of the proteins of different organisms differ from each other, the coding relationship between nucleotide sequence and amino acid sequence is fundamentally the same for all organisms. For example, the same nucleotide sequence which codes for the amino acid sequence of ACTH in human pituitary cells, will, when transferred to a microorganism, be recognized as coding for the same amino acid sequence.
Abbreviations used herein are given in Table 1.
TABLE 1
______________________________________
DNA  - deoxyribonucleic acid          A   - Adenine
RNA  - ribonucleic acid               T   - Thymine
cDNA - complementary DNA              G   - Guanine
       (enzymatically synthesized     C   - Cytosine
       from an mRNA sequence)         U   - Uracil
mRNA - messenger RNA                  ATP - adenosine triphosphate
dATP - deoxyadenosine triphosphate    TTP - thymidine triphosphate
dGTP - deoxyguanosine triphosphate
dCTP - deoxycytidine triphosphate
______________________________________
The coding relationships between nucleotide sequence in DNA and amino acid sequence in protein are collectively known as the genetic code, shown in Table 2.
TABLE 2
______________________________________
Genetic Code

Phenylalanine(Phe)   TTK      Histidine(His)       CAK
Leucine(Leu)         XTY      Glutamine(Gln)       CAJ
Isoleucine(Ile)      ATM      Asparagine(Asn)      AAK
Methionine(Met)      ATG      Lysine(Lys)          AAJ
Valine(Val)          GTL      Aspartic acid(Asp)   GAK
Serine(Ser)          QRS      Glutamic acid(Glu)   GAJ
Proline(Pro)         CCL      Cysteine(Cys)        TGK
Threonine(Thr)       ACL      Tryptophan(Try)      TGG
Alanine(Ala)         GCL      Arginine(Arg)        WGZ
Tyrosine(Tyr)        TAK      Glycine(Gly)         GGL
Termination signal   TAJ
Termination signal   TGA
______________________________________
Key: Each 3-letter deoxynucleotide triplet corresponds to a trinucleotide of mRNA, having a 5' end on the left and a 3' end on the right, with uracil (U) in the mRNA wherever thymine (T) appears in the triplet. The letters stand for the purine or pyrimidine bases forming the nucleotide sequence.
______________________________________
A = adenine       J = A or G
G = guanine       K = T or C
C = cytosine      L = A, T, C or G
T = thymine       M = A, C or T

X = T or C if Y is A or G
X = C if Y is C or T
Y = A, G, C or T if X is C
Y = A or G if X is T

W = C or A if Z is A or G
W = C if Z is C or T
Z = A, G, C or T if W is C
Z = A or G if W is A

QR = TC if S is A, G, C or T
QR = AG if S is T or C
S = A, G, C or T if QR is TC
S = T or C if QR is AG
______________________________________
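The compact notation of Table 2 can be checked by expanding each entry into the explicit triplets it stands for. The following Python sketch does so under the rules given in the key. The names `SIMPLE`, `SPECIAL`, `TABLE_2`, `expand` and `codon_table` are hypothetical, and the three context-dependent entries (XTY, WGZ, QRS) are written out by hand from the conditional rules rather than computed.

```python
from itertools import product

# Single-letter symbols from the key: each maps to the bases it may stand for.
SIMPLE = {"A": "A", "G": "G", "C": "C", "T": "T",
          "J": "AG", "K": "TC", "L": "ATCG", "M": "ACT"}

# Context-dependent entries, written out from the conditional rules in the key.
SPECIAL = {
    "XTY": ["TTA", "TTG", "CTA", "CTG", "CTC", "CTT"],  # Leucine
    "WGZ": ["CGA", "CGG", "AGA", "AGG", "CGC", "CGT"],  # Arginine
    "QRS": ["TCA", "TCG", "TCC", "TCT", "AGT", "AGC"],  # Serine
}

# Table 2, using its own abbreviations ("Try" for tryptophan).
TABLE_2 = {
    "Phe": ["TTK"], "Leu": ["XTY"], "Ile": ["ATM"], "Met": ["ATG"],
    "Val": ["GTL"], "Ser": ["QRS"], "Pro": ["CCL"], "Thr": ["ACL"],
    "Ala": ["GCL"], "Tyr": ["TAK"], "His": ["CAK"], "Gln": ["CAJ"],
    "Asn": ["AAK"], "Lys": ["AAJ"], "Asp": ["GAK"], "Glu": ["GAJ"],
    "Cys": ["TGK"], "Try": ["TGG"], "Arg": ["WGZ"], "Gly": ["GGL"],
    "Stop": ["TAJ", "TGA"],
}


def expand(pattern):
    """Return the explicit triplets represented by one Table 2 entry."""
    if pattern in SPECIAL:
        return list(SPECIAL[pattern])
    return ["".join(p) for p in product(*(SIMPLE[c] for c in pattern))]


def codon_table():
    """Map every explicit triplet to its amino acid (or "Stop")."""
    table = {}
    for amino_acid, patterns in TABLE_2.items():
        for pattern in patterns:
            for codon in expand(pattern):
                table[codon] = amino_acid
    return table


if __name__ == "__main__":
    table = codon_table()
    print(len(table), "triplets assigned")
    print("Leucine:", sorted(expand("XTY")))
```

Expanded in this way, the entries of Table 2 account for all sixty-four possible triplets, with no triplet assigned twice.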
An important feature of the code, for present purposes, is the fact that each amino acid is specified by a trinucleotide sequence, also known as a nucleotide triplet. The phosphodiester bonds joining adjacent triplets are chemically indistinguishable from all other internucleotide bonds in DNA. Therefore the nucleotide sequence cannot be read to code for a unique amino acid sequence without additional information to determine the reading frame, which is the term used to denote the grouping of triplets used by the cell in decoding the genetic message.
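The dependence of the decoded message on reading frame can be illustrated with a short Python sketch. The fifteen-nucleotide sequence used here is a hypothetical example, chosen so that it encodes Met.sup.5 -enkephalin in one frame; it is not the sequence of the cloned gene. The names `CODON_TABLE` and `translate` are likewise assumptions made for the illustration, and amino acids are printed in one-letter notation with `*` for a termination signal.

```python
from itertools import product

# Standard genetic code in compact form: first, second and third bases
# each run through T, C, A, G, with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna from offset frame (0, 1 or 2), ignoring a partial last triplet."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    # Hypothetical coding sequence: TAT GGT GGT TTT ATG = Tyr-Gly-Gly-Phe-Met.
    sequence = "TATGGTGGTTTTATG"
    for frame in range(3):
        print(frame, translate(sequence, frame))
```

Only frame 0 yields Tyr-Gly-Gly-Phe-Met; the other two groupings of the same nucleotides give unrelated amino acid sequences, which is why the reading frame must be known independently.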
Many recombinant DNA techniques employ two classes of compounds, transfer vectors and restriction enzymes, to be discussed in turn. A transfer vector is a DNA molecule which contains, inter alia, genetic information which ensures its own replication when transferred to a host microorganism strain. Examples of transfer vectors commonly used in bacterial genetics are plasmids and the DNA of certain bacteriophages. Although plasmids have been used as the transfer vectors for the work described herein, it will be understood that other types of transfer vector may be employed. Plasmid is the term applied to any autonomously replicating DNA unit which might be found in a microbial cell, other than the genome of the host cell itself. A plasmid is not genetically linked to the chromosome of the host cell. Plasmid DNAs exist as double-stranded ring structures generally on the order of a few million daltons molecular weight, although some are greater than 10.sup.8 daltons in molecular weight. They usually represent only a small percent of the total DNA of the cell. Transfer vector DNA is usually separable from host cell DNA by virtue of the great difference in size between them. Transfer vectors carry genetic information enabling them to replicate within the host cell, in some cases independently of the rate of host cell division. Some plasmids have the property that their replication rate can be controlled by the investigator by variations in the growth conditions. Plasmid DNA exists as a closed ring. However, by appropriate techniques, the ring may be opened, a fragment of heterologous DNA inserted, and the ring reclosed, forming an enlarged molecule comprising the inserted DNA segment. Bacteriophage DNA may carry a segment of heterologous DNA inserted in place of certain non-essential phage genes. Either way, the transfer vector serves as a carrier or vector for an inserted fragment of heterologous DNA.
Transfer is accomplished by a process known as transformation. During transformation, bacterial cells mixed with plasmid DNA incorporate entire plasmid molecules into the cells. Although the mechanics of the process remain obscure, it is possible to maximize the proportion of bacterial cells capable of taking up plasmid DNA and hence of being transformed, by certain empirically determined treatments. Once a cell has incorporated a plasmid, the latter is replicated within the cell and the plasmid replicas are distributed to the daughter cells when the cell divides. Any genetic information contained in the nucleotide sequence of the plasmid DNA can, in principle, be expressed in the host cell. Typically, a transformed host cell is recognized by its acquisition of traits carried on the plasmid, such as resistance to certain antibiotics. Different plasmids are recognizable by the different capabilities or combination of capabilities which they confer upon the host cell containing them. Any given plasmid may be made in quantity by growing a pure culture of cells containing the plasmid and isolating the plasmid DNA therefrom.
Restriction endonucleases are hydrolytic enzymes capable of catalyzing site-specific cleavage of DNA molecules. The locus of restriction endonuclease action is determined by the existence of a specific nucleotide sequence. Such a sequence is termed the recognition site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their recognition sites. Some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single-stranded regions at each end of the cleaved molecule. Such single-stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterologous sequences of DNA which have been treated with restriction endonuclease to other sequences similarly treated. See Roberts, R. J., Crit.Rev.Biochem. 4, 123 (1976). Restriction sites are relatively rare; however, the general utility of restriction endonucleases has been greatly amplified by the chemical synthesis of double-stranded oligonucleotides bearing the restriction site sequence. Therefore virtually any segment of DNA can be coupled to any other segment simply by attaching the appropriate restriction oligonucleotide to the ends of the molecule, and subjecting the product to the hydrolytic action of the appropriate restriction endonuclease, thereby producing the requisite cohesive ends. See Heyneker, H. L., et al., Nature 263, 748 (1976) and Scheller, R. H., et al., Science 196, 177 (1977).
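Site-specific cleavage and the resulting end structures can be sketched in Python as follows. The text above names no particular enzyme; EcoRI (recognition site GAATTC, cut after the first G) and HaeIII (recognition site GGCC, cut at the center) are used here only as familiar examples of a staggered and a blunt cutter, respectively. The DNA sequence and the function names `find_sites`, `digest` and `overhang` are hypothetical. Only the top strand is modeled, and `overhang` assumes a palindromic site cut at or before its midpoint.

```python
def find_sites(dna, site):
    """Return the 0-based start of every occurrence of site (overlaps allowed)."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna, site, cut_offset):
    """Cut the top strand cut_offset bases into each recognition site."""
    fragments, previous = [], 0
    for position in find_sites(dna, site):
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments


def overhang(site, cut_offset):
    """Single-stranded 5' end left by a staggered cut; empty string if blunt.

    Valid only for a palindromic site with cut_offset <= len(site) // 2.
    """
    return site[cut_offset:len(site) - cut_offset]


if __name__ == "__main__":
    dna = "CCGAATTCGGTTGAATTCAA"  # hypothetical sequence with two GAATTC sites
    print(find_sites(dna, "GAATTC"))
    print(digest(dna, "GAATTC", 1))
    print(repr(overhang("GAATTC", 1)), repr(overhang("GGCC", 2)))
```

Because every fragment produced by the staggered cutter carries the same single-stranded end, fragments from unrelated DNA molecules cut with the same enzyme can anneal to one another, which is the basis of the joining described above.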
An important feature of the distribution of restriction endonuclease recognition sites is the fact that they are randomly distributed with respect to reading frame. Consequently, cleavage by restriction endonuclease may occur between adjacent codons or it may occur within a codon.
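Whether a given cut falls between codons or within one depends only on the position of the cut relative to the start of the reading frame, taken modulo three. A minimal Python sketch, with hypothetical function names and positions, is:

```python
def cut_phase(cut_index, frame_start=0):
    """Return 0 if the cut lies between codons, 1 or 2 if it lies inside one.

    cut_index is the 0-based index of the first base after the cut;
    frame_start is the index of the first base of any codon in the frame.
    """
    return (cut_index - frame_start) % 3


def describe_cut(cut_index, frame_start=0):
    """Describe the position of a cut relative to the reading frame."""
    phase = cut_phase(cut_index, frame_start)
    if phase == 0:
        return "between codons"
    return "within a codon, after base %d of 3" % phase


if __name__ == "__main__":
    # Hypothetical cut positions in a sequence whose frame starts at base 0.
    for cut in (3, 13, 14):
        print(cut, describe_cut(cut))
```

A fragment whose cut lies within a codon must be joined to a control region in a manner that compensates for the one or two extra bases if the reading frame is to be preserved.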
More general methods for DNA cleavage or for end sequence modification are available. A variety of nonspecific endonucleases may be used to cleave DNA randomly, as discussed infra. End sequences may be modified by addition of random sequences of dA+dT or dG+dC, to create restriction sites without the need for specific linker sequences.
The term "expression" is used in recognition of the fact that an organism seldom if ever makes use of all its genetically endowed capabilities at any given time. Even in relatively simple organisms such as bacteria, many proteins which the cell is capable of synthesizing are not synthesized, although they may be synthesized under appropriate environmental conditions. When the protein product, coded by a given gene, is synthesized by the organism, the gene is said to be expressed. If the protein product is not made, the gene is not expressed. Normally, the expression of genes in E. coli is regulated as described generally, infra, in such manner that proteins whose function is not useful in a given environment are not synthesized and metabolic energy is conserved.
The means by which gene expression is controlled in E. coli is well understood, as the result of extensive studies over the past twenty years. See, generally, Hayes, W., The Genetics of Bacteria And Their Viruses, 2d edition, John Wiley and Sons, Inc., New York (1968), and Watson, J. D., The Molecular Biology of the Gene, 3d edition, Benjamin, Menlo Park, California (1976). These studies have revealed that several genes, usually those coding for proteins carrying out related functions in the cell, are found clustered together in continuous sequence. The cluster is called an operon. All genes in the operon are transcribed in the same direction, beginning with the codons coding for the N-terminal amino acid of the first protein in the sequence and continuing through to the C-terminal end of the last protein in the operon. At the beginning of the operon, proximal to the N-terminal amino acid codon, there exists a region of the DNA, termed the control region, which includes a variety of controlling elements including the operator, promoter and sequences for the ribosomal binding sites. The function of these sites is to permit the expression of those genes under their control to be responsive to the needs of the organism. For example, those genes coding for enzymes required exclusively for utilization of lactose are not expressed unless lactose or an analog thereof is actually present in the medium. The control region functions that must be present for expression to occur are the initiation of transcription and the initiation of translation. Expression of the first gene in the sequence is initiated by the initiation of transcription and translation at the position coding for the N-terminal amino acid of the first protein of the operon. The expression of each gene downstream from that point is also initiated in turn, at least until a termination signal or another operon is encountered with its own control region, keyed to respond to a different set of environmental cues. 
While there are many variations in detail on this general scheme, the important fact is that, to be expressed in a procaryote such as E. coli, a gene must be properly located with respect to a control region having initiator of transcription and initiator of translation functions.
It has been demonstrated that genes not normally part of a given operon can be inserted within the operon and controlled by it. The classic demonstration was made by Jacob, F., et al., J.Mol.Biol. 13, 704 (1965). In that experiment, genes coding for enzymes involved in a purine biosynthesis pathway were transferred to a region controlled by the lactose operon. The expression of the purine biosynthetic enzymes was then observed to be repressed in the absence of lactose or a lactose analog, and was rendered unresponsive to the environmental cues normally regulating their expression.
In addition to the operator region regulating the initiation of transcription of genes downstream from it, there are known to exist codons which function as stop signals, indicating the C-terminal end of a given protein. See Table 2. Such codons are known as termination signals and also nonsense codons, since they do not normally code for any amino acid. Deletion of a termination signal between structural genes of an operon creates a fused gene which could result in the synthesis of a chimeric protein consisting of two amino acid sequences coded by adjacent genes, joined by a peptide bond. That such chimeric proteins are synthesized when genes are fused was demonstrated by Benzer, S., and Champe, S. P., Proc.Nat.Acad.Sci USA 48, 14 (1962).
Once a given gene has been isolated, purified and inserted in a transfer vector, the over-all result of which is termed the cloning of the gene, its availability in substantial quantity is assured. The cloned gene is transferred to a suitable microorganism, wherein the gene replicates as the microorganism proliferates and from which the gene may be reisolated by conventional means. Thus is provided a continuously renewable source of the gene for further manipulations, modifications and transfers to other vectors or other loci within the same vector.
Expression is obtained by transferring the cloned gene, in proper orientation and reading frame, into a control region such that read-through from the procaryotic gene results in synthesis of a chimeric protein comprising the amino acid sequence coded by the cloned gene. A variety of specific protein cleavage techniques may be used to cleave the chimeric protein at a desired point so as to release the desired amino acid sequence, which may then be purified by conventional means. Techniques for constructing an expression transfer vector having the cloned gene in proper juxtaposition with a control region are described in Polisky, B., et al., Proc.Nat.Acad.Sci USA 73, 3900 (1976); Itakura, K., et al., Science 198, 1056 (1977); Villa-Komaroff, L., et al., Proc.Nat.Acad.Sci USA 75, 3727 (1978); Mercereau-Puijalon, O., et al, Nature 275, 505 (1978); Chang, A. C. Y., et al, Nature 275, 617 (1978), and in U.S. application Ser. No. 933,035 by Rutter, et al., said application incorporated herein by reference as though set forth in full.
In summary, the process whereby a mammalian protein, such as ACTH or endorphin, is produced with the aid of recombinant DNA technology first requires the cloning of the mammalian gene. Once cloned, the gene may be produced in quantity, further modified by chemical or enzymic means and transferred to an expression plasmid. The cloned gene is also useful for isolating related genes, or, where a fragment is cloned, for isolating the entire gene, by using the cloned gene as a hybridization probe. Further, the cloned gene is useful in proving by hybridization, the identity or homology of independent isolates of the same or related genes. Because of the nature of the genetic code, the cloned gene, when translated in the proper reading frame, will direct the production only of the amino acid sequence for which it codes and no other.
In the case of the cloned endorphin gene, its transposition to an expression transfer vector will permit the synthesis of endorphin by a host microorganism transformed with the vector carrying the cloned gene. Growth of the transformed host will result in synthesis of endorphin, under appropriate environmental conditions. Endorphin is useful as a morphine agonist and as an analgesic. There is a substantial degree of species cross-reactivity in the endorphins, with any interspecific differences being confined to that portion of the molecule outside the essential enkephalin sequence. The .beta.-endorphin of mouse is identical to that of sheep in amino acid sequence and differs from that of man only at position 28, where there is a His residue in mouse and a Tyr residue in the human sequence. The enkephalin moiety bearing the primary functional modality of mouse endorphin is the same as for man. Therefore, mouse endorphin, differing only by a single amino acid substitution at position 28 from the human peptide, will have essentially the same activity in man as the human peptide itself, perhaps differing slightly in such parameters as duration of action or dose response. Other sequence variants, either naturally occurring or man-made, may be found which can confer tissue specificity on the analgesic action of endorphins, thereby permitting relief of localized pain.
Similar considerations apply to other portions of the ACTH/LPH gene. ACTH, for example, is useful to regulate adrenal cortex output, and to prevent depression of adrenal cortex output during corticosteroid therapy. ACTH is widely used in a test for diagnostic evaluation of the hypophysis-pituitary axis, for example in the detection of Addison's disease. Use of ACTH is sometimes advocated where excessive inflammatory or immunological activity contributes to the symptoms, as in ulcerative colitis. Inter-species cross-reactivity is also observed with ACTH, so that non-human ACTH is useful in treatment of human disease as well as in veterinary medicine. The .alpha.- and .beta.-MSH moieties are active in the physiological tanning process. Administration of these hormones can produce darkening of the skin in a manner essentially indistinguishable from natural tanning. Inter-species cross-reactivity is observed with these hormones as well.
The foregoing considerations apply as well to the cloned gene for the entire ACTH/LPH precursor.