The invention herein provides for deoxynucleotide sequences coding for amino acid sequences which contain specific cleavage sites. The deoxynucleotide sequences are herein termed specific cleavage linkers and are useful in recombinant DNA technology.
Recent advances in biochemistry and in recombinant DNA technology have made it possible to achieve the synthesis of specific proteins under controlled conditions independent of the higher organism from which they are normally isolated. Such biochemical synthetic methods employ enzymes and subcellular components of the protein synthesizing machinery of living cells, either in vitro, in cell-free systems, or in vivo, in microorganisms. In either case, the key element is provision of a deoxyribonucleic acid (DNA) of specific sequence which contains the information necessary to specify the desired amino acid sequence. Such a specific DNA is herein termed a DNA coding segment. The coding relationship whereby a deoxynucleotide sequence is used to specify the amino acid sequence of a protein is described briefly, infra, and operates according to a fundamental set of principles that obtain throughout the whole of the known realm of living organisms.
A cloned DNA may be used to specify the amino acid sequence of proteins synthesized by in vitro systems. DNA-directed protein synthesizing systems are well known in the art; see, e.g., Zubay, G., Ann. Rev. Genetics 7, 267 (1973). In addition, single-stranded DNA can be induced to act as messenger RNA in vitro, resulting in high fidelity translation of the DNA sequence (Salas, J. et al., J. Biol. Chem. 243, 1012 (1968)). Other techniques well known in the art may be used in combination with the above procedures to enhance yields.
Developments in recombinant DNA technology have made it possible to isolate specific genes or portions thereof from higher organisms, such as man and other mammals, and to transfer the genes or fragments to a microorganism, such as bacteria or yeast. The transferred gene is replicated and propagated as the transformed microorganism replicates. As a result, the transformed microorganism may become endowed with the capacity to make whatever protein the gene or fragment encodes, whether it be an enzyme, a hormone, an antigen or an antibody, or a portion thereof. The microorganism passes on this capability to its progeny, so that in effect, the transfer has resulted in a new strain, having the described capability. See, for example, Ullrich, A. et al., Science 196, 1313 (1977), and Seeburg, P. H., et al., Nature 270, 486 (1977). A basic fact underlying the application of this technology for practical purposes is that DNA of all living organisms, from microbes to man, is chemically similar, being composed of the same four nucleotides. The significant differences lie in the sequences of these nucleotides in the polymeric DNA molecule. The nucleotide sequences are used to specify the amino acid sequences of proteins that comprise the organism. Although most of the proteins of different organisms differ from each other, the coding relationship between nucleotide sequence and amino acid sequence is fundamentally the same for all organisms. For example, the same nucleotide sequence which is the coding segment for the amino acid sequence of human growth hormone in human pituitary cells, will, when transferred to a microorganism, be recognized as coding for the same amino acid sequence.
Abbreviations used herein are given in Table 1.
TABLE 1
______________________________________
DNA -- deoxyribonucleic acid               A -- Adenine
RNA -- ribonucleic acid                    T -- Thymine
cDNA -- complementary DNA                  G -- Guanine
  (enzymatically synthesized               C -- Cytosine
  from an mRNA sequence)                   U -- Uracil
mRNA -- messenger RNA                      ATP -- adenosine triphosphate
dATP -- deoxyadenosine triphosphate        TTP -- Thymidine triphosphate
dGTP -- deoxyguanosine triphosphate        EDTA -- Ethylenediaminetetraacetic acid
dCTP -- deoxycytidine triphosphate
______________________________________
The coding relationships between nucleotide sequence in DNA and amino acid sequence in protein are collectively known as the genetic code, shown in Table 2.
TABLE 2
______________________________________
Genetic Code
______________________________________
Phenylalanine(Phe)   TTK      Histidine(His)       CAK
Leucine(Leu)         XTY      Glutamine(Gln)       CAJ
Isoleucine(Ile)      ATM      Asparagine(Asn)      AAK
Methionine(Met)      ATG      Lysine(Lys)          AAJ
Valine(Val)          GTL      Aspartic acid(Asp)   GAK
Serine(Ser)          QRS      Glutamic acid(Glu)   GAJ
Proline(Pro)         CCL      Cysteine(Cys)        TGK
Threonine(Thr)       ACL      Tryptophan(Try)      TGG
Alanine(Ala)         GCL      Arginine(Arg)        WGZ
Tyrosine(Tyr)        TAK      Glycine(Gly)         GGL
Termination signal   TAJ
Termination signal   TGA
______________________________________
Key: Each 3-letter deoxynucleotide triplet corresponds to a trinucleotide of mRNA, having a 5'-end on the left and a 3'-end on the right. All DNA sequences given herein are those of the strand whose sequence corresponds to the mRNA sequence, with thymine substituted for uracil. The letters stand for the purine or pyrimidine bases forming the deoxynucleotide sequence.
______________________________________
A = adenine            J = A or G
G = guanine            K = T or C
C = cytosine           L = A, T, C or G
T = thymine            M = A, C or T
X = T or C if Y is A or G
X = C if Y is C or T
Y = A, G, C or T if X is C
Y = A or G if X is T
W = C or A if Z is A or G
W = C if Z is C or T
Z = A, G, C or T if W is C
Z = A or G if W is A
QR = TC if S is A, G, C or T
QR = AG if S is T or C
S = A, G, C or T if QR is TC
S = T or C if QR is AG
______________________________________
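The shorthand of Table 2 can be expanded mechanically into concrete triplets. The following is a minimal sketch, not part of the original disclosure; the function name expand_codon and the dictionary names are hypothetical. The single-letter symbols J, K, L and M are expanded position by position, while the context-dependent patterns for leucine, arginine and serine are written out by hand from the rules in the key.

```python
from itertools import product

# Single-letter symbols from the key to Table 2.
AMBIGUITY = {
    "A": "A", "G": "G", "C": "C", "T": "T",
    "J": "AG",    # A or G
    "K": "TC",    # T or C
    "L": "ATCG",  # A, T, C or G
    "M": "ACT",   # A, C or T
}

# Context-dependent patterns (Leu = XTY, Arg = WGZ, Ser = QRS),
# enumerated by hand from the conditional rules in the key.
CONTEXT_DEPENDENT = {
    "XTY": ["CTA", "CTG", "CTC", "CTT", "TTA", "TTG"],
    "WGZ": ["CGA", "CGG", "CGC", "CGT", "AGA", "AGG"],
    "QRS": ["TCA", "TCG", "TCC", "TCT", "AGT", "AGC"],
}


def expand_codon(pattern):
    """Return the sorted list of concrete triplets matched by a Table 2 pattern."""
    if pattern in CONTEXT_DEPENDENT:
        return sorted(CONTEXT_DEPENDENT[pattern])
    choices = [AMBIGUITY[symbol] for symbol in pattern]
    return sorted("".join(bases) for bases in product(*choices))


# Every pattern listed in Table 2, including both termination entries.
TABLE_2_PATTERNS = [
    "TTK", "XTY", "ATM", "ATG", "GTL", "QRS", "CCL", "ACL", "GCL", "TAK",
    "TAJ", "TGA", "CAK", "CAJ", "AAK", "AAJ", "GAK", "GAJ", "TGK", "TGG",
    "WGZ", "GGL",
]

if __name__ == "__main__":
    print(expand_codon("ATM"))  # the three isoleucine triplets
    all_codons = [c for p in TABLE_2_PATTERNS for c in expand_codon(p)]
    print(len(all_codons), len(set(all_codons)))
```

Expanding every pattern in the table accounts for each of the 64 possible triplets exactly once, which is a convenient consistency check on the notation.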
An important feature of the code, for present purposes, is the fact that each amino acid is specified by a trinucleotide sequence, also known as a nucleotide triplet. The phosphodiester bonds joining adjacent triplets are chemically indistinguishable from all other internucleotide bonds in DNA. Therefore the nucleotide sequence cannot be read to code for a unique amino acid sequence without additional information to determine the reading frame, which is the term used to denote the grouping of triplets used by the cell in decoding the genetic message.
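The dependence of the decoded message on reading frame can be illustrated with a short sketch. The 12-nucleotide sequence and the function name split_into_triplets are hypothetical and serve only to show that the same nucleotide sequence yields three different groupings of triplets.

```python
def split_into_triplets(sequence, frame):
    """Group a DNA sequence into complete triplets starting at offset `frame` (0, 1 or 2).

    Trailing nucleotides that do not fill a complete triplet are dropped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    usable = sequence[frame:]
    return [usable[i:i + 3] for i in range(0, len(usable) - 2, 3)]


if __name__ == "__main__":
    sequence = "ATGGCTAAGTGA"  # hypothetical example sequence
    for frame in (0, 1, 2):
        print(frame, split_into_triplets(sequence, frame))
    # 0 ['ATG', 'GCT', 'AAG', 'TGA']
    # 1 ['TGG', 'CTA', 'AGT']
    # 2 ['GGC', 'TAA', 'GTG']
```

Read against Table 2, the first frame specifies Met-Ala-Lys followed by a termination signal, the second specifies Try-Leu-Ser, and the third encounters a termination signal (TAA) at its second triplet.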
In procaryotic cells, the endogenous coding segments are typically preceded by nucleotide sequences having the functions of initiator of transcription (mRNA synthesis) and initiator of translation (protein synthesis), termed the promoter and ribosomal binding site, respectively. The coding segment begins around 3-11 nucleotides distant from the ribosomal binding site. The exact number of nucleotides intervening between the ribosomal binding site and the initiation codon of the coding segment does not appear to be critical for translation of the coding segment in correct reading frame. The term "expression control segment" is used herein to denote the nucleotide sequences comprising a promoter, ribosomal binding site and a 3-11 nucleotide spacer following the ribosomal binding site. In eucaryotic cells, regulation of transcription and translation may be somewhat more complicated, but also involves such nucleotide sequences.
Many recombinant DNA techniques employ two classes of compounds, transfer vectors and restriction enzymes, to be discussed in turn. A transfer vector is a DNA molecule which contains, inter alia, genetic information which ensures its own replication when transferred to a host microorganism strain. Examples of transfer vectors commonly used in bacterial genetics are plasmids and the DNA of certain bacteriophages. Although plasmids have been used as the transfer vectors for the work described herein, it will be understood that other types of transfer vectors may be employed. Plasmid is the term applied to any autonomously replicating DNA unit which might be found in a microbial cell, other than the genome of the host cell itself. A plasmid is not genetically linked to the chromosome of the host cell. Plasmid DNAs exist as double-stranded ring structures generally on the order of a few million daltons molecular weight, although some are greater than 10^8 daltons in molecular weight. They usually represent only a small percent of the total DNA of the cell. Transfer vector DNA is usually separable from host cell DNA by virtue of the great difference in size between them. Transfer vectors carry genetic information enabling them to replicate within the host cell, in most cases independently of the rate of host cell division. Some plasmids have the property that their replication rate can be controlled by the investigator by variations in the growth conditions. By appropriate techniques, the plasmid DNA ring may be opened, a fragment of heterologous DNA inserted, and the ring reclosed, forming an enlarged molecule comprising the inserted DNA segment. Bacteriophage DNA may carry a segment of heterologous DNA inserted in place of certain non-essential phage genes. Either way, the transfer vector serves as a carrier or vector for an inserted fragment of heterologous DNA.
Transfer is accomplished by a process known as transformation. During transformation, host cells mixed with plasmid DNA incorporate entire plasmid molecules into the cells. Although the mechanics of the process remain obscure, it is possible to maximize the proportion of host cells capable of taking up plasmid DNA and hence of being transformed, by certain empirically determined treatments. Once a cell has incorporated a plasmid, the latter is replicated within the cell and the plasmid replicas are distributed to the daughter cells when the cell divides. Any genetic information contained in the nucleotide sequence of the plasmid DNA can, in principle, be expressed in the host cell. Typically, a transformed host cell is recognized by its acquisition of traits carried on the plasmid, such as resistance to certain antibiotics. Different plasmids are recognizable by the different capabilities or combination of capabilities which they confer upon the host cell containing them. Any given plasmid may be made in quantity by growing a pure culture of cells containing the plasmid and isolating the plasmid DNA therefrom.
Restriction endonucleases are hydrolytic enzymes capable of catalyzing site-specific cleavage of DNA molecules. The locus of restriction endonuclease action is determined by the existence of a specific nucleotide sequence. Such a sequence is termed the recognition site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their recognition sites. Some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single stranded regions at each end of the cleaved molecule. Such single stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible of cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterologous sequences of DNA which have been treated with a restriction endonuclease to other sequences similarly treated. See Roberts, R. J., Crit. Rev. Biochem. 4, 123 (1976). Restriction sites are relatively rare; however, the general utility of restriction endonucleases has been greatly amplified by the chemical synthesis of double stranded oligonucleotides bearing the restriction site sequence. Therefore virtually any segment of DNA can be coupled to any other segment simply by attaching the appropriate restriction oligonucleotide to the ends of the molecule, and subjecting the product to the hydrolytic action of the appropriate restriction endonuclease, thereby producing the requisite cohesive ends. See Heynecker, H. L., et al., Nature 263, 748 (1976) and Scheller, R. H., et al., Science 196, 177 (1977).
An important feature of the distribution of restriction endonuclease recognition sites is the fact that they are randomly distributed with respect to reading frame. Consequently, cleavage by restriction endonuclease may occur between adjacent codons or it may occur within a codon.
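The relationship between a restriction cut and the reading frame can be sketched as follows. The example assumes, for illustration only, the recognition site GAATTC with cleavage after the first G (the well-known EcoRI pattern, which is not named in the text above); the function names and test sequences are hypothetical.

```python
def find_cut_positions(sequence, site="GAATTC", cut_offset=1):
    """Return the 0-based index of the nucleotide immediately 3' of each cut.

    `site` and `cut_offset` default to an EcoRI-like pattern (G^AATTC),
    an illustrative assumption rather than a value taken from the text.
    """
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    return positions


def cut_relative_to_codons(cut_position, frame_start=0):
    """Report whether a cut falls between adjacent codons or within a codon."""
    offset = (cut_position - frame_start) % 3
    return "between codons" if offset == 0 else "within a codon"


if __name__ == "__main__":
    for seq in ("ATGGAATTCAAA", "ATGGCGAATTC"):  # hypothetical coding segments
        for cut in find_cut_positions(seq):
            print(seq, cut, cut_relative_to_codons(cut))
```

In the first sequence the cut separates the first and second nucleotides of the second codon, whereas in the second it falls exactly on a codon boundary. A linker attached at such a site must therefore be designed with the resulting frame in mind.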
More general methods of DNA cleavage or for end sequence modification are available. A variety of non-specific endonucleases may be used to cleave DNA randomly, as discussed infra. End sequences may be modified by creation of oligonucleotide tails of dA on one end and dT at the other, or of dG and dC, to create sites for joining without the need for specific linker sequences.
The term "expression" is used in recognition of the fact that an organism seldom if ever makes use of all its genetically endowed capabilities at any given time. Even in relatively simple organisms such as bacteria, many proteins which the cell is capable of synthesizing are not synthesized, although they may be synthesized under appropriate environmental conditions. When the protein product, coded by a given gene, is synthesized by the organism, the gene is said to be expressed. If the protein product is not made, the gene is not expressed. Normally, the expression of genes in E. coli is regulated as described generally, infra, in such manner that proteins whose function is not useful in a given environment are not synthesized and metabolic energy is conserved.
The means by which gene expression is controlled in E. coli and yeast is well understood, as the result of extensive studies over the past twenty years. See, generally, Hayes, W., The Genetics of Bacteria And Their Viruses, 2d edition, John Wiley & Sons, Inc., New York (1968), and Watson, J. D., The Molecular Biology of the Gene, 3d edition, Benjamin, Menlo Park, Calif. (1976). These studies have revealed that several genes, usually those coding for proteins carrying out related functions in the cell, may be found clustered together in continuous sequence. The cluster is called an operon. All genes in the operon are transcribed in the same direction, beginning with the codons coding for the N-terminal amino acid of the first protein in the sequence and continuing through to the C-terminal end of the last protein in the operon. At the beginning of the operon, proximal to the N-terminal amino acid codon, there exists a region of the DNA, termed the control region, which includes a variety of controlling elements including the operator, promoter and sequences for the binding of ribosomes. The function of these sites is to permit the expression of those genes under their control to be responsive to the needs of the organism. For example, those genes coding for enzymes required exclusively for utilization of lactose are normally not appreciably expressed unless lactose or an analog thereof is actually present in the medium. The control region functions that must be present for expression to occur are the initiation of transcription and the initiation of translation. The minimal requirements for independent expression of a coding segment are therefore a promoter, a ribosomal binding site, and a 3-11 nucleotide spacer segment. The nucleotide sequences contributing these functions are relatively short, such that the major portion of an expression control segment might be on the order of 15 to 25 nucleotides in length. 
Expression of the first gene in the sequence is initiated by the initiation of transcription and translation at the position coding for the N-terminal amino acid of the first protein of the operon. The expression of each gene downstream from that point is also initiated in turn, at least until a termination signal or another operon is encountered with its own control region, keyed to respond to a different set of environmental cues. While there are many variations in detail on this general scheme, the important fact is that to be expressed in a host such as E. coli, or a eucaryote such as yeast, a gene must be properly located with respect to a control region having initiator of transcription and initiator of translation functions.
It has been demonstrated that genes not normally part of a given operon can be inserted within the operon and controlled by it. The classic demonstration was made by Jacob, F., et al., J. Mol. Biol. 13, 704 (1965). In that experiment, genes coding for enzymes involved in a purine biosynthesis pathway were transferred to a region controlled by the lactose operon. The expression of the purine biosynthetic enzyme was then observed to be repressed in the absence of lactose or a lactose analog, and was rendered unresponsive to the environmental cues normally regulating its expression.
In addition to the operator region regulating the initiation of transcription of genes downstream from it, there are known to exist codons which function as stop signals, indicating the C-terminal end of a given protein. See Table 2. Such codons are known as termination signals and also as nonsense codons, since they do not normally code for any amino acid. Deletion of a termination signal between structural genes of an operon creates a fused gene which could result in the synthesis of a chimeric or fusion protein consisting of two amino acid sequences coded by adjacent genes, joined by a peptide bond. That such chimeric proteins are synthesized when genes are fused was demonstrated by Benzer, S., and Champe, S. P., Proc. Nat. Acad. Sci. USA 48, 114 (1962).
Once a given gene has been isolated, purified and inserted in a transfer vector, the over-all result of which is termed the cloning of the gene, its availability in substantial quantity is assured. The cloned gene is transferred to a suitable microorganism, wherein the gene replicates as the microorganism proliferates and from which the gene may be reisolated by conventional means. Thus is provided a continuously renewable source of the gene for further manipulations, modifications and transfers to other vectors or other loci within the same vector.
Expression has been obtained in the prior art by transferring the cloned gene, in proper orientation and reading frame, into a control region such that read-through from the host gene results in synthesis of a chimeric protein comprising the amino acid sequence coded by the cloned gene. Techniques for constructing an expression transfer vector having the cloned gene in proper juxtaposition with a control region are described in Polisky, B., et al., Proc. Nat. Acad. Sci USA 73, 3900 (1976); Itakura, K., et al., Science 198, 1056 (1977); Villa-Komaroff, L., et al., Proc. Nat. Acad. Sci USA 75, 3727 (1978); Mercereau-Puijalon, O., et al., Nature 275, 505 (1978); Chang, A. C. Y., et al., Nature 275, 617 (1978), and in copending U.S. application Ser. No. 933,035 by Rutter, et al., filed Aug. 11, 1978, said application incorporated herein by reference as though set forth in full.
As described in Ser. No. 933,035, the cloned gene is joined to a host control fragment in order to obtain expression of the gene. This control fragment may consist of no more than that part of the control region providing for initiation of transcription and initiation of translation, or may additionally include a portion of a structural gene, depending on the location of the insertion site. Thus, the expression product would be either a protein coded by the cloned gene, hereinafter referred to as a non-fusion protein, or a fusion protein coded in part by the procaryotic structural gene, in part by the cloned gene, and in part by any intervening nucleotide sequences linking the two genes. The peptide bond between the desired protein or peptide, comprising the C-terminal portion of the fusion protein, and the remainder, is herein termed the "junction bond".
After the protein has been produced, it must then be purified. Several advantages and disadvantages exist for the purification of either the non-fusion protein or the fusion protein. The non-fusion protein is produced within the cell. As a consequence, the cells must be lysed or otherwise treated in order to release the non-fusion protein. The lysate will contain all of the proteins of the cell in addition to the non-fusion protein, which may make purification of the protein difficult. Another consequence is that the non-fusion protein may be recognized as a foreign protein and undergo rapid degradation within the cell. Therefore non-fusion proteins may not be obtainable in reasonable yields. A major advantage of a non-fusion protein is that the protein itself is the desired final product.
The stability of the expression product is frequently enhanced by expression of a fusion protein. The host portion of the fusion protein frequently stabilizes the expression product against intracellular degradation. Further, it is often possible to choose a host protein which is protected from degradation by compartmentalization or by excretion from the cell into the growth medium. The cloned gene can then be attached to the host gene for such a protein. A fusion protein consisting of an excreted or compartmentalized host protein (N-terminal) and a eucaryotic protein (C-terminal) is likely to be similarly excreted from the cell or compartmentalized within it, because the signal sequence of amino acids that confers secretability is on the N-terminal portion of the fusion protein. In the case of a fusion protein excreted into the cell medium, purification is greatly simplified. In some instances, the host portion may have distinctive physical properties that permit the use of simple purification procedures. A major disadvantage of the fusion protein is that the host protein must be removed from the fusion protein in order for the eucaryotic protein to be obtained.
Direct expression as a non-fusion protein will generally be preferred if the protein is stable in the host cell. In many instances, the disadvantage of having to purify the expression product from a cell lysate will be overcome by the advantage of not having to employ specific cleavage means to remove an N-terminal portion. Most advantageously, as provided herein by the present invention, the desired protein may be expressed as a fusion protein comprising an N-terminal sequence having distinctive physical properties useful for purification and provided with a structure at the junction point with the desired C-terminal portion such that the junction bond, as defined supra, can be cleaved by means which do not appreciably affect the desired C-terminal protein or peptide.
Many methods for chemical cleavage of peptides have been proposed and tested. Spande, T. F., et al, Adv. Protein Chem. 24, 97 (1970). However, many of these are non-specific, i.e. they cleave at many sites in a protein. See also a brief discussion in The Proteins, 3rd Ed., Neurath, H. and Hill, R. L., Ed., Academic Press, Vol. 3, pp. 50-57 (1977). Hydrolysis of peptide bonds is catalyzed by a variety of known proteolytic enzymes. See The Enzymes, 3rd Ed., Boyer, P. D., Ed., Academic Press, Vol. III (1971); Methods in Enzymology, Vol. XIX, Perlmann, G. E. and Lorand, L. Ed., Academic Press (1970); and, Methods in Enzymology, Vol. XLV, Lorand L., Ed., Academic Press (1976). However, many proteolytic enzymes are also non-specific, with respect to the cleavage site.
The specificity of each chemical or enzymatic means for cleavage is generally described in terms of amino acid residues at or near the hydrolyzed peptide bond. The hydrolysis of a peptide bond in a protein or polypeptide is herein termed a cleavage of the protein or polypeptide at the site of the hydrolyzed bond. The peptide bonds which are hydrolyzed by chemical or enzymatic means are generally known. (See the above-identified references.) For example, trypsin (3.4.4.4) cleaves on the carboxyl side of an arginine or lysine residue. (The number in parentheses after the enzyme is its specific identifying nomenclature as established by the International Union of Biochemistry.) Thus, trypsin is said to be specific for arginine or lysine. Since trypsin hydrolyzes only on the carboxyl side of arginine or lysine residues, it is said to have a narrow specificity. Pepsin (3.4.4.1), on the other hand, has a broad specificity and will cleave on the carboxyl side of most amino acids, but preferably phenylalanine, tyrosine, tryptophan, cysteine, cystine or leucine residues. A few specific chemical cleavage reactions are known. For example, CNBr will cleave only at methionine residues under appropriate conditions. However, the difficulty with all specific cleavage means, whether chemical or enzymatic, which depend upon the existence of a single amino acid residue at or near the cleavage point is that such methods will only be useful in specific instances where it is known that no such residue occurs internally in the amino acid sequence of the desired protein. The larger the desired protein, the greater the likelihood that the sensitive residue will occur internally. Therefore, a technique generally useful for cleaving fusion proteins at a desired point is preferably based upon the existence of a sequence of amino acids at the junction bond which has a low likelihood of occurrence internally in the desired protein.
The specificity for the site of the hydrolyzed peptide bond is generally termed the primary specificity of the enzyme. Thus, trypsin has a primary specificity for arginine and lysine residues. The primary specificity of enzymes has been the subject of considerable investigation. It has been determined that a particular enzyme would recognize and bind the amino acid residue within a protein molecule corresponding to the enzyme's primary specificity and cleave the protein at that point. The part of an enzyme which recognizes and binds the substrate and catalyzes the reaction is known as the active site. For example, trypsin would recognize and bind an arginine residue within a protein and cleave the protein on the carboxyl side of the arginine. For many years it was thought that only the amino acid residues corresponding to the primary specificity affected the specificity of hydrolysis of the peptide bond by the enzyme. However, it has been noted that amino acids in the immediate vicinity of the site of hydrolysis may affect the binding affinity of the enzyme at that site. Several examples of this effect can be shown for trypsin. Considering the sequence --x--Arg--y where x and y are amino acids, it has been found that the binding affinity of trypsin at the Arg--y bond is significantly reduced when x is Glu or Asp. Similarly, it has been shown that the binding affinity at an arginine or lysine residue, in repetitive sequences of lysine, arginine or combination thereof, is greater than if a single arginine or lysine residue were present. That is, the enzyme preferentially binds at --Arg--Arg--x compared to --y--Arg--x. Also, trypsin does not appear to hydrolyze the --Arg--Pro-- or --Lys--Pro-- peptide bond. See Kasper, C. B., at p. 137 in Protein Sequence Determination, Needleman, S. B., Ed., Springer-Verlag, New York (1970).
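The single-residue cleavage rules described above can be sketched in a few lines. This is a simplified illustration only: it models trypsin as cleaving on the carboxyl side of every Arg or Lys except where Pro follows, and CNBr as cleaving after every Met, and it ignores the affinity effects of neighboring residues. The one-letter residue codes (R, K, M, P), the function names and the test sequence are conveniences assumed here, not taken from the text.

```python
def cleave_after(protein, residues, blocked_next=""):
    """Split a one-letter protein sequence on the carboxyl side of each residue in `residues`.

    A bond is left intact when the residue that follows is in `blocked_next`.
    No cut is made after the final residue, since there is no bond to hydrolyze.
    """
    fragments = []
    current = ""
    for index, residue in enumerate(protein):
        current += residue
        next_residue = protein[index + 1] if index + 1 < len(protein) else ""
        if residue in residues and next_residue and next_residue not in blocked_next:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


def trypsin_digest(protein):
    """Simplified trypsin rule: cut after Arg (R) or Lys (K) unless Pro (P) follows."""
    return cleave_after(protein, "RK", blocked_next="P")


def cnbr_digest(protein):
    """Simplified CNBr rule: cut after every Met (M)."""
    return cleave_after(protein, "M")


if __name__ == "__main__":
    protein = "MAGRPLKDE"  # hypothetical sequence
    print(trypsin_digest(protein))  # ['MAGRPLK', 'DE']
    print(cnbr_digest(protein))     # ['M', 'AGRPLKDE']
```

Any internal Arg, Lys or Met in a desired protein would be cut by the same rule, which is the difficulty with single-residue specificity noted above.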
Recently, it has also been determined that amino acids in the vicinity of the site of hydrolysis will also be recognized and bound by the enzyme. For example, Schechter, I. et al., Biochem. Biophys. Res. Comm. 27, 157 (1967) reported that papain (3.4.4.10) binds several amino acid residues in its active site as determined from the hydrolysis of peptides of various lengths. An active site which binds several amino acids is often termed an extended active site. The specificity of an enzyme for the additional amino acids not at the immediate site of hydrolysis is sometimes termed the secondary specificity of the enzyme. It has now been shown that many enzymes have extended active sites. Several additional examples of enzymes having extended active sites include: elastase (3.4.4.7)--Thompson, R. C., et al., Proc. Nat. Acad. Sci. USA 67, 1734 (1970); α-chymotrypsin (3.4.4.5)--Bauer, C. A., et al., Biochem. 15, 1291 and 1296 (1976); chymosin (3.4.23.4)--Visser, S., et al., Biochem. Biophys. Acta 438, 265 (1976); and enterokinase (3.4.4.8)--Maroux, S., et al., J. Biol. Chem. 246, 5031 (1971). (See also Fruton, J. S., Cold Spring Harbor Conf. Cell Prolif. 2, 33 (1975).) The extended active site appears to at least increase the catalytic efficiency of the enzyme. It may also increase the binding affinity of the enzyme for the peptide. See Fruton, J. S., supra. For example, Schechter, I. et al., Biochem. Biophys. Res. Comm. 32, 898 (1969) found that the phenylalanine in the sequence --x--Phe--y--z, where x, y and z are amino acids, enhances the susceptibility of the peptide to hydrolysis by papain and directs the enzymatic attack at the y--z peptide bond. Valine and leucine may also provide similar results when substituted for Phe in the above sequence. This could be an explanation for the broad specificity of papain. See Glazer, A. N. et al. at p. 501 in The Enzymes, supra.
Thus, an enzyme may have a narrow specificity as a result of its primary specificity alone or in combination with its secondary specificity (i.e., the enzyme has an extended active site).
The present invention provides for the prokaryotic or eukaryotic expression of a cloned coding segment such that the desired protein is produced, either as a fusion protein or a non-fusion protein, as desired, and may be provided with specific additional amino acid sequences to permit specific cleavage at the junction bond of a fusion protein and to permit rapid purification. The general invention provides a number of options for the investigator, depending on the size and function of the desired protein, and upon the relative advantages of expression as a fusion or non-fusion protein, according to principles well known in the art, as discussed supra.
To provide generally useful means for specific cleavage of the junction bond, a chemical or enzymatic cleavage means having a narrow specificity will not be suitable except in special cases. A cleavage means is not suitable if its cleavage site occurs within the eucaryotic protein of the fusion protein. For example, a eucaryotic protein may contain several arginine and/or lysine residues. Trypsin would cleave on the carboxyl side of these residues. Since cleavage would occur within the eucaryotic protein, trypsin would not be suitable for use for the present invention. This is also true for many chemical cleavage means. Thus, it can be seen that in order to obtain more specific cleavage, it may be necessary to utilize a cleavage means which will have a cleavage site in a specific amino acid sequence having two or more amino acid residues. For example, it would be desirable for the cleavage means to be specific for an amino acid sequence --x--y--z-- and to cleave on the carboxyl side of the z residue. The probability of a similar sequence occurring within the eucaryotic protein would be very small. Therefore, the probability of cleavage within the eucaryotic protein would also be very small. The entire eucaryotic protein can then be removed and purified.
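The claim that a longer recognition sequence is unlikely to occur within the desired protein can be given a rough numerical form. The sketch below rests on a strong simplifying assumption that is not made in the text: that all 20 amino acids occur with equal frequency and independently at each position. Real proteins deviate from this, so the figures indicate order of magnitude only. The function name expected_occurrences is hypothetical.

```python
def expected_occurrences(protein_length, site_length, alphabet_size=20):
    """Expected number of chance occurrences of one specific sequence of
    `site_length` residues in a protein of `protein_length` residues,
    assuming equally frequent, independent residues (a simplification).
    """
    windows = max(protein_length - site_length + 1, 0)
    return windows / alphabet_size ** site_length


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, expected_occurrences(200, k))
```

Under these assumptions a single-residue site is expected about ten times in a 200-residue protein, a three-residue site about 0.025 times, and a five-residue site fewer than once in ten thousand such proteins.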
The present invention is designed such that, when a fusion protein is expressed, a specific cleavage sequence of one or more amino acids is inserted between the host portion and the eucaryotic portion of the fusion protein. If the sequence of the eucaryotic portion is known, it is possible to select a specific cleavage sequence of only one amino acid residue so long as that residue does not appear in the eucaryotic protein. It is preferred, however, to utilize a specific cleavage sequence which contains two or more amino acid residues sometimes referred to herein as an extended specific cleavage sequence. This type of sequence takes advantage of the extended active sites of various enzymes. By utilizing an extended specific cleavage sequence, it is highly probable that cleavage will only occur at the desired site, the junction bond, and not within the desired protein. The present invention is important in recombinant DNA technology. By inserting a specifically recognized amino acid sequence between the host protein portion and the desired portion of a fusion protein, it is now possible to specifically cleave the desired portion out of the fusion protein without further affecting the desired portion.
For practical purposes, as contemplated by the present invention, the specificity of cleavage at the junction need not be all or nothing with respect to other potential cleavage sites in the desired protein. It suffices if the junction bond cleavage site is sufficiently favored kinetically, either due to increased binding affinity or an enhanced turnover rate, that the junction bond is cleaved preferentially with respect to other sites, such that a reasonable yield of the desired protein can be obtained. Reaction conditions of temperature, buffer, ratio of enzyme to substrate, reaction time and the like can be selected so as to maximize the yield of the desired protein, as a matter of ordinary skill in the art.
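The effect of kinetic preference on yield may be illustrated by a deliberately simple model, which is an assumption of this sketch and not a description of any particular enzyme. Suppose the junction bond and a single competing internal site are each cleaved independently with first-order rate constants `k_junction` and `k_internal` (hypothetical names and values). The fraction of desired protein that has been released and is still intact at time t is then (1 - exp(-k_junction t)) exp(-k_internal t), which is maximal at t = ln((k_junction + k_internal) / k_internal) / k_junction.

```python
import math


def intact_yield(k_junction: float, k_internal: float, t: float) -> float:
    """Fraction of desired protein released at the junction and not cut internally.

    Assumes independent first-order cleavage at the junction and at one
    internal site (an illustrative model only).
    """
    released = 1.0 - math.exp(-k_junction * t)
    uncut = math.exp(-k_internal * t)
    return released * uncut


def best_reaction_time(k_junction: float, k_internal: float) -> float:
    """Reaction time that maximizes intact_yield under the same model."""
    return math.log((k_junction + k_internal) / k_internal) / k_junction


# Hypothetical rate constants: the junction is favored 100-fold.
k_j, k_i = 1.0, 0.01
t_best = best_reaction_time(k_j, k_i)
print(round(t_best, 3), round(intact_yield(k_j, k_i, t_best), 3))
```

Even with a competing internal site, a 100-fold kinetic preference in this model leaves most of the desired protein intact at the optimal reaction time, which is the sense in which specificity need not be all or nothing.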
One enzyme which may cleave at a specific cleavage site has been called a signal peptidase. For several eucaryotic and procaryotic proteins, the initial translation product is not the protein itself, but the protein with approximately 20 additional amino acids on the amino terminus of the protein. The additional amino acid sequence is called a signal peptide. The signal peptide is thought to be a specific signal for the vectorial transport of the synthesized protein into the endoplasmic reticulum and is cleaved away from the protein during this phase. See Blobel, G. et al, J. Cell Biol. 67, 835 (1975). A specific cleavage enzyme, i.e., signal peptidase, which hydrolyzes the peptide bond between the signal peptide and the active protein in association with passage through a cell membrane, has been observed in a cell-free system. See Blobel, G. et al, Proc. Nat. Acad. Sci. USA 75, 361 (1978).
The present invention provides for the synthesis of a specific cleavage linker which can be attached to the end of the isolated DNA segment coding for the N-terminus of the protein prior to insertion of the segment into the transfer vector. The specific cleavage linker codes for an amino acid sequence which contains a specific cleavage site which does not occur within the desired protein. Thus, the specific cleavage within the linker amino acid sequence results in the isolation of the desired protein from the fusion protein. An advantage of the present invention is the cleavage at the amino-terminal side of the first amino acid of the N-terminus of the desired protein. Another advantage is that little of the desired protein is degraded during the cleavage procedure.
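The arrangement described above, in which the linker amino acid sequence sits immediately before the N-terminus of the desired protein, may be sketched at the amino acid level as follows. All sequences are hypothetical placeholders, the cleavage is assumed to occur on the carboxyl side of the last residue of the linker sequence, and the function names are introduced here only for illustration.

```python
from typing import List


def build_fusion(host: str, linker: str, desired: str) -> str:
    """Join host portion, linker sequence and desired protein.

    Rejects a linker whose cleavage site would also be found inside the
    desired protein or overlapping its N-terminus.
    """
    if (linker + desired).find(linker, 1) != -1:
        raise ValueError("cleavage site also occurs in the desired protein")
    return host + linker + desired


def cleave(fusion: str, site: str) -> List[str]:
    """Cut after every occurrence of `site`; return fragments in order."""
    fragments = []
    start = 0
    hit = fusion.find(site)
    while hit != -1:
        cut = hit + len(site)
        fragments.append(fusion[start:cut])
        start = cut
        hit = fusion.find(site, hit + 1)
    fragments.append(fusion[start:])
    return fragments


# Hypothetical placeholder sequences in one-letter code.
host = "MTMITDSL"
linker = "DDDDK"
desired = "GSKAVRLLKEAFR"

fusion = build_fusion(host, linker, desired)
fragments = cleave(fusion, linker)
print(fragments)
```

Because the cut falls directly after the linker sequence, the last fragment begins with the first amino acid of the desired protein and carries no residues from the host portion or the linker.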
For the purpose of providing expression as a non-fusion protein, the present invention provides synthetic oligonucleotide linkers comprising a promoter, a ribosomal binding site, and a 3-11 nucleotide spacer. This linker, coupled with a coding segment, provides for direct expression of the coding segment when inserted into a transfer vector and used to transform a suitable host. Using such a linker, the coding segment may be expressed even though inserted in a "silent" region of the vector, thus increasing the range of choice of suitable insertion sites. Preferably, direct expression of the coding segment is obtained without resorting to a synthetic promoter segment. A ribosomal binding site linker, together with a 3-11 nucleotide spacer, directs the reinitiation of translation of mRNA initiated at a naturally occurring promoter site. Therefore, as long as the coding segment and expression linker are inserted in a transfer vector gene under naturally occurring promoter control, reinitiation at the inserted ribosomal binding site results in direct expression of the attached coding segment. Most preferably, the insertion is made adjacent to the existing promoter, between it and the structural gene it normally controls.
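The composition of such an expression linker may be sketched as a simple string assembly with a check on the spacer length. The sketch rests on assumptions not stated above: that the spacer lies between the ribosomal binding site and the start codon, that the coding segment begins with ATG, and that the particular nucleotide sequences shown, which are placeholders, would be functional. The promoter is optional in the sketch, reflecting the preferred case in which a naturally occurring promoter is relied upon.

```python
def build_expression_linker(rbs: str, spacer: str, promoter: str = "") -> str:
    """Assemble promoter (optional), ribosomal binding site and spacer.

    The spacer must be 3-11 nucleotides long, as specified for the linker.
    """
    for part in (promoter, rbs, spacer):
        if set(part) - set("ACGT"):
            raise ValueError("sequences must contain only A, C, G and T")
    if not 3 <= len(spacer) <= 11:
        raise ValueError("spacer must be 3-11 nucleotides long")
    return promoter + rbs + spacer


def attach_coding_segment(linker: str, coding_segment: str) -> str:
    """Place the linker directly upstream of the coding segment.

    Assumption: the coding segment starts with the ATG start codon.
    """
    if not coding_segment.startswith("ATG"):
        raise ValueError("coding segment is assumed to start with ATG")
    return linker + coding_segment


# Placeholder sequences, for illustration only.
rbs = "AGGAGG"
spacer = "TTTAAT"            # 6 nucleotides, within the 3-11 range
coding_segment = "ATGGGCTCT"

linker = build_expression_linker(rbs, spacer)
construct = attach_coding_segment(linker, coding_segment)
print(construct)
```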
For the purpose of improving purification of the fusion or non-fusion protein, the present invention provides a linker coding for amino acid sequences which function to enhance ease of purification. For example, a polyanionic amino acid segment or a polycationic or hydrophobic segment will be tightly bound by a variety of known solid phase adsorbents or column materials. Specific amino acid sequences recognizable by specific binding substances can be incorporated on either end of the desired protein to render it purifiable by affinity chromatography. Such purification segments can be used in conjunction with a specific cleavage segment to provide for simple quantitative purification of fusion or non-fusion proteins followed by specific cleavage of the purification segment and quantitative removal thereof.
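The effect of a polyanionic purification segment on the charge of the expressed protein may be illustrated crudely by counting charged side chains. The sketch below is an approximation under stated assumptions: lysine and arginine are counted as +1 and aspartate and glutamate as -1, while histidine, the terminal groups and all pH dependence are ignored. The sequences are hypothetical.

```python
def formal_net_charge(protein: str) -> int:
    """Crude net charge: (K + R) minus (D + E), ignoring everything else."""
    positive = sum(protein.count(residue) for residue in "KR")
    negative = sum(protein.count(residue) for residue in "DE")
    return positive - negative


# Hypothetical desired peptide and polyanionic purification segment.
desired = "GSKAVRLLKEAFR"
purification_segment = "EEEEEEEE"
tagged = purification_segment + desired

print(formal_net_charge(desired), formal_net_charge(tagged))
```

In this illustration the purification segment reverses the sign of the formal net charge, which is the kind of difference an adsorbent selective for anionic material could exploit; removal of the segment by specific cleavage would restore the charge of the desired protein.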
The foregoing purposes are achieved in the present invention according to the properties of each system, to solve the individual problems presented in preparing the desired protein. The principles of the present invention as discussed herein provide generally applicable means for expressing a coding segment as a fusion protein, or a non-fusion protein, with or without a purification segment, specifically cleavable from any protein or peptide not part of the desired expression product.