Colony stimulating factor-1 (CSF-1) is one of several proteins which are capable of stimulating colony formation by bone marrow cells plated in semisolid culture medium. CSF-1 is distinguished from other colony stimulating factors by virtue of its ability to stimulate these cells to become predominantly macrophage colonies. Other CSFs stimulate the production of colonies which consist of neutrophilic granulocytes and macrophages; predominantly neutrophilic granulocytes; or neutrophilic and eosinophilic granulocytes and macrophages. A review of these CSFs has been published by Dexter, T. M., Nature (1984) 309:746, and by Vadas, M. A., J Immunol (1983) 130:793. There is currently no routine in vivo assay which is known to be specific for CSF-1 activity.
The characteristics of native human CSF-1 are complex, and in fact it is not yet clear what form of CSF-1 is active in the human body. Soluble forms of naturally-produced CSF-1 have been purified to various degrees from human urine, mouse L-cells, cultured human pancreatic carcinoma (MIA PaCa) cells, and also from various human and mouse lung cell conditioned media, from human T-lymphoblast cells, and from human placental-conditioned medium. Many, if not all, of the isolated native CSF-1 proteins appear to be glycosylated dimers, regardless of source. There is considerable variety in the molecular weights exhibited by the monomeric components of CSF-1, apparently the result of variations in C-terminal processing and/or the extent of glycosylation. For example, Western analysis shows that the CSF-1 secreted by the MIA PaCa cell line contains reduced monomers of approximately 26 and 30 kd, as well as 40, 48, and 70 kd forms. Other CSF-1 molecular weights have been reported. For example, the monomeric reduced form of CSF-1 isolated from human urine is reported to have a relatively low molecular weight of 25 kd when isolated, and 14-17 kd when extensively deglycosylated in vitro (Das, S. and Stanley, E. R., J Biol Chem (1982) 257:13679).
The existence of "native-like" CSF-1 reference proteins is important because these proteins provide standards against which to compare the quality and biological activity of refolded recombinant forms of CSF-1. For this purpose, we have relied upon the soluble CSF-1 produced by the MIA PaCa cell line as well as properties of other highly purified CSF-1 molecules which have been described in the literature. The specific activity of these purified "native-like" reference proteins has typically fallen in the range of 4 to 10 × 10^7 units per mg (as measured by in vitro mouse bone marrow colony-forming assays).
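By way of illustration only, the comparison of a refolded preparation against the reference range quoted above can be sketched as a short calculation. The following Python sketch is not part of the disclosed methods; the function names and the example assay figures are hypothetical, and the only value taken from the text is the range of 4 to 10 × 10^7 units per mg.

```python
# Reference range for purified "native-like" CSF-1, as quoted in the text
# (units per mg, in vitro mouse bone marrow colony-forming assay).
REFERENCE_RANGE_UNITS_PER_MG = (4e7, 10e7)


def specific_activity(total_units: float, protein_mg: float) -> float:
    """Return specific activity in units per mg of protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return total_units / protein_mg


def within_reference_range(activity_units_per_mg: float) -> bool:
    """True if the activity falls inside the quoted reference range."""
    low, high = REFERENCE_RANGE_UNITS_PER_MG
    return low <= activity_units_per_mg <= high


# Hypothetical example figures, chosen only to exercise the functions.
refolded = specific_activity(total_units=1.2e8, protein_mg=2.0)
unrefolded = specific_activity(total_units=5.0e5, protein_mg=1.0)

print(refolded, within_reference_range(refolded))
print(unrefolded, within_reference_range(unrefolded))
```

A preparation whose specific activity falls within the range would be regarded, on this measure alone, as comparable to the reference proteins; a value far below it would indicate incomplete refolding or inactive material.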
CSF-1 has also been produced from recombinant DNA using two apparently related cDNA clones: (1) a "short" form which encodes a message which, when translated, produces a monomeric protein of 224 amino acids preceded by a 32-amino acid signal sequence (Kawasaki, E. S., et al, Science (1985) 230:292-296, and PCT WO86/04607, both of which are incorporated herein by reference); and (2) a "long" form, encoding a monomeric protein of 522 amino acids, also preceded by the 32-amino acid signal sequence. The long form has been cloned and expressed by two groups, as disclosed in Ladner, M. B., et al, The EMBO J (1987) 6(9):2693-2698, and Wong, G., et al, Science (1987) 235:1504-1509, both of which are incorporated herein by reference. (The DNA and amino acid sequences for both "short" and "long" forms are shown in FIGS. 5 and 6, respectively; however, the 32 amino acid signal sequence is incomplete as illustrated in FIG. 6.)
The long and short forms of the CSF-1-encoding DNA appear to arise from a variable splice junction at the upstream portion of exon 6 of the genomic CSF-1-encoding DNA. When CSF-1 is expressed in certain eucaryotic cells from either the long or short cDNA forms, it appears to be variably processed at the C-terminus and/or variably glycosylated. Consequently, CSF-1 proteins of varying molecular weights are found when the reduced monomeric form is analyzed by Western analysis.
The amino acid sequences of the long and short forms, as predicted from the DNA sequence of the isolated clones and by their relationship to the genomic sequence, are identical with respect to the first 149 amino acids at the N-terminus of the mature protein, and diverge thereafter by virtue of the inclusion in the longer clone of an additional 894 bp insert encoding 298 additional amino acids following glutamine 149. Both the shorter and longer forms of the gene allow expression of proteins with sequences containing identical regions at the C-terminus, as well as at the N-terminus. Biologically active CSF-1 has been recovered when cDNA encoding through the first 150 or 158 amino acids of the short form, or through the first 221 amino acids of the longer form, is expressed in eucaryotic cells.
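The residue counts quoted for the two forms can be cross-checked with simple arithmetic. The following Python sketch is offered for illustration only; the constant and function names are assumptions introduced here, and the number of residues following the insert point is derived on the assumption that the 894 bp insert is the only difference between the two coding sequences.

```python
# Figures quoted in the text.
SIGNAL_SEQUENCE_AA = 32      # signal sequence preceding both forms
SHORT_MATURE_AA = 224        # mature "short" form
LONG_MATURE_AA = 522         # mature "long" form
SHARED_N_TERMINAL_AA = 149   # identical N-terminal residues (through Gln 149)
INSERT_BP = 894              # additional insert in the longer clone
BASES_PER_CODON = 3


def codons_in(base_pairs: int) -> int:
    """Number of codons in an in-frame stretch of coding DNA."""
    if base_pairs % BASES_PER_CODON != 0:
        raise ValueError("length is not a whole number of codons")
    return base_pairs // BASES_PER_CODON


# 894 bp corresponds to 298 codons, matching the 298 additional amino acids.
insert_codons = codons_in(INSERT_BP)

# Adding the insert to the short form reproduces the long-form length.
long_from_short = SHORT_MATURE_AA + insert_codons

# Residues downstream of the insert point (derived, see assumption above).
residues_after_insert_short = SHORT_MATURE_AA - SHARED_N_TERMINAL_AA
residues_after_insert_long = LONG_MATURE_AA - SHARED_N_TERMINAL_AA - insert_codons

# Primary translation products including the signal sequence.
short_precursor = SIGNAL_SEQUENCE_AA + SHORT_MATURE_AA
long_precursor = SIGNAL_SEQUENCE_AA + LONG_MATURE_AA

print(insert_codons, long_from_short)
print(residues_after_insert_short, residues_after_insert_long)
print(short_precursor, long_precursor)
```

On these figures the two forms are mutually consistent: the insert accounts exactly for the difference in length, and the same number of residues follows the insert point in each form.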
Since most, if not all, of the native secreted CSF-1 molecules are glycosylated and dimeric, significant posttranslational processing apparently occurs in vivo. Given the complexity of the native CSF-1 molecule, it has been considered expedient to express the CSF-1 gene in cells derived from higher organisms. It seemed unlikely that active protein would be obtained when the gene was expressed in more convenient bacterial hosts, such as E. coli. Bacterial hosts do not have the capacity to glycosylate proteins, nor are their intracellular conditions conducive to the refolding, disulfide bond formation, and disulfide-stabilized dimerization which are apparently essential for full CSF-1 activity. Thus, experimental production of recombinant CSF-1 in E. coli has, prior to this invention, resulted in protein of very low activity, although its identification as monomeric CSF-1 had been readily confirmed by immunoassay, N-terminal sequencing, and amino acid analysis.
It is by now accepted that inactive forms of recombinant foreign proteins produced in bacteria may require further "refolding" steps in order to render them useful for the purposes for which they are intended. As a dimeric protein containing a large number of cysteines and disulfide bonds, which are required for activity, CSF-1 represents a particularly difficult challenge for production from bacterial systems. Often, recombinant proteins produced in E. coli, including CSF-1 so produced, are in the form of highly insoluble intracellular protein precipitates referred to as inclusion bodies or refractile bodies. These inclusions can readily be separated from the soluble bacterial proteins, but then must be solubilized under conditions which result in essentially complete denaturation of the protein. Even secreted proteins from bacterial sources, while not necessarily presenting the same solubility problems, may require considerable manipulation in order to restore activity. Each different protein may require a different refolding protocol in order to achieve full biological activity.
A number of papers have appeared which report refolding attempts for individual proteins produced in bacterial hosts, or which are otherwise in denatured or non-native form. A representative sample follows.
Reformation of an oligomeric enzyme after denaturation by sodium dodecyl sulfate (SDS) was reported by Weber, K., et al, J Biol Chem (1971) 246:4504-4509. This procedure was considered to solve a problem created by the binding of proteins to SDS, and the process employed removal of the denatured protein from SDS in the presence of 6M urea, along with anion exchange to remove the SDS, followed by dilution from urea, all in the presence of reducing agents. The proteins which were at least partially refolded included: aspartate transcarbamylase, β-galactosidase, rabbit muscle aldolase, and coat protein from bacteriophage R-17.
Light, A., in Biotechniques (1985) 3:298-306, describes a variety of attempts to refold a large number of proteins. It is apparent from the description in this reference that the techniques which are applicable are highly individual to the particular protein concerned. In fact, in some cases, refolding significant amounts of particular proteins has not been possible and the results are quite unpredictable. In addition, refolding procedures for recombinant urokinase produced in E. coli were described in Winkler, M. E., Biotechnology (1985) 3:990-999. In this case, the material was dissolved in 8M urea or 5M guanidine hydrochloride, and the rearrangement of disulfides was facilitated by use of a buffer containing a glutathione redox system. Recombinant human immune interferon, which has no disulfide bonds, has been refolded to generate a more active preparation using chaotropic agents in the absence of thiol-disulfide exchange reagents (PCT application WO 86/06385). In another example, bacterially synthesized granulocyte macrophage colony-stimulating factor (GM-CSF), a member of the CSF group, was also produced in E. coli and refolded after solubilization in 6M urea. This CSF is unrelated to CSF-1, since GM-CSF has a distinct amino acid sequence and is also monomeric.
Use of refolding procedures to obtain reconstitution of activity in multimeric proteins has also been described by Herman, R. H., et al, Biochemistry (1985) 24:1817-1821, for phosphoglycerate mutase, and by Cabilly, S., Proc Natl Acad Sci USA (1984) 81:3273-3277, for immunoglobulins. An additional procedure for immunoglobulin reassembly was described by Boss, M. A., et al, Nucleic Acids Research (1984) 12:3791-3806. These procedures all employ denaturation and the use of appropriate oxidizing and reducing agents or sulfitolysis reagents. A related approach employs the catalyst thioredoxin, and is disclosed by Pigiet, V. P., Proc Natl Acad Sci USA (1986) 83:7643-7647.
Certain aspects of solubilization, purification, and refolding of certain recombinant proteins produced as refractile bodies in bacteria are also disclosed in U.S. Pat. Nos. 4,511,502; 4,511,503; 4,512,922; 4,518,526 and EPO publication 114,506 (Genentech).
The foregoing references are merely representative of a large body of literature which, when taken together, shows individual steps in protocols which may be modified and combined in various sequences to obtain individually tailored procedures for particular subject proteins produced in accordance with particular expression systems. It is evident that retailoring of the overall procedures to fit a specific case is a requirement for producing refolded product with full biological activity in useful amounts.
For example, a number of the published procedures describe a step for successful refolding of the recombinantly produced protein. It is not clear from these references, but is known in the art, that the starting material for refolding may exist in a variety of forms, depending on the nature of the expression system used. In the case of bacterial expression, it is, however, clear that the product is not glycosylated, and that, in addition, production of an intracellular disulfide-bonded dimeric product is essentially prevented by the reducing environment in bacterial cells.
Currently the most common form of recombinant protein starting material for refolding is an intracellular, insoluble protein which is produced by expression of a gene for mature or bacterial fusion protein, lacking a functional signal sequence, under the control of standard bacterial promoters such as TRP or P_L. Because recombinantly produced products in bacteria are produced in high concentrations in a reducing environment, and because typically the constructs do not enable the bacteria to secrete the recombinant protein, these foreign proteins are often observed to form insoluble inclusion bodies.
However, signal sequences which function in bacteria are known, including the E. coli penicillinase sequence disclosed by Gilbert et al, U.S. Pat. Nos. 4,411,994 and 4,338,397, the B. licheniformis penP sequences disclosed by Chang in U.S. Pat. Nos. 4,711,843 and 4,711,844, and the phosphatase A signal sequence (phoA) disclosed by Chang, et al, in European Patent Publication No. 196,864, published 8 Oct. 1986, and incorporated herein by reference. Secretion can be effected in some strains. However, if Gram-negative hosts are used, complete secretion may not occur, and the protein may reside in the periplasmic space. Nevertheless, it is much more likely that proteins expressed under control of promoters and signal sequences such as phoA will be produced in soluble form if they are capable of refolding and forming required disulfide bonds in the extracellular environment. The methods disclosed hereinbelow are expected to be of value for both intracellular and secreted products where refolding is required.
Nowhere in the literature is a specific process described for the preparation of biologically active dimeric CSF-1 from bacteria. The present invention describes several refolding procedures involving CSF-1 proteins of various primary structures. The resulting refolded CSF-1 proteins are fully active and soluble, and the various molecules differ sufficiently in physical properties that they may be expected to exhibit a variety of pharmacokinetic and/or pharmacological properties when used therapeutically in vivo.