The human blood-forming (hematopoietic) system replaces a variety of white blood cells (including neutrophils, macrophages, and basophils/mast cells), red blood cells (erythrocytes) and clot-forming cells (megakaryocytes/platelets). The hematopoietic system of the average male has been estimated to produce on the order of 4.5.times.10.sup.11 granulocytes and erythrocytes every year, which is equivalent to an annual replacement of total body weight (Dexter et al., BioEssays, 2:154-158, 1985).
It is believed that small amounts of certain hematopoietic growth factors account for the differentiation of a small number of progenitor "stem cells" into the variety of blood cell lines, for the tremendous proliferation of those lines, and for the ultimate differentiation of mature blood cells from those lines. Because the hematopoietic growth factors are present in extremely small amounts, the detection and identification of these factors have relied upon an array of assays which as yet distinguish among the different factors only on the basis of stimulative effects on cultured cells under artificial conditions.
U.S. Pat. No. 4,999,291 discloses DNA and methods for making G-CSF, the disclosure of which is incorporated herein by reference in its entirety.
U.S. Pat. No. 4,810,643 relates to DNA and methods of making G-CSF and Cys to Ser substitution variants of G-CSF.
Kuga et al. (Biochem. Biophys. Res. Comm. 159:103-111, 1989) made a series of G-CSF variants to partially define the structure-function relationship. Kuga et al. found that internal and C-terminal deletions abolished activity, while variants with N-terminal deletions of up to 11 amino acids and variants with amino acid substitutions at positions 1, 2 and 3 were active.
Watanabe et al. (Anal. Biochem. 195:38-44, 1991) made a variant to study G-CSF receptor binding in which amino acids 1 and 3 were changed to Tyr for radioiodination of the protein. Watanabe et al. found this Tyr.sup.1, Tyr.sup.3 G-CSF variant to be active.
WO 95/27732 describes a circularly permuted G-CSF ligand with a breakpoint at positions 68/69, which creates a new N-terminus at the original position 69 of G-CSF and a new C-terminus at the original position 68 of G-CSF, but does not show that the molecule has biological activity. WO 95/27732 also discloses circularly permuted GM-CSF, IL-2 and IL-4.
Rearrangement of Protein Sequences
In evolution, rearrangements of DNA sequences play an important role in generating a diversity of protein structure and function. Gene duplication and exon shuffling provide a mechanism to rapidly generate diversity and thereby provide organisms with a competitive advantage, especially since the basal mutation rate is low (Doolittle, Protein Science 1:191-200, 1992).
The development of recombinant DNA methods has made it possible to study the effects of sequence transposition on protein folding, structure and function. The approach used in creating new sequences resembles that of naturally occurring pairs of proteins that are related by linear reorganization of their amino acid sequences (Cunningham, et al., Proc. Natl. Acad. Sci. U.S.A. 76:3218-3222, 1979; Teather & Erfle, J. Bacteriol. 172:3837-3841, 1990; Schimming et al., Eur. J. Biochem. 204:13-19, 1992; Yamiuchi and Minamikawa, FEBS Lett. 260:127-130, 1991; MacGregor et al., FEBS Lett. 378:263-266, 1996). The first in vitro application of this type of rearrangement to proteins was described by Goldenberg and Creighton (J. Mol. Biol. 165:407-413, 1983). A new N-terminus is selected at an internal site (breakpoint) of the original sequence, and the new sequence has the same order of amino acids as the original from the breakpoint until it reaches an amino acid that is at or near the original C-terminus. At this point the new sequence is joined, either directly or through an additional portion of sequence (linker), to an amino acid that is at or near the original N-terminus. The new sequence then continues with the same sequence as the original until it reaches a point that is at or near the amino acid that was N-terminal to the breakpoint site of the original sequence; this residue forms the new C-terminus of the chain.
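The rearrangement described above can be sketched as a simple string operation on a one-letter amino acid sequence. The following Python sketch is illustrative only: the function name `circularly_permute`, its parameters, and the ten-residue toy sequence are assumptions introduced here and do not come from any of the cited studies. The optional `c_trim` and `n_trim` arguments model joining at a residue "near" rather than exactly at the original termini.

```python
def circularly_permute(sequence, breakpoint, linker="", c_trim=0, n_trim=0):
    """Return a sequence-rearranged (circularly permuted) amino acid sequence.

    sequence   -- original sequence, one-letter codes, N- to C-terminus
    breakpoint -- 1-based position of the residue that becomes the new
                  N-terminus (the residue before it becomes the new C-terminus)
    linker     -- optional residues inserted between the original C- and
                  N-terminal regions (empty string means a direct join)
    c_trim     -- hypothetical: residues dropped from the original C-terminus
    n_trim     -- hypothetical: residues dropped from the original N-terminus
    """
    n = len(sequence)
    if not 2 <= breakpoint <= n:
        raise ValueError("breakpoint must be an internal position (2..len)")
    start = breakpoint - 1  # 0-based index of the new N-terminal residue
    if c_trim < 0 or n_trim < 0 or n_trim >= start or c_trim >= n - start:
        raise ValueError("trimming would remove an entire segment")

    # Segment 1: from the breakpoint to (at or near) the original C-terminus.
    head = sequence[start:n - c_trim]
    # Segment 2: from (at or near) the original N-terminus up to the residue
    # that preceded the breakpoint, which becomes the new C-terminus.
    tail = sequence[n_trim:start]
    return head + linker + tail


# Toy example (not a real protein): breakpoint between residues 3 and 4,
# joined through a Gly-Gly-Ser linker.
print(circularly_permute("MKTAYIAKQR", 4, linker="GGS"))  # AYIAKQRGGSMKT
```

In this sketch a breakpoint at positions 68/69, as in the G-CSF example of WO 95/27732, would correspond to `breakpoint=69`: original residue 69 opens the new chain and original residue 68 closes it.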
This approach has been applied to proteins which range in size from 58 to 462 amino acids (Goldenberg & Creighton, J. Mol. Biol. 165:407-413, 1983; Li & Coffino, Mol. Cell. Biol. 13:2377-2383, 1993). The proteins examined have represented a broad range of structural classes, including proteins that contain predominantly .alpha.-helix (interleukin-4; Kreitman et al., Cytokine 7:311-318, 1995), .beta.-sheet (interleukin-1; Horlick et al., Protein Eng. 5:427-431, 1992), or mixtures of the two (yeast phosphoribosyl anthranilate isomerase; Luger et al., Science 243:206-210, 1989). Broad categories of protein function are represented in these sequence reorganization studies:
______________________________________
Enzymes
T4 lysozyme                                         Zhang et al., Biochemistry 32:12311-12318 (1993); Zhang et al., Nature Struct. Biol. 1:434-438 (1995)
dihydrofolate reductase                             Buchwalder et al., Biochemistry 31:1621-1630 (1994); Protasova et al., Prot. Eng. 7:1373-1377 (1995)
ribonuclease T1                                     Mullins et al., J. Am. Chem. Soc. 116:5529-5533 (1994); Garrett et al., Protein Science 5:204-211 (1996)
Bacillus .beta.-glucanase                           Hahn et al., Proc. Natl. Acad. Sci. U.S.A. 91:10417-10421 (1994)
aspartate transcarbamoylase                         Yang & Schachman, Proc. Natl. Acad. Sci. U.S.A. 90:11980-11984 (1993)
phosphoribosyl anthranilate isomerase               Luger et al., Science 243:206-210 (1989); Luger et al., Prot. Eng. 3:249-258 (1990)
pepsin/pepsinogen                                   Lin et al., Protein Science 4:159-166 (1995)
glyceraldehyde-3-phosphate dehydrogenase            Vignais et al., Protein Science 4:994-1000 (1995)
ornithine decarboxylase                             Li & Coffino, Mol. Cell. Biol. 13:2377-2383 (1993)
yeast phosphoglycerate dehydrogenase                Ritco-Vonsovici et al., Biochemistry 34:16543-16551 (1995)

Enzyme Inhibitor
basic pancreatic trypsin inhibitor                  Goldenberg & Creighton, J. Mol. Biol. 165:407-413 (1983)

Cytokines
interleukin-1.beta.                                 Horlick et al., Protein Eng. 5:427-431 (1992)
interleukin-4                                       Kreitman et al., Cytokine 7:311-318 (1995)

Tyrosine Kinase Recognition Domain
.alpha.-spectrin SH3 domain                         Viguera, et al., J. Mol. Biol. 247:670-681 (1995)

Transmembrane Protein
omp A                                               Koebnik & Kramer, J. Mol. Biol. 250:617-626 (1995)

Chimeric Protein
interleukin-4-Pseudomonas exotoxin fusion molecule  Kreitman et al., Proc. Natl. Acad. Sci. U.S.A. 91:6889-6893 (1994)
______________________________________
The results of these studies have been highly variable. In many cases substantially lower activity, solubility or thermodynamic stability was observed (E. coli dihydrofolate reductase, aspartate transcarbamoylase, phosphoribosyl anthranilate isomerase, glyceraldehyde-3-phosphate dehydrogenase, ornithine decarboxylase, omp A, yeast phosphoglycerate dehydrogenase). In other cases, the sequence-rearranged protein appeared to have many properties nearly identical to those of its natural counterpart (basic pancreatic trypsin inhibitor, T4 lysozyme, ribonuclease T1, Bacillus .beta.-glucanase, interleukin-1.beta., .alpha.-spectrin SH3 domain, pepsinogen, interleukin-4). In exceptional cases, an unexpected improvement over some properties of the natural sequence was observed, e.g., the solubility and refolding rate for rearranged .alpha.-spectrin SH3 domain sequences, and the receptor affinity and anti-tumor activity of transposed interleukin-4-Pseudomonas exotoxin fusion molecule (Kreitman et al., Proc. Natl. Acad. Sci. U.S.A. 91:6889-6893, 1994; Kreitman et al., Cancer Res. 55:3357-3363, 1995).
The primary motivation for these types of studies has been to investigate the role of short-range and long-range interactions in protein folding and stability. Sequence rearrangements of this type convert a subset of interactions that are long-range in the original sequence into short-range interactions in the new sequence, and vice versa. The fact that many of these sequence-rearranged proteins are able to attain a conformation with at least some activity is persuasive evidence that protein folding occurs by multiple folding pathways (Viguera, et al., J. Mol. Biol. 247:670-681, 1995). In the case of the SH3 domain of .alpha.-spectrin, choosing new termini at locations that corresponded to .beta.-hairpin turns resulted in proteins with slightly less stability, but which were nevertheless able to fold.
The positions of the internal breakpoints used in the studies cited here are found exclusively on the surface of proteins, and are distributed throughout the linear sequence without any obvious bias towards the ends or the middle (the relative distance from the original N-terminus to the breakpoint varies from ca. 10 to 80% of the total sequence length). The linkers connecting the original N- and C-termini in these studies have ranged from 0 to 9 residues. In one case (Yang & Schachman, Proc. Natl. Acad. Sci. U.S.A. 90:11980-11984, 1993), a portion of sequence was deleted from the original C-terminal segment, and the connection was made from the truncated C-terminus to the original N-terminus. Flexible hydrophilic residues such as Gly and Ser are frequently used in the linkers. Viguera, et al. (J. Mol. Biol. 247:670-681, 1995) compared joining the original N- and C-termini with 3- or 4-residue linkers; the protein with the 3-residue linker was less thermodynamically stable. Protasova et al. (Protein Eng. 7:1373-1377, 1994) used 3- or 5-residue linkers in connecting the original N- and C-termini of E. coli dihydrofolate reductase; only the 3-residue linker produced protein in good yield.