The present invention relates to genetically engineered therapeutic proteins. More specifically, the engineered proteins include growth hormone and related proteins.
The following proteins are encoded by genes of the growth hormone (GH) supergene family (Bazan (1990); Mott and Campbell (1995); Silvennoinen and Ihle (1996)): growth hormone, prolactin, placental lactogen, erythropoietin (EPO), thrombopoietin (TPO), interleukin-2 (IL-2), IL-3, IL-4, IL-5, IL-6, IL-7, IL-9, IL-10, IL-11, IL-12 (p35 subunit), IL-13, IL-15, oncostatin M, ciliary neurotrophic factor, leukemia inhibitory factor, alpha interferon, beta interferon, gamma interferon, omega interferon, tau interferon, granulocyte-colony stimulating factor (G-CSF), granulocyte-macrophage colony stimulating factor (GM-CSF), macrophage colony stimulating factor (M-CSF) and cardiotrophin-1 (CT-1) (“the GH supergene family”). It is anticipated that additional members of this gene family will be identified in the future through gene cloning and sequencing. Members of the GH supergene family have similar secondary and tertiary structures, despite the fact that they generally have limited amino acid or DNA sequence identity. The shared structural features allow new members of the gene family to be readily identified.
There is considerable interest on the part of patients and healthcare providers in the development of long-acting, “user-friendly” protein therapeutics. Proteins are expensive to manufacture and, unlike conventional small molecule drugs, are not readily absorbed by the body. Moreover, they are digested if taken orally. Therefore, natural proteins must be administered by injection. After injection, most proteins are cleared rapidly from the body, necessitating frequent, often daily, injections. Patients dislike injections, which leads to reduced compliance and reduced drug efficacy. Some proteins, such as erythropoietin (EPO), are effective when administered less often (three times per week for EPO) because they are glycosylated. However, glycosylated proteins are produced using expensive mammalian cell expression systems.
The length of time an injected protein remains in the body is finite and is determined by, e.g., the protein's size and whether or not the protein contains covalent modifications such as glycosylation. Circulating concentrations of injected proteins change constantly, often by several orders of magnitude, over a 24-hour period. Rapidly changing concentrations of protein agonists can have dramatic downstream consequences, at times under-stimulating and at other times over-stimulating target cells. Similar problems plague protein antagonists. These fluctuations can lead to decreased efficacy and increased frequency of adverse side effects for protein therapeutics. The rapid clearance of recombinant proteins from the body significantly increases the amount of protein required per patient and dramatically increases the cost of treatment. The cost of human protein pharmaceuticals is expected to increase dramatically in the years ahead as new and existing drugs are approved for more disease indications.
Thus, there is a need to develop protein delivery technologies that lower the costs of protein therapeutics to patients and healthcare providers. The present invention provides a solution to this problem by providing methods to prolong the circulating half-lives of protein therapeutics in the body so that the proteins do not have to be injected frequently. This solution also satisfies the needs and desires of patients for protein therapeutics that are “user-friendly”, i.e., protein therapeutics that do not require frequent injections. The present invention solves these and other problems by providing biologically active, cysteine-added variants of members of the growth hormone supergene family. The invention also provides for the chemical modification of these variants with cysteine-reactive polymers or other types of cysteine-reactive moieties to produce derivatives thereof and the molecules so produced.
The present invention provides cysteine variants of members of the GH supergene family. The variants comprise a cysteine residue substituted for a nonessential amino acid of the proteins. Preferably, the variants comprise a cysteine residue substituted for an amino acid selected from amino acids in the loop regions, the ends of the alpha helices, proximal to the first amphipathic helix, and distal to the final amphipathic helix or wherein the cysteine residue is added at the N-terminus or C-terminus of the proteins. Preferred sites for substitution are the N- and O-linked glycosylation sites.
Also provided are cysteine variants wherein the amino acid substituted for is in the A-B loop, the B-C loop, the C-D loop or the D-E loop of interferon/IL-10-like members of the GH supergene family.
Also provided are cysteine variants of members of the GH supergene family wherein the cysteine residue is introduced between two amino acids in the natural protein. In particular, the cysteine residue is introduced into the loop regions, the ends of the alpha helices, proximal to the first amphipathic helix, or distal to the final amphipathic helix. Even more particularly, the cysteine residue is introduced between two amino acids in an N- or O-linked glycosylation site or adjacent to an amino acid in an N-linked or O-linked glycosylation site.
More particularly are provided cysteine variants wherein the loop region where the cysteine is introduced is the A-B loop, the B-C loop, the C-D loop or the D-E loop of interferon/IL-10-like members of the GH supergene family.
Such cysteine substitutions or insertion mutations also can include the insertion of one or more additional amino acids amino-terminal or carboxy-terminal to the cysteine substitution or insertion.
Also provided are cysteine variants that are further derivatized by PEGylation, as well as the derivatized proteins produced thereby.
As set forth in the examples, specific cysteine variants of the members of the GH supergene family also are provided, including for example, variants of GH. The GH cysteine variants can have the substituted-for amino acid or inserted cysteine located at the N-terminal end of the A-B loop, the B-C loop, the C-D loop, the first three or last three amino acids in the A, B, C and D helices and the amino acids proximal to helix A and distal to helix D.
More particularly, the cysteine can be substituted for the following amino acids: F1, T3, P5, E33, A34, K38, E39, K40, S43, Q46, N47, P48, Q49, T50, S51, S55, T60, A98, N99, S100, G104, A105, S106, E129, D130, G131, S132, P133, T135, G136, Q137, K140, Q141, T142, S144, K145, D147, T148, N149, S150, H151, N152, D153, S184, E186, G187, S188, and G190.
Other examples of cysteine variants according to the invention include erythropoietin variants. Erythropoietin variants include those wherein the substituted for amino acid is located in the A-B loop, the B-C loop, the C-D loop, the amino acids proximal to helix A and distal to helix D and the N- or C-terminus. Even more specifically, the EPO cysteine variants include molecules wherein the amino acids indicated below have a cysteine substituted therefor: serine-126, N24, I25, T26, N38, I39, T40, N83, S84, A1, P2, P3, R4, D8, S9, T27, G28, A30, E31, H32, S34, N36, D43, T44, K45, N47, A50, K52, E55, G57, Q58, G77, Q78, A79, Q86, W88, E89, T107, R110, A111, G113, A114, Q115, K116, E117, A118, S120, P121, P122, D123, A124, A125, A127, A128, T132, K154, T157, G158, E159, A160, T163, G164, D165, R166 and S85.
The members of the GH supergene family include growth hormone, prolactin, placental lactogen, erythropoietin, thrombopoietin, interleukin-2, interleukin-3, interleukin-4, interleukin-5, interleukin-6, interleukin-7, interleukin-9, interleukin-10, interleukin-11, interleukin-12 (p35 subunit), interleukin-13, interleukin-15, oncostatin M, ciliary neurotrophic factor, leukemia inhibitory factor, alpha interferon, beta interferon, gamma interferon, omega interferon, tau interferon, granulocyte-colony stimulating factor, granulocyte-macrophage colony stimulating factor, macrophage colony stimulating factor, cardiotrophin-1 and other proteins identified and classified as members of the family. The proteins can be derived from any animal species including human, companion animals and farm animals.
Other variations and modifications to the invention will be obvious to those skilled in the art based on the specification and the “rules” set forth herein. All of these are considered as part of the invention.
The present invention relates to cysteine variants and, among other things, the site-specific conjugation of such proteins with polyethylene glycol (PEG) or other such moieties. PEG is a non-antigenic, inert polymer that significantly prolongs the length of time a protein circulates in the body. This allows the protein to be effective for a longer period of time. Covalent modification of proteins with PEG has proven to be a useful method to extend the circulating half-lives of proteins in the body (Abuchowski et al., 1984; Hershfield, 1987; Meyers et al., 1991). Covalent attachment of PEG to a protein increases the protein's effective size and reduces its rate of clearance from the body. PEGs are commercially available in several sizes, allowing the circulating half-lives of PEG-modified proteins to be tailored for individual indications through use of different size PEGs. Other benefits of PEG modification include an increase in protein solubility, an increase in in vivo protein stability and a decrease in protein immunogenicity (Katre et al., 1987; Katre, 1990).
The preferred method for PEGylating proteins is to covalently attach PEG to cysteine residues using cysteine-reactive PEGs. A number of highly specific, cysteine-reactive PEGs with different reactive groups (e.g., maleimide, vinylsulfone) and different size PEGs (2-20 kDa) are commercially available (e.g., from Shearwater Polymers, Inc., Huntsville, Ala.). At neutral pH, these PEG reagents selectively attach to “free” cysteine residues, i.e., cysteine residues not involved in disulfide bonds. The conjugates are hydrolytically stable. Use of cysteine-reactive PEGs allows the development of homogeneous PEG-protein conjugates of defined structure.
Considerable progress has been made in recent years in determining the structures of commercially important protein therapeutics and understanding how they interact with their protein targets, e.g., cell-surface receptors, proteases, etc. This structural information can be used to design PEG-protein conjugates using cysteine-reactive PEGs. Cysteine residues in most proteins participate in disulfide bonds and are not available for PEGylation using cysteine-reactive PEGs. Through in vitro mutagenesis using recombinant DNA techniques, additional cysteine residues can be introduced anywhere into the protein. The added cysteines can be introduced at the beginning of the protein, at the end of the protein, between two amino acids in the protein sequence or, preferably, substituted for an existing amino acid in the protein sequence. The newly added “free” cysteines can serve as sites for the specific attachment of a PEG molecule using cysteine-reactive PEGs. The added cysteine must be exposed on the protein's surface and accessible for PEGylation for this method to be successful. If the site used to introduce an added cysteine is non-essential for biological activity, then the PEGylated protein will display essentially wild type (normal) in vitro bioactivity. The major technical challenge in PEGylating proteins with cysteine-reactive PEGs is the identification of surface exposed, non-essential regions in the target protein where cysteine residues can be added or substituted for existing amino acids without loss of bioactivity.
Cysteine-added variants of a few human proteins and PEG-polymer conjugates of these proteins have been described. U.S. Pat. No. 5,206,344 describes cysteine-added variants of IL-2. These cysteine-added variants are located within the first 20 amino acids from the amino terminus of the mature IL-2 polypeptide chain. The preferred cysteine variant is at position 3 of the mature polypeptide chain, which corresponds to a threonine residue that is O-glycosylated in the naturally occurring protein. Substitution of cysteine for threonine at position 3 yields an IL-2 variant that can be PEGylated with a cysteine-reactive PEG and retain full in vitro bioactivity (Goodson and Katre, 1990). In contrast, natural IL-2 PEGylated with lysine-reactive PEGs displays reduced in vitro bioactivity (Goodson and Katre, 1990). The effects of cysteine substitutions at other positions in IL-2 were not reported.
U.S. Pat. No. 5,166,322 teaches cysteine-added variants of IL-3. These variants are located within the first 14 amino acids from the N-terminus of the mature protein sequence. The patent teaches expression of the proteins in bacteria and covalent modification of the proteins with cysteine-reactive PEGs. No information is provided as to whether the cysteine-added variants and PEG-conjugates of IL-3 are biologically active. Cysteine-added variants at other positions in the polypeptide chain were not reported.
World patent application WO9412219 and PCT application US95/06540 teach cysteine-added variants of insulin-like growth factor-I (IGF-I). IGF-I has a very different structure from GH and is not a member of the GH supergene family (Mott and Campbell, 1995). Cysteine substitutions at many positions in the IGF-I protein are described. Only certain of the cysteine-added variants are biologically active. The preferred site for the cysteine added variant is at amino acid position 69 in the mature protein chain. Cysteine substitutions at positions near the N-terminus of the protein (residues 1-3) yielded IGF-I variants with reduced biological activities and improper disulfide bonds.
World patent application WO9422466 teaches two cysteine-added variants of insulin-like growth factor (IGF) binding protein-1, which has a very different structure than GH and is not a member of the GH supergene family. The two cysteine-added IGF binding protein-1 variants disclosed are located at positions 98 and 101 in the mature protein chain and correspond to serine residues that are phosphorylated in the naturally-occurring protein.
U.S. patent application Ser. No. 07/822,296 teaches cysteine added variants of tumor necrosis factor binding protein, which is a soluble, truncated form of the tumor necrosis factor cellular receptor. Tumor necrosis factor binding protein has a very different structure than GH and is not a member of the GH supergene family.
IGF-I, IGF binding protein-1 and tumor necrosis factor binding protein have secondary and tertiary structures that are very different from GH and the proteins are not members of the GH supergene family. Because of this, it is difficult to use the information gained from studies of IGF-I, IGF binding protein-1 and tumor necrosis factor binding protein to create cysteine-added variants of members of the GH supergene family. The studies with IL-2 and IL-3 were carried out before the structures of IL-2 and IL-3 were known (McKay 1992; Bazan, 1992) and before it was known that these proteins are members of the GH supergene family. Previous experiments aimed at identifying preferred sites for adding cysteine residues to IL-2 and IL-3 were largely empirical and were performed prior to experiments indicating that members of the GH supergene family possessed similar secondary and tertiary structures.
Based on the structural information now available for members of the GH supergene family, the present invention provides “rules” for determining a priori which regions and amino acid residues in members of the GH supergene family can be used to introduce or substitute cysteine residues without significant loss of biological activity. In contrast to the naturally occurring proteins, these cysteine-added variants of members of the GH supergene family will possess novel properties such as the ability to be covalently modified at defined sites within the polypeptide chain with cysteine-reactive polymers or other types of cysteine-reactive moieties. The covalently modified proteins will be biologically active.
GH is the best-studied member of the GH supergene family. GH is a 22 kDa protein secreted by the pituitary gland. GH stimulates metabolism of bone, cartilage and muscle and is the body's primary hormone for stimulating somatic growth during childhood. Recombinant human GH (rhGH) is used to treat short stature resulting from GH inadequacy and renal failure in children. GH is not glycosylated and can be produced in a fully active form in bacteria. The protein has a short in vivo half-life and must be administered by daily subcutaneous injection for maximum effectiveness (MacGillivray et al., 1996). rhGH was approved recently for treating cachexia in AIDS patients and is under study for treating cachexia associated with other diseases.
The sequence of human GH is well known (see, e.g., Martial et al. 1979; Goeddel et al. 1979 which are incorporated herein by reference; SEQ ID NO:1). GH is closely related in sequence to prolactin and placental lactogen and these three proteins were considered originally to comprise a small gene family. The primary sequence of GH is highly conserved among animal species (Abdel-Meguid et al., 1987), consistent with the protein's broad species cross-reactivity. The three dimensional folding pattern of porcine GH has been solved by X-ray crystallography (Abdel-Meguid et al., 1987). The protein has a compact globular structure, comprising four amphipathic alpha helical bundles joined by loops. Human GH has a similar structure (de Vos et al., 1992). The four alpha helical regions are termed A-D beginning from the N-terminus of the protein. The loop regions are referred to by the helical regions they join, e.g., the A-B loop joins helical bundles A and B. The A-B and C-D loops are long, whereas the B-C loop is short. GH contains four cysteine residues, all of which participate in disulfide bonds. The disulfide assignments are cysteine 53 joined to cysteine 165 and cysteine 182 joined to cysteine 189.
Abdel-Meguid et al., 1987, ibid., (see FIG. 3 of this publication) provide an alignment of amino acid sequences of growth hormone from different species and the position of the four α-helices relative to these sequences. As can be seen in FIG. 3 of Abdel-Meguid et al., the positions of the four helical regions and the intervening (loop) regions, as well as the regions preceding Helix A and following Helix D, are located within the sequence for human growth hormone at the following positions (positions given relative to SEQ ID NO:1):
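The distinction between disulfide-bonded and "free" cysteines can be illustrated with a minimal Python sketch. The function name free_cysteines and the choice of position 135 as an added cysteine are assumptions made only for illustration; the disulfide pairs are the ones stated above for GH.

```python
def free_cysteines(cysteine_positions, disulfide_pairs):
    """Return the sorted cysteine positions that are not in any disulfide bond."""
    cysteines = set(cysteine_positions)
    bonded = {pos for pair in disulfide_pairs for pos in pair}
    unknown = bonded - cysteines
    if unknown:
        raise ValueError(f"disulfide partner is not a listed cysteine: {sorted(unknown)}")
    return sorted(cysteines - bonded)


# Disulfide assignments for GH as stated in the text: 53-165 and 182-189.
GH_DISULFIDES = [(53, 165), (182, 189)]
GH_CYSTEINES = [53, 165, 182, 189]

# Natural GH: every cysteine is paired, so none is available for a
# cysteine-reactive reagent.
natural = free_cysteines(GH_CYSTEINES, GH_DISULFIDES)

# Hypothetical variant with one added cysteine at position 135 (illustrative).
variant = free_cysteines(GH_CYSTEINES + [135], GH_DISULFIDES)
```

Under these assumptions, natural is an empty list and variant contains only position 135, the single site that a cysteine-reactive PEG would be expected to target.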
Preceding Helix A=residues 1-5
Helix A=residues 6-33
A-B loop=residues 34-74
Helix B=residues 75-96
B-C loop=residues 97-105
Helix C=residues 106-129
C-D loop=residues 130-153
Helix D=residues 154-183
Following Helix D=residues 184-191.
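The region boundaries listed above can be encoded as a simple lookup, sketched here in Python. The names HGH_REGIONS and region_of are hypothetical; only the residue ranges come from the list above.

```python
# Region boundaries for human GH, as listed above (1-based, inclusive,
# relative to SEQ ID NO:1).
HGH_REGIONS = [
    ("Preceding Helix A", 1, 5),
    ("Helix A", 6, 33),
    ("A-B loop", 34, 74),
    ("Helix B", 75, 96),
    ("B-C loop", 97, 105),
    ("Helix C", 106, 129),
    ("C-D loop", 130, 153),
    ("Helix D", 154, 183),
    ("Following Helix D", 184, 191),
]


def region_of(position):
    """Return the name of the structural region containing a residue position."""
    for name, start, end in HGH_REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside residues 1-191")


# Example: classify a few positions.
for pos in (3, 99, 120, 135, 190):
    print(pos, region_of(pos))
```

Such a lookup makes it easy to check whether a proposed substitution site falls in a loop, a helix, or one of the terminal regions.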
The crystal structure of GH bound to its receptor revealed that GH has two receptor binding sites and binds two receptor molecules (Cunningham et al., 1991; de Vos et al., 1992). The two receptor binding sites are referred to as site I and site II. Site I encompasses the carboxy (C)-terminal end of helix D and parts of helix A and the A-B loop, whereas site II encompasses the amino (N)-terminal region of helix A and a portion of helix C. Binding of GH to its receptor occurs sequentially, with site I always binding first. Site II then engages a second GH receptor, resulting in receptor dimerization and activation of the intracellular signaling pathways that lead to cellular responses to GH. A GH mutein in which site II has been mutated (a glycine to arginine mutation at amino acid 120) is able to bind a single GH receptor, but is unable to dimerize GH receptors; this mutein acts as a GH antagonist in vitro, presumably by occupying GH receptor sites without activating intracellular signaling pathways (Fuh et al., 1992).
The roles of particular regions and amino acids in GH receptor binding and intracellular signaling also have been studied using techniques such as mutagenesis, monoclonal antibodies and proteolytic digestion. The first mutagenesis experiments entailed replacing entire domains of GH with similar regions of the closely related protein, prolactin (Cunningham et al., 1989). One finding was that replacement of the B-C loop of GH with that of prolactin did not affect binding of the hybrid GH protein to a soluble form of the human GH receptor, implying that the B-C loop was non-essential for receptor binding. Alanine scanning mutagenesis (replacement of individual amino acids with alanine) identified 14 amino acids that are critical for GH bioactivity (Cunningham and Wells, 1989). These amino acids are located in the helices A, B, C, and D and the A-B loop and correspond to sites I and II identified from the structural studies. Two lysine residues at amino acid positions 41 and 172, K41 and K172, were determined to be critical components of the site I receptor binding site, which explains the decrease in bioactivity observed when K172 is acetylated (The and Chapman, 1988). Modification of K168 also significantly reduced GH receptor binding and bioactivity (de la Llosa et al., 1985; Martal et al., 1985; The and Chapman, 1988). Regions of GH responsible for binding the GH receptor have also been studied using monoclonal antibodies (Cunningham et al., 1989). A series of eight monoclonal antibodies was generated to human GH and analyzed for the ability to neutralize GH activity and prevent binding of GH to its recombinant soluble receptor. The latter studies allowed the putative binding site for each monoclonal antibody to be localized within the GH three-dimensional structure. Of interest was that monoclonal antibodies 1 and 8 were unable to displace GH from binding its receptor. 
The binding sites for these monoclonal antibodies were localized to the B-C loop (monoclonal number 1) and the N-terminal end of the A-B loop (monoclonal number 8). No monoclonals were studied that bound the C-D loop specifically. The monoclonal antibody studies suggest that the B-C loop and N-terminal end of the A-B loop are non-essential for receptor binding. Finally, limited cleavage of GH with trypsin was found to produce a two chain derivative that retained full activity (Mills et al., 1980; Li, 1982). Mapping studies indicated that trypsin cleaved and/or deleted amino acids between positions 134 and 149, which corresponds to the C-D loop. These studies suggest the C-D loop is not involved in receptor binding or GH bioactivity.
Structures of a number of cytokines, including G-CSF (Hill et al., 1993), GM-CSF (Diederichs et al., 1991; Walter et al., 1992), IL-2 (Bazan, 1992; McKay, 1992), IL-4 (Redfield et al., 1991; Powers et al., 1992), and IL-5 (Milburn et al., 1993) have been determined by X-ray diffraction and NMR studies and show striking conservation with the GH structure, despite a lack of significant primary sequence homology. EPO is considered to be a member of this family based upon modeling and mutagenesis studies (Boissel et al., 1993; Wen et al., 1994). A large number of additional cytokines and growth factors including ciliary neurotrophic factor (CNTF), leukemia inhibitory factor (LIF), thrombopoietin (TPO), oncostatin M, macrophage colony stimulating factor (M-CSF), IL-3, IL-6, IL-7, IL-9, IL-12, IL-13, IL-15, and alpha, beta, omega, tau and gamma interferon belong to this family (reviewed in Mott and Campbell, 1995; Silvennoinen and Ihle 1996). All of the above cytokines and growth factors are now considered to comprise one large gene family, of which GH is the prototype.
In addition to sharing similar secondary and tertiary structures, members of this family share the property that they must oligomerize cell surface receptors to activate intracellular signaling pathways. Some GH family members, e.g., GH and EPO, bind a single type of receptor and cause it to form homodimers. Other family members, e.g., IL-2, IL-4, and IL-6, bind more than one type of receptor and cause the receptors to form heterodimers or higher order aggregates (Davis et al., 1993; Paonessa et al., 1995; Mott and Campbell, 1995). Mutagenesis studies have shown that, like GH, these other cytokines and growth factors contain multiple receptor binding sites, typically two, and bind their cognate receptors sequentially (Mott and Campbell, 1995; Matthews et al., 1996). Like GH, the primary receptor binding sites for these other family members occur primarily in the four alpha helices and the A-B loop (reviewed in Mott and Campbell, 1995). The specific amino acids in the helical bundles that participate in receptor binding differ amongst the family members (Mott and Campbell, 1995). Most of the cell surface receptors that interact with members of the GH supergene family are structurally related and comprise a second large multi-gene family (Bazan, 1990; Mott and Campbell, 1995; Silvennoinen and Ihle 1996).
A general conclusion reached from mutational studies of various members of the GH supergene family is that the loops joining the alpha helices generally tend to not be involved in receptor binding. In particular the short B-C loop appears to be non-essential for receptor binding in most, if not all, family members. For this reason, the B-C loop is a preferred region for introducing cysteine substitutions in members of the GH supergene family. The A-B loop, the B-C loop, the C-D loop (and D-E loop of interferon/IL-10-like members of the GH superfamily) also are preferred sites for introducing cysteine mutations. Amino acids proximal to helix A and distal to the final helix also tend not to be involved in receptor binding and also are preferred sites for introducing cysteine substitutions. Certain members of the GH family, e.g., EPO, IL-2, IL-3, IL-4, IL-6, G-CSF, GM-CSF, TPO, IL-10, IL-12 p35, IL-13, IL-15 and beta-interferon contain N-linked and O-linked sugars. The glycosylation sites in the proteins occur almost exclusively in the loop regions and not in the alpha helical bundles. Because the loop regions generally are not involved in receptor binding and because they are sites for the covalent attachment of sugar groups, they are preferred sites for introducing cysteine substitutions into the proteins. Amino acids that comprise the N- and O-linked glycosylation sites in the proteins are preferred sites for cysteine substitutions because these amino acids are surface-exposed, the natural protein can tolerate bulky sugar groups attached to the proteins at these sites and the glycosylation sites tend to be located away from the receptor binding sites.
Many additional members of the GH gene family are likely to be discovered in the future. New members of the GH supergene family can be identified through computer-aided secondary and tertiary structure analyses of the predicted protein sequences. Members of the GH supergene family will possess four or five amphipathic helices joined by non-helical amino acids (the loop regions). The proteins may contain a hydrophobic signal sequence at their N-terminus to promote secretion from the cell. Such later discovered members of the GH supergene family also are included within this invention.
The present invention provides “rules” for creating biologically active cysteine-added variants of members of the GH supergene family. These “rules” can be applied to any existing or future member of the GH supergene family. The cysteine-added variants will possess novel properties not shared by the naturally occurring proteins. Most importantly, the cysteine-added variants will possess the property that they can be covalently modified with cysteine-reactive polymers or other types of cysteine-reactive moieties to generate biologically active proteins with improved properties such as increased in vivo half-life, increased solubility and improved in vivo efficacy.
Specifically, the present invention provides biologically active cysteine variants of members of the GH supergene family by substituting cysteine residues for non-essential amino acids in the proteins. Preferably, the cysteine residues are substituted for amino acids that comprise the loop regions, for amino acids near the ends of the alpha helices and for amino acids proximal to the first amphipathic helix or distal to the final amphipathic helix of these proteins. Other preferred sites for adding cysteine residues are at the N-terminus or C-terminus of the proteins. Cysteine residues also can be introduced between two amino acids in the disclosed regions of the polypeptide chain. The present invention teaches that N- and O-linked glycosylation sites in the proteins are preferred sites for introducing cysteine substitutions either by substitution for amino acids that make up the sites or, in the case of N-linked sites, introduction of cysteines therein. The glycosylation sites can be serine or threonine residues that are O-glycosylated or asparagine residues that are N-glycosylated. N-linked glycosylation sites have the general structure asparagine-X-serine or threonine (N-X-S/T), where X can be any amino acid. The asparagine residue, the amino acid in the X position and the serine/threonine residue of the N-linked glycosylation site are preferred sites for creating biologically active cysteine-added variants of these proteins. Amino acids immediately surrounding or adjacent to the O-linked and N-linked glycosylation sites (within about 10 residues on either side of the glycosylation site) are preferred sites for introducing cysteine-substitutions.
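The N-X-S/T rule described above lends itself to a simple sequence scan. The following Python sketch is illustrative only: the function names and the toy sequence are assumptions, and X is taken to be any amino acid, as stated above. The 10-residue flank follows the "within about 10 residues" guidance.

```python
import re

# N-X-S/T motif with X as any amino acid, per the text. The lookahead lets
# overlapping motifs (e.g. "NNST") be reported individually.
N_LINKED_MOTIF = re.compile(r"(?=(N[A-Z][ST]))")


def find_n_linked_sites(sequence):
    """Return 1-based positions of the asparagine in each N-X-S/T motif."""
    return [m.start() + 1 for m in N_LINKED_MOTIF.finditer(sequence.upper())]


def candidate_positions(sequence, flank=10):
    """Return 1-based positions within the motifs or within `flank` residues of them."""
    positions = set()
    for n_pos in find_n_linked_sites(sequence):
        start = max(1, n_pos - flank)
        end = min(len(sequence), n_pos + 2 + flank)  # motif spans n_pos..n_pos+2
        positions.update(range(start, end + 1))
    return sorted(positions)


# Toy sequence (hypothetical, not a real protein): motifs begin at 4 and 10.
toy = "AAANGTAAANNSA"
print(find_n_linked_sites(toy))
```

A scan of this kind only proposes candidate positions; whether a given site is actually glycosylated, surface-exposed or non-essential still has to be established experimentally.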
More generally, certain of the “rules” for identifying preferred sites for creating biologically active cysteine-added protein variants can be applied to any protein, not just proteins that are members of the GH supergene family. Specifically, preferred sites for creating biologically active cysteine variants of proteins (other than IL-2) are O-linked glycosylation sites. Amino acids immediately surrounding the O-linked glycosylation site (within about 10 residues on either side of the glycosylation site) also are preferred sites. N-linked glycosylation sites, and the amino acid residues immediately adjacent on either side of the glycosylation site (within about 10 residues of the N-X-S/T site) also are preferred sites for creating cysteine added protein variants. Amino acids that can be replaced with cysteine without significant loss of biological activity also are preferred sites for creating cysteine-added protein variants. Such non-essential amino acids can be identified by performing cysteine-scanning mutagenesis on the target protein and measuring effects on biological activity. Cysteine-scanning mutagenesis entails adding or substituting cysteine residues for individual amino acids in the polypeptide chain and determining the effect of the cysteine substitution on biological activity. Cysteine scanning mutagenesis is similar to alanine-scanning mutagenesis (Cunningham et al., 1992), except that target amino acids are individually replaced with cysteine rather than alanine residues.
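The design step of cysteine-scanning mutagenesis, enumerating the single-substitution variants to be made and tested, can be sketched in Python as follows. The function name cysteine_scan, the label format (e.g. "T3C") and the toy sequence are assumptions for illustration; the sketch covers substitutions only, and the biological activity of each variant would still have to be measured in an appropriate bioassay.

```python
def cysteine_scan(sequence, positions=None):
    """Yield (label, variant_sequence) for each single cysteine substitution.

    `positions` are 1-based; if omitted, every position is scanned.
    Positions that are already cysteine are skipped.
    """
    seq = sequence.upper()
    if positions is None:
        positions = range(1, len(seq) + 1)
    for pos in positions:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} is outside the sequence")
        original = seq[pos - 1]
        if original == "C":
            continue  # nothing to substitute
        label = f"{original}{pos}C"
        yield label, seq[:pos - 1] + "C" + seq[pos:]


# Toy sequence (hypothetical), scanned at every position.
for label, variant in cysteine_scan("ACDE"):
    print(label, variant)
```

Restricting `positions` to the loop regions or to residues near glycosylation sites would narrow the scan to the preferred sites described above.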
Application of the “rules” to create cysteine-added variants and conjugates of protein antagonists also is contemplated. Excess production of cytokines and growth factors has been implicated in the pathology of many inflammatory conditions such as rheumatoid arthritis, asthma, allergies and wound scarring. Excess production of GH has been implicated as a cause of acromegaly. Certain growth factors and cytokines, e.g., GH and IL-6, have been implicated in proliferation of particular cancers. Many of the growth factors and cytokines implicated in inflammation and cancer are members of the GH supergene family. There is considerable interest in developing protein antagonists of these molecules to treat these diseases. One strategy involves engineering the cytokines and growth factors so that they can bind to, but not oligomerize, receptors. This is accomplished by mutagenizing the second receptor binding site (site II) on the molecules. The resulting muteins are able to bind and occupy receptor sites but are incapable of activating intracellular signaling pathways. This strategy has been successfully applied to GH to make a GH antagonist (Cunningham et al., 1992). Similar strategies are being pursued to develop antagonists of other members of the GH supergene family such as IL-2 (Zurawski et al., 1990; Zurawski and Zurawski, 1992), IL-4 (Kruse et al., 1992), IL-5 (Tavernier et al., 1995), GM-CSF (Hercus et al., 1994) and EPO (Matthews et al., 1996). Since the preferred sites for adding cysteine residues to members of the GH supergene family described here lie outside of the receptor binding sites in these proteins, and thus are removed from any sites used to create protein antagonists, the cysteine-added variants described herein could be used to generate long-acting versions of protein antagonists. As an example, Cunningham et al. (1992) developed an in vitro GH antagonist by mutating a glycine residue (amino acid 120) to an arginine. This glycine residue is a critical component of the second receptor binding site in GH; when it is replaced with arginine, GH cannot dimerize receptors. The glycine to arginine mutation at position 120 can be introduced into DNA sequences encoding the cysteine-added variants of GH contemplated herein to create a cysteine-added GH antagonist that can be conjugated with cysteine-reactive PEGs or other types of cysteine-reactive moieties. Similarly, amino acid changes in other proteins that turn the proteins from agonists to antagonists could be incorporated into DNA sequences encoding cysteine-added protein variants described herein. Considerable effort is being spent to identify amino acid changes that convert protein agonists to antagonists. Hercus et al. (1994) reported that substituting arginine or lysine for glutamic acid at position 21 in the mature GM-CSF protein converts GM-CSF from an agonist to an antagonist. Tavernier et al. (1995) reported that substituting glutamine for glutamic acid at position 13 of mature IL-5 creates an IL-5 antagonist.
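Combining an antagonist-creating substitution with a cysteine substitution, as described above, amounts to applying two point changes to the same sequence. The following is a minimal sketch of that bookkeeping at the amino acid level, assuming substitutions are written in the common "original residue, position, new residue" form (for example, G120R). The function name and the toy sequence are hypothetical; the positions in the toy sequence do not correspond to positions in GH or in any other real protein.

```python
import re

# Matches labels such as "G120R": original residue, 1-based position, new residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(seq, substitutions):
    """Return `seq` with each point substitution applied.

    Each label is checked against the ORIGINAL sequence, so the order of the
    labels does not matter. A ValueError is raised if a label is malformed,
    out of range, or names a residue that is not found at that position.
    """
    residues = list(seq)
    for label in substitutions:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"malformed substitution: {label!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position out of range in {label!r}")
        if seq[pos - 1] != old:
            raise ValueError(
                f"{label!r} expects {old} at position {pos}, found {seq[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Hypothetical toy sequence for illustration only; not a real protein.
TOY_SEQ = "AGTSLGEK"

if __name__ == "__main__":
    # One antagonist-style change (G6R) plus one added cysteine (T3C).
    print(apply_substitutions(TOY_SEQ, ["G6R", "T3C"]))
```

Checking the stated original residue before substituting guards against numbering mismatches, for example between mature and precursor forms of a protein.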
Experimental strategies similar to those described above can be used to create cysteine-added variants (both agonists and antagonists) of members of the GH supergene family derived from various animals. This is possible because the primary amino acid sequences and structures of cytokines and growth factors are largely conserved between human and animal species. For this reason, the “rules” disclosed herein for creating biologically active cysteine-added variants of members of the GH supergene family will be useful for creating biologically active cysteine-added variants of members of the GH supergene family of companion animal (e.g., dog, cat, horse) and commercial animal (e.g., cow, sheep, pig) species. Conjugation of these cysteine-added variants with cysteine-reactive PEGs will create long-acting versions of these proteins that will benefit the companion animal and commercial farm animal markets.
Proteins that are members of the GH supergene family (hematopoietic cytokines) are provided in Silvennoinen and Ihle (1996). Silvennoinen and Ihle (1996) also provide information about the structure and expression of these proteins. DNA sequences, encoded amino acids and in vitro and in vivo bioassays for the proteins described herein are described in Aggarwal and Gutterman (1992; 1996), Aggarwal (1998), and Silvennoinen and Ihle (1996). Bioassays for the proteins also are provided in catalogues of various commercial suppliers of these proteins such as R&D Systems, Inc. and Endogen, Inc.