Field
This invention relates to newly identified polynucleotides, polypeptides encoded by such polynucleotides, the use of such polynucleotides and polypeptides, as well as the production of such polynucleotides and polypeptides. More particularly, the polypeptide of the present invention has been identified as a member of the Kunitz serine proteinase inhibitor family and is hereinafter referred to as BTL.010.
Inflammatory Diseases
The inflammatory response after surgeries, trauma and infection involves neutrophil activation and infiltration into the injured tissue. The activated neutrophils release the neutral serine proteinases leukocyte elastase, cathepsin G and proteinase 3, which, if not properly controlled, cause abnormal connective tissue turnover and result in severe damage to healthy tissue (1-3, 81). The uncontrolled proteolysis can lead to a myriad of diseases including emphysema, idiopathic pulmonary fibrosis, adult respiratory distress syndrome, cystic fibrosis, rheumatoid arthritis, organ failure, and glomerulonephritis.
Proteins capable of inhibiting the neutral serine proteinases released by neutrophils can have therapeutic efficacy in treating inflammatory diseases. In patients suffering from hyperdynamic septic shock, plasma levels of the serine proteinase inhibitors antithrombin III, alpha 2-macroglobulin and inter-alpha-trypsin inhibitor, as well as those of various clotting, complement and other plasma factors, are significantly decreased (5). In an experimental endotoxemia model, the reduction in the plasma levels of these factors was considerably diminished by the intravenous injection of a soybean-derived leukocyte elastase and cathepsin G inhibitor, indicating that these neutral proteinases are at least partially responsible for the proteolysis of the plasma factors. In addition, the survival rate in the rat lethal peritonitis model (cecal ligation and puncture-induced septic shock model) was improved by treatment with the second domain of human urinary trypsin inhibitor (2), which has been shown to inhibit leukocyte elastase and cathepsin G (6, 7).
Stimulated neutrophils generate active oxygen species which contribute to inflammatory diseases, necrosis of surrounding tissues, mutagenicity and carcinogenicity (8). The most effective serine protease inhibitors in decreasing H2O2 formation by TPA-activated neutrophils were chymotrypsin-specific inhibitors (e.g., potato inhibitor-1 and a chymotrypsin-inhibitory fragment of potato inhibitor-2), followed by bifunctional inhibitors recognizing both chymotrypsin and trypsin; the least active was soybean trypsin inhibitor, a predominantly trypsin-specific inhibitor. In addition, cytin, a chymotrypsin- but not trypsin-specific inhibitor, significantly diminished the level of human neutrophil and monocyte activation induced by lipopolysaccharide (9).
Neutrophil chemotaxis also plays an important role in the inflammatory response and, when excessive or persistent, may augment tissue damage (10). Inhibitors of cathepsin G and chymotrypsin suppressed neutrophil chemotaxis to the chemoattractants N-formyl-L-methionyl-L-leucyl-L-phenylalanine (fMLP) and zymosan-activated serum in multiple blind well assays and to fMLP in migration assays under agarose.
IL-1, a proinflammatory cytokine, is secreted from monocytes at inflammatory sites as an inactive precursor. Leukocyte elastase and cathepsin G cleave the IL-1 precursor to form fully active forms of IL-1 (11). Synovial fluid collected from patients with inflammatory polyarthritis and bronchoalveolar lavage fluid from patients with sarcoidosis process the IL-1 precursor into the same active forms as leukocyte elastase and cathepsin G. Control fluids from patients who had no symptoms of inflammatory disease did not exhibit the processing activity. Only lavage fluids that processed precursor IL-1 contain cathepsin G and/or elastase activity.
Synthetic tannin exhibits anti-inflammatory properties in skin diseases. Tannin specifically inhibits leukocyte elastase in an irreversible manner, and it is believed that the anti-inflammatory properties of synthetic tannin may at least in part be due to inactivation of elastase (12).
Lung Injury
Many syndromes of lung injury, including emphysema, adult respiratory distress syndrome, cystic fibrosis and idiopathic pulmonary fibrosis, are associated with accumulation of neutrophils within the pulmonary parenchyma. Activated neutrophils have the capacity to produce lung injury by secreting products including proteinases and reactive oxygen species (13). Neutral serine proteinases secreted from activated neutrophils are capable of inducing damage to lung alveolar extracellular matrix (ECM) by directly digesting the matrix and through the activation of latent metalloproteases resident in the matrix (14). Proteinase 3 and leukocyte elastase have been shown to cause significant lung damage and emphysema when administered by tracheal insufflation or injection to hamsters (16, 17).
Inhibitors of neutrophil neutral serine proteinases have been shown to exert potent therapeutic effects on pulmonary emphysema, adult respiratory distress syndrome and other diseases involving tissue degradation. Treatment of hamsters with Eglin C, a neutral serine proteinase inhibitor, completely protected the animals against leukocyte elastase-induced emphysema (18). Derivatives of 5-methyl-4H-3,1-benzoxazin-4-one, shown to be highly specific inhibitors of leukocyte elastase, efficiently prevented degradation of insoluble elastin by stimulated neutrophils (19). These small molecule inhibitors also significantly suppressed leukocyte-induced pulmonary hemorrhage and emphysema in hamsters (19). Alpha 1-proteinase inhibitor and soybean trypsin inhibitor, two leukocyte elastase and cathepsin G inhibitors, were also shown to completely or nearly completely inhibit neutrophil-induced ECM solubilization (13).
However, alpha 1-proteinase inhibitor, the major endogenous serine proteinase inhibitor for neutrophil elastase, is easily inactivated by oxidation and by proteolysis by metalloproteinases present in the injured lung (20, 21). Oxidative inactivation of alpha 1-proteinase inhibitor has been linked to the pathogenesis of pulmonary emphysema associated with cigarette smoking (22).
Vascular Effects
Injury to the vascular endothelium, such as that which occurs during angioplasty, can result in the accumulation of neutrophils and platelets and platelet activation at the site of injury. Platelet accumulation and activation at the injured site can result in abrupt artery closure. Cathepsin G potently induces platelet aggregation, secretion and calcium mobilization by binding to a specific receptor on platelets (23). Leukocyte elastase, though having no platelet agonist activity itself, increases the apparent affinity of cathepsin G binding to platelets and enhances cathepsin G-induced platelet activation. Thrombospondin 1, which inactivates cathepsin G by binding near the enzyme's active site, protected fibronectin from cleavage by cathepsin G and blocked cathepsin G-mediated platelet aggregation (24).
Endothelin-1 (ET-1) is a potent vasoconstrictor peptide secreted by endothelial cells. Marked ET-1 degradation is observed in the presence of activated neutrophils. ET-1 inactivation could play a role in acute inflammatory reactions where neutrophils adhere to the vascular endothelial cells. Soybean trypsin inhibitor abolishes ET-1 degradation almost completely, suggesting a role of cathepsin G in ET-1 hydrolysis (25). Among the purified leukocyte enzymes tested, cathepsin G hydrolyzed ET-1 at the highest rate.
Cathepsin G converts angiotensinogen and angiotensin I to angiotensin II (26, 27). The neutrophil-angiotensin system does not require renin or converting enzyme and may function as a mobile effector pathway which modulates tissue blood flow and/or vascular permeability.
Proteinase Inhibitor Structure and Specificity
Cathepsin G, leukocyte elastase and proteinase 3 are neutral serine proteinases that exist primarily in azurophilic granules of neutrophils. Elastase has a preference for hydrophobic (e.g., neutral) residues at the P1 site such as valine, alanine, isoleucine and leucine (28, 29). (The reactive-site sequences of proteinase inhibitors and substrates are written as . . . -P3-P2-P1-P′1-P′2-P′3- . . . , where -P1-P′1- denotes the reactive site.) Cathepsin G has a similar preference for large hydrophobic residues (i.e., phenylalanine, leucine) and basic residues (lysine, arginine) and exhibits dual and equal trypsin- and chymotrypsin-like specificities (30). Proteinase 3 prefers small aliphatic amino acids such as alanine, serine and valine at the P1 site (15, 31). The P3-S3 interaction during human leukocyte elastase hydrolysis of peptide substrates has also been determined to be important (32). (S3 refers to the residue on the inhibited proteinase that interacts with the P3 residue on the inhibitor.)
Kunitz Inhibitors
Protein inhibitors of serine proteinases can be grouped into several families, including the Kunitz, serpin, Kazal, and mucous protein inhibitor families, based on conserved structural features. Members of each family exhibit greatly varied binding specificities, and members of different families can have similar inhibitory profiles. The binding specificities of the proteinase inhibitors are determined by the residue at the P1 position as well as other residues that lie at the interface between the inhibitor and the bound target proteinase. The P1 residue in Kunitz domain proteins lies immediately C-terminal to the conserved second cysteine (position 15; aprotinin numbering).
All members of the Kunitz domain protein family have the same number (six) and spacing of cysteine residues. The precise bonding of cysteine residues to form the three intrachain disulfide bonds is known and invariant for all previously known Kunitz members (33).
Members of the Kunitz domain protein family function as inhibitors of serine proteases. Each inhibitor has a unique inhibition specificity profile towards the serine proteases. However, inhibitors with a basic residue (i.e., arginine or lysine) immediately following the second cysteine residue tend to have greater potencies towards proteases that cleave proteins at basic residues. In addition, mutation of the lysine residue at this position in aprotinin to a valine resulted in a dramatic increase in the protein's potency towards neutrophil elastase, a protease that typically cleaves proteins at residues with small neutral aliphatic side chains (34).
The serine protease inhibitory activities of the Kunitz domain proteins have led to their evaluation as potential therapeutics in a number of disease indications. For example, aprotinin is a potent inhibitor of proteases involved in the blood clotting cascade and is used clinically to reduce bleeding during open heart surgery (35). Human placental bikunin is a potent inhibitor of plasmin, which has been implicated in facilitating metastasis and tumor growth (36). Other disease indications in which serine proteases are believed to play a significant pathological role and in which the Kunitz domain proteins may therefore be effective therapeutics include traumatic brain injury and stroke (37, 38), cystic fibrosis (39, 40), emphysema (41), arthritis and anemia (42) and non-insulin dependent diabetes (43).
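As a purely illustrative sketch, the rule that the P1 residue immediately follows the second cysteine of a Kunitz domain can be expressed in a few lines of Python. The function names and the three coarse residue classes are assumptions introduced here for illustration only; the classes merely restate the P1 preferences cited in this background (arginine and lysine for trypsin; tyrosine, phenylalanine and tryptophan for chymotrypsin; valine, alanine, isoleucine and leucine for elastase). The example sequence is the Kunitz domain query quoted elsewhere in this document as SEQ ID NO:8.

```python
# Illustrative sketch only; names and residue classes are assumptions.
BASIC = set("KR")              # P1 preference cited for trypsin
AROMATIC = set("FYW")          # P1 preference cited for chymotrypsin
SMALL_ALIPHATIC = set("AVIL")  # P1 preference cited for leukocyte elastase

# Kunitz domain query quoted in this document as SEQ ID NO:8
# (whitespace in the printed sequence is assumed to be a layout artifact).
SEQ_ID_NO_8 = "KQDVCEMPKETGPCLAYFLHWWYDKKDNTCSMFVYGGCQGNNNNFQSKANCLNTCKNK"


def p1_residue(domain: str) -> str:
    """Return the residue immediately C-terminal to the second cysteine."""
    cys_positions = [i for i, aa in enumerate(domain.upper()) if aa == "C"]
    if len(cys_positions) < 2 or cys_positions[1] + 1 >= len(domain):
        raise ValueError("no residue follows a second cysteine")
    return domain.upper()[cys_positions[1] + 1]


def p1_class(residue: str) -> str:
    """Assign a P1 residue to one of the coarse classes defined above."""
    residue = residue.upper()
    if residue in BASIC:
        return "basic"
    if residue in AROMATIC:
        return "aromatic"
    if residue in SMALL_ALIPHATIC:
        return "small aliphatic"
    return "other"


if __name__ == "__main__":
    p1 = p1_residue(SEQ_ID_NO_8)
    print(p1, p1_class(p1))
```

The classification is deliberately coarse. As noted above, residues other than P1 also contribute to binding specificity, so a P1 class suggests a likely inhibitory profile and does not establish one.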
Kunitz domains that exist within larger proteins have been shown to retain their functional activities when produced as single domains (44). Kunitz-type inhibitors have been described in the patent literature (85).
Serine proteinase inhibitors of the Kunitz family typically exhibit significantly tighter binding to trypsin and chymotrypsin, two proteases with relatively strict P1 specificities (trypsin=arginine, lysine; chymotrypsin=tyrosine, phenylalanine, tryptophan) but with few restrictions at other P and P′ positions, than to the three neutral proteinases secreted by neutrophils. For example, aprotinin is a potent inhibitor of trypsin (Ki=0.02 nM) and chymotrypsin (Ki=1.3 nM) but does not inhibit leukocyte elastase (44). Similarly, placental bikunin inhibits trypsin (Ki=0.01 nM) and chymotrypsin (Ki=0.48 nM) but not leukocyte elastase (44). Tissue factor pathway inhibitor (TFPI), another member of the Kunitz family, inhibits trypsin (Ki=0.1 nM) and chymotrypsin (Ki=0.75 nM) but is a weak inhibitor of leukocyte elastase (Ki=400 nM) and cathepsin G (Ki=100-200 nM) (45, 46). In addition, these Kunitz family members exhibit potent inhibitory activity towards serine proteinases having trypsin-like substrate specificity involved in both coagulation and fibrinolysis (44-47). Elastase and cathepsin G have been reported to proteolytically cleave and inactivate TFPI (4, 46).
Human inter-alpha-trypsin inhibitor (I alpha I), a plasma Kunitz family proteinase inhibitor, is a potent inhibitor of trypsin (Ki=0.078 nM) and chymotrypsin (Ki=1.1 nM) but exhibits somewhat lesser activity against cathepsin G (Ki=18 nM) and leukocyte elastase (Ki=61 nM) (6). Similarly, a Kunitz-type inhibitor purified from Japanese horseshoe crab (Tachypleus tridentatus) hemocytes potently inhibited trypsin (Ki=0.46 nM) and chymotrypsin (Ki=5.5 nM), but was somewhat less active towards leukocyte elastase (Ki=72 nM) (48).
Soybean trypsin inhibitor (STI) is a potent Kunitz family inhibitor of trypsin but a significantly weaker inhibitor of chymotrypsin (Ki(1)=1000 nM; Ki(2)=300 nM) (49). STI has been reported to exhibit similar inhibitory activity towards chymotrypsin and leukocyte elastase (25). On the other hand, a serine protease inhibitor from larvae of the parasitic nematode Anisakis simplex that has 96% amino acid identity to soybean trypsin inhibitor was reported to inhibit trypsin and elastase but not chymotrypsin (50). In addition, a Kunitz-type inhibitor purified from potato tubers (Solanum tuberosum L) was reported to be an effective inhibitor of trypsin, leukocyte elastase, and chymotrypsin (51).
Non-Kunitz Proteinase Inhibitors
Numerous serine proteinase inhibitors from families other than the Kunitz family have been reported to inhibit neutral serine proteinases, including those secreted by activated neutrophils. Alpha-1-proteinase inhibitor and alpha-2-macroglobulin, members of the serpin proteinase inhibitor family, inhibit elastase, cathepsin G and proteinase 3 (15, 52-55). Alpha-1-proteinase inhibitor has been described as the major serum inhibitor of elastase and cathepsin G (54). Alpha-1-antichymotrypsin, another serpin family proteinase inhibitor, inhibits cathepsin G (53, 55, 56) but not proteinase 3 (15), and has been described as another physiological cathepsin G inhibitor (53). Monocyte/neutrophil elastase inhibitor, also a serpin family inhibitor, inhibits elastase and proteinase 3 (57). Antileukoproteinase (SLPI) and elafin, members of the mucous proteinase inhibitor family, inhibit elastase (Ki=0.6 nM) (58, 59) but not proteinase 3 (15) or cathepsin G (58). Eglin C, a member of the potato inhibitor 1 family from the leech Hirudo medicinalis, inhibits leukocyte elastase (Ki=0.37 nM) and cathepsin G (Ki~0.1 nM) (60-62) but only weakly inhibits proteinase 3 (15).
The present invention encompasses a newly identified human protein, herein called BTL.010, which has been identified as a member of the Kunitz family of proteinase inhibitors based on the presence of the conserved six cysteines observed in all members of this family. BTL.010 is thought to inhibit preferentially elastase-like proteases over trypsin- and chymotrypsin-like proteases.
The instant invention encompasses the use of BTL.010 for preventing neutrophil and monocyte activation and formation of active oxygen species during the oxidative burst of stimulated granulocytes. The current invention further encompasses the use of BTL.010 for reducing platelet activation and blood coagulation. The invention may also be useful in a method for the prophylactic or therapeutic treatment of patients undergoing angioplasty. The instant invention also encompasses pharmaceutical compositions containing BTL.010 that are useful for the treatment of inflammatory diseases and diseases involving lung and vascular injury.
In accordance with one aspect of the present invention, there are provided novel mature polypeptides comprising the amino acid sequence given in SEQ ID NO:1 as well as biologically active and diagnostically or therapeutically useful fragments, analogues and derivatives thereof.
In accordance with another aspect of the present invention, there are provided isolated nucleic acid molecules encoding the polypeptides of the present invention, including mRNAs, DNAs, cDNAs, genomic DNA, as well as antisense analogs thereof and biologically active and diagnostically or therapeutically useful fragments thereof.
In accordance with still another aspect of the present invention, there are provided processes for producing such polypeptides by recombinant techniques through the use of recombinant vectors. As a further aspect of the present invention, there are provided recombinant prokaryotic and/or eukaryotic host cells comprising a nucleic acid sequence encoding a polypeptide of the present invention.
In accordance with a further aspect of the present invention, there is provided a process for utilizing such polypeptides, or polynucleotides encoding such polypeptides, for therapeutic purposes, for example, preventing neutrophil and monocyte activation and formation of active oxygen species during the oxidative burst of stimulated granulocytes; or for reducing platelet activation and blood coagulation, or for the treatment of inflammatory diseases and diseases involving lung and vascular injury.
In accordance with another aspect of the present invention, there are provided nucleic acid probes comprising nucleic acid molecules of sufficient length to specifically hybridize to a polynucleotide encoding a polypeptide of the present invention.
In accordance with yet another aspect of the present invention, there are provided diagnostic assays for detecting diseases or susceptibility to diseases related to mutations in a nucleic acid sequence of the present invention and for detecting over-expression of the polypeptides encoded by such sequences.
In accordance with another aspect of the present invention, there is provided a process involving expression of such polypeptides, or polynucleotides encoding such polypeptides, for purposes of gene therapy. As used herein, gene therapy is defined as the process of providing for the expression of nucleic acid sequences of exogenous origin in an individual for the treatment of a disease condition within that individual.
A potential coding sequence containing a putative Kunitz domain sequence was identified by searching a computer database of human genomic sequence information. The protein was found as a result of searching in the NCBI (National Center for Biotechnology Information) Genbank HTG (High-Throughput Genomic) DNA sequence database. This HTG-derived Kunitz domain-containing protein was found by using a Kunitz domain sequence KQDVCEMPKETGPCLAYFLHWWYDKKDNTCSMFVYGG CQGNNNNFQSKANCLNTCKNK (SEQ ID NO:8) as a query and searching in the Genbank High-Throughput Genome Center (HTG) DNA sequence database. These lower-quality HTG sequences are maintained in a separate, small database that is practical to query using a frameshift-tolerant homology search algorithm such as “Framesearch” (available from the company Genetics Computer Group, Madison, Wis., a subsidiary of Oxford Molecular, Oxford, UK), or “Transition” (available from Thoughtware LLC, Dallas, Tex.).
The query identified the following significant alignment when searched using the “Transition” TBLASTN algorithm with frameshifting enabled:
(Query is a subsequence of SEQ ID NO:8; Sbjct is SEQ ID NO:9.)
It can be seen that the algorithm corrected a frameshift error at the number “2” to generate the alignment. The nucleotide sequence (accession number AC004846) is available from Genbank at NCBI.
When nucleotides 16318-16164 are translated (with frameshift correction) and searched against the Genpept peptide database, this HTG sequence has a highest match of 47% to “kunitz-type proteinase inhibitor” (as shown below), confirming its novelty among known protein sequences.
(Query is a subsequence of SEQ ID NO:9; Sbjct is SEQ ID NO:10.)
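By way of illustration only, translating a stretch of genomic sequence given by 1-based coordinates might be sketched as follows. The function names are hypothetical and the toy sequence is not taken from accession AC004846. The descending coordinates quoted above (16318-16164) are read here as indicating the reverse strand, which is an assumption, and the sketch does not attempt the frameshift correction performed by the search software.

```python
# Illustrative sketch only; function names and the toy sequence are hypothetical.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def translate_region(genome: str, start: int, end: int) -> str:
    """Translate 1-based inclusive coordinates; start > end is taken
    to mean the reverse strand (an assumption of this sketch)."""
    if start <= end:
        return translate(genome[start - 1:end])
    return translate(reverse_complement(genome[end - 1:start]))


if __name__ == "__main__":
    toy_genome = reverse_complement("ATGAAATGA")  # gene lies on the reverse strand
    print(translate_region(toy_genome, 9, 1))
```

A region whose length is not a multiple of three leaves trailing bases untranslated in this sketch, which is one reason a frameshift-tolerant search tool is needed for lower-quality sequence.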
When searched against the assembled Unigene database, this translated HTG sequence has a highest match of 53% to “Amyloid beta (A4) precursor-like protein 2” (a kunitz-type proteinase inhibitor, as shown below), confirming its novelty among expressed sequences derived from the EST database.
(Query is SEQ ID NO:9; Sbjct is SEQ ID NO:11.)
Nucleotides 16016 to 16414 from the genomic nucleotide sequence (Genbank accession number AC004846), below, were used to design PCR primers to clone the kunitz-homologous region from human genomic DNA and resequence it.
The primers that were designed are shown underlined in the above sequence. PCR amplification of human genomic DNA (Clontech, Palo Alto, Calif.) was conducted using these primers. The position of the frameshift error is shown as a “*”. The high fidelity enzyme ExpandLong Taq polymerase (Boehringer Mannheim, Mannheim, Germany) was used to avoid possible errors. One band of approximately 0.4 kb was amplified, purified with a gel extraction kit (Qiagen, Valencia, Calif.), and cloned into a TA cloning vector (Invitrogen, Carlsbad, Calif.). Three independent clones were sequenced, and the correct nucleotide sequence across the region in question was obtained. This analysis resulted in a correction of the genomic sequence
i.e., a “C” (cytosine) was inserted at position 11 in this alignment. A genomic clone later submitted to Genbank (accession number AC006342) confirmed this correction.
This correction resulted in the identification of the missing amino acid in the frameshifted protein alignment as a histidine (“H”). The full protein sequence of the Kunitz-containing open reading frame (ORF) (between the closest upstream and downstream stop codons in the same frame) is thus:
The Kunitz domain is underlined. For in vivo protein expression, a start codon would have to be supplied to initiate translation in this reading frame, possibly by a splicing event in between the kunitz domain and the stop codon 5′ of this domain in the genomic sequence, or by further corrections to the sequence.
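The ORF definition used above (the stretch between the closest upstream and downstream stop codons in the same frame as the domain) can be illustrated with a minimal Python sketch. The function name and the toy strings are hypothetical; the input is assumed to be a single-frame translation in which stop codons are written as "*".

```python
# Illustrative sketch only; the function name and toy data are hypothetical.
def orf_around(frame_translation: str, domain: str) -> str:
    """Return the segment of a single-frame translation that contains
    `domain` and is bounded by the nearest stop codons ('*') on each side."""
    start = frame_translation.find(domain)
    if start < 0:
        raise ValueError("domain not found in this frame")
    end = start + len(domain)
    upstream_stop = frame_translation.rfind("*", 0, start)  # -1 if none
    downstream_stop = frame_translation.find("*", end)
    if downstream_stop == -1:  # no stop before the end of the sequence
        downstream_stop = len(frame_translation)
    return frame_translation[upstream_stop + 1:downstream_stop]


if __name__ == "__main__":
    print(orf_around("AB*CDEFG*HI", "EF"))
```

An ORF delimited this way need not begin with a methionine, which is consistent with the remark above that a start codon would have to be supplied for in vivo expression.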
Kunitz domains that exist within larger proteins have been shown to retain their functional activities when produced as single domains (Delaria et al., 1997, J. Biol. Chem. 272:12209-12). Therefore, the Kunitz-HTG domain by itself, as shown below, is predicted to have activity as a protease inhibitor.
The polypeptides of the present invention comprise polypeptides having the deduced amino acid sequence given by SEQ ID NO:1. The polypeptides of the present invention may include additional amino acid sequences appended to the N- or C-terminus of the peptides having the deduced amino acid sequence given by SEQ ID NO:1. The polypeptides of the present invention may be recombinant polypeptides, natural polypeptides, or synthetic polypeptides, preferably recombinant polypeptides. As used herein, “protein” is synonymous with “polypeptide.”
The present invention further includes a polypeptide which shares at least a 60%, more preferably at least an 80%, still more preferably at least a 90%, or most preferably at least a 95% sequence identity over at least 20, more preferably at least 30, still more preferably at least 40, or most preferably at least 50 residues with SEQ ID NO:1. (Such polypeptides may be herein referred to as “polypeptides of the present invention”.)
Such a polypeptide as described above may be (i) one in which one or more of the amino acid residues are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and such substituted amino acid residue may or may not be one encoded by the genetic code, or (ii) one in which one or more of the amino acid residues includes a substituent group, or (iii) one in which the mature polypeptide is fused with another compound, such as a compound to increase the half-life of the polypeptide (for example, polyethyleneglycol), or (iv) one in which additional amino acids are fused to the mature polypeptide, such as a leader or secretory sequence or a sequence which is employed for purification of the mature polypeptide or a proprotein sequence or mature protein sequence beyond the Kunitz domain, or (v) one in which one or more amino acids are deleted from or inserted into the sequence of the polypeptide. Combinations of the above-described types of variations in the peptide sequence are within the scope of the invention. Such polypeptides are deemed to be within the scope of those skilled in the art from the teachings herein.
The polypeptide of the present invention may contain amino acids other than the 20 gene-encoded amino acids. The polypeptides may be modified by either natural processes, such as posttranslational processing, or by chemical modification techniques which are well known in the art. Such modifications are well described in basic texts and in more detailed monographs, as well as in a voluminous research literature. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. It will be appreciated that the same type of modification may be present in the same or varying degrees at several sites in a given polypeptide. Also, a given polypeptide may contain many types of modifications. Polypeptides may be branched, for example, as a result of ubiquitination, and they may be cyclic, with or without branching. Cyclic, branched, and branched cyclic polypeptides may result from posttranslation natural processes or may be made by synthetic methods. Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent cross-links, formation of cysteine, formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination. (63-66)
The polypeptides and polynucleotides of the present invention are preferably provided in an isolated form, and preferably are purified to homogeneity. The term “isolated” means that the material is removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or DNA or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotide could be part of a vector and/or such polynucleotide or polypeptide could be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment.
As known in the art, “similarity” between two polypeptides is determined by comparing the amino acid sequence and its conserved amino acid substitutes of one polypeptide to the sequence of a second polypeptide. Such conservative substitutions include those described by Dayhoff (67) and by Argos (68). For example, amino acids belonging to one of the following groups represent conservative changes:
ala, pro, gly, gln, asn, ser, thr;
cys, ser, tyr, thr;
val, ile, leu, met, ala, phe;
lys, arg, his;
phe, tyr, trp, his; and
asp, glu.
(Note that these groupings are examples; other groupings may represent more relevant choices.)
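For illustration, the example groupings listed above can be written out in Python with one-letter amino acid codes, together with a helper that reports whether two different residues share a group. The helper name is an assumption introduced here. Because some residues appear in more than one group, membership is tested group by group.

```python
# Illustrative sketch only; the helper name is an assumption.
# The example groupings listed above, in one-letter codes.
CONSERVATIVE_GROUPS = [
    set("APGQNST"),  # ala, pro, gly, gln, asn, ser, thr
    set("CSYT"),     # cys, ser, tyr, thr
    set("VILMAF"),   # val, ile, leu, met, ala, phe
    set("KRH"),      # lys, arg, his
    set("FYWH"),     # phe, tyr, trp, his
    set("DE"),       # asp, glu
]


def is_conservative(a: str, b: str) -> bool:
    """True if two different residues fall within at least one common group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("K", "R"), is_conservative("D", "K"))
```

A scoring matrix would typically replace such fixed groups in practice. The groups are used here only because they are the ones given as examples in the text.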
“Similarity” or “identity” refers to sequence conservation, or “homology”, between two or more peptides or two or more nucleic acid molecules, normally expressed in terms of percentages. When a position in the compared sequences is occupied by the same base or amino acid (“residue”), then the molecules are identical at that position. When a position in two compared peptide sequences is occupied by an amino acid with similar physical properties (a conservative substitution as determined by a given scoring matrix; similarity is thus dependent on the scoring matrix chosen), then the molecules are similar at that position. The percent identity or similarity can be maximized by aligning the compared sequences alongside each other, sliding them back and forth, and conservatively introducing gaps in the sequences where necessary. The percent identity is calculated by counting the number of identical aligning residues, dividing by the total length of the aligned region, including gaps in both sequences, and multiplying by 100. Identity would thus be expressed as, e.g., “60% identity over 200 amino acids,” or “57% identity over 250 amino acids.” Similarity is calculated by counting both identities and similarities in the above calculation. For example, the alignment below has 37.5% sequence identity over 56 amino acids ((21 identities/56 residues)×100%), where 56 is the total length of the aligned region.
As a further example, the same alignment below has 55.4% sequence similarity over 56 amino acids ((31 similarities/56 residues)×100%), where 56 is the total length of the aligned region. In this example, conservative substitutions are indicated by a plus sign and the total similarities is given by the sum of the identities and the conservative substitutions. (As noted above, determination of conservative substitutions is dependent on the scoring matrix chosen. The same alignment below may yield a different value for percent similarity using a different scoring matrix.)
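The calculation described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names, the toy alignment and the toy similarity rule are hypothetical, and gaps are assumed to be written as "-". The first two printed values reproduce the worked figures given above (21 identities and 31 similarities over an aligned length of 56).

```python
# Illustrative sketch only; names, toy alignment and similarity rule are hypothetical.
from typing import Callable, Tuple


def percent(count: int, aligned_length: int) -> float:
    """Count divided by total aligned length (gaps included), times 100."""
    return 100.0 * count / aligned_length


def alignment_stats(
    top: str, bottom: str, is_similar: Callable[[str, str], bool]
) -> Tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned strings
    of equal length in which '-' marks a gap."""
    if len(top) != len(bottom) or not top:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identities = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    conservative = sum(
        1
        for a, b in zip(top, bottom)
        if a != b and "-" not in (a, b) and is_similar(a, b)
    )
    length = len(top)  # total aligned length, gaps in both sequences included
    return percent(identities, length), percent(identities + conservative, length)


if __name__ == "__main__":
    print(round(percent(21, 56), 1))  # worked identity figure from the text
    print(round(percent(31, 56), 1))  # worked similarity figure from the text
    toy_pairs = {frozenset("KR"), frozenset("DE")}  # toy similarity rule
    print(alignment_stats("ACDK-F", "ACEKGW",
                          lambda a, b: frozenset((a, b)) in toy_pairs))
```

The similarity rule is passed in as a function so that a different scoring scheme can be substituted, reflecting the point above that percent similarity depends on the scoring matrix chosen.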
Both of the sequences in the aligned region may be contained within longer, less homologous sequences. “Unrelated” or “non-homologous” sequences typically share less than 40% identity at the peptide level, preferably less than 25% identity.
The invention further encompasses polynucleotides which code for the above-described polypeptides of the present invention. These polynucleotides may be in the form of RNA or in the form of DNA, which DNA includes cDNA, genomic DNA, and synthetic DNA. The DNA may be double-stranded or single-stranded. The polynucleotides may include: only the coding sequence for the mature polypeptide; the coding sequence for the mature polypeptide and additional coding sequence such as a leader or secretory sequence or a proprotein sequence; the coding sequence for the mature polypeptide (and, optionally, additional coding sequence) and non-coding sequence, such as introns or non-coding sequence 5′ and/or 3′ of the coding sequence for the mature polypeptide. Thus, the term “polynucleotide encoding a polypeptide” encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.
The present invention further relates to variants of the herein above-described polynucleotides. The variants of the polynucleotides may be naturally occurring allelic variants of the polynucleotides or non-naturally occurring variants of the polynucleotides. As known in the art, an allelic variant is an alternate form of a polynucleotide sequence which may have a substitution, deletion, or addition of one or more nucleotides which does not substantially alter the function of the encoded polypeptides. Thus, the present invention includes polynucleotides encoding the same mature polypeptide as described in Example 1, below, as well as variants of such polynucleotides which variants include deletion variants, substitution variants and addition or insertion variants.
The present invention also includes polynucleotides wherein the coding sequence for the mature polypeptides may be fused in the same reading frame to a polynucleotide sequence which aids in expression and secretion of a polypeptide from a host cell, for example, a leader sequence which functions as a secretory sequence for controlling transport of a polypeptide from the cell. The polypeptide having a leader sequence is a preprotein and may have the leader sequence cleaved by the host cell to form the mature form of the polypeptide. The polynucleotides may also encode a proprotein, which is the mature protein plus additional amino acid residues. A mature protein having a prosequence is a proprotein and is an inactive form of the protein. Once the prosequence is cleaved, an active mature protein remains. For example, the polynucleotides of the present invention may code for a mature protein, for a protein having a prosequence, or for a protein having both a prosequence and a presequence (leader sequence).
The polynucleotides of the present invention may also have the coding sequence fused in frame to a marker sequence which allows for purification of the polypeptide of the present invention. The marker sequence may be a hexa-histidine tag supplied by a pQE-9 vector to provide for purification of the mature polypeptide fused to the marker in the case of a bacterial host, or, for example, the marker sequence may be a hemagglutinin (HA) tag when a mammalian host, e.g. COS-7 cells, is used. The HA tag corresponds to an epitope derived from the influenza hemagglutinin protein (69).
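The in-frame requirement for such a marker fusion can be illustrated with a minimal Python sketch. It assumes an N-terminal marker placed directly upstream of the coding sequence; the function names, the six-codon histidine marker and the short coding sequence are hypothetical illustrations and are not the pQE-9 tag sequence or any sequence of the invention.

```python
# Hypothetical sketch: check that a marker and a coding sequence are joined in frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a DNA sequence into codons; reject lengths that would shift the frame."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(marker: str, coding: str) -> str:
    """Join marker and coding sequence so the coding codons keep their reading frame."""
    marker_codons = codons(marker)
    # A stop codon in the upstream marker would end translation before the polypeptide.
    if any(c in STOP_CODONS for c in marker_codons):
        raise ValueError("marker contains a stop codon")
    return "".join(marker_codons + codons(coding))


if __name__ == "__main__":
    his6 = "CAC" * 6                 # six histidine codons (assumed example marker)
    coding = "GCTAAAGGTTAA"          # made-up coding sequence ending in a stop codon
    print(codons(fuse_in_frame(his6, coding)))
```

Because both parts are whole numbers of codons, every codon of the coding sequence is read unchanged downstream of the marker; a marker whose length is not a multiple of three would be rejected rather than silently shifting the frame.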
The term “gene” means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as intervening sequences (introns) between individual coding segments (exons). Fragments of the full length BTL.010 gene may be used as a hybridization probe for a cDNA library to isolate the full length gene and to isolate other genes which have a high sequence similarity to the gene or similar biological activity. Probes of this type preferably have at least 30 bases and may contain, for example, 50 or more bases. The probe may also be used to identify a cDNA clone corresponding to a full length transcript and a genomic clone or clones that contain the complete BTL.010 gene including regulatory and promoter regions, exons, and introns. An example of a screen comprises isolating the coding region of the BTL.010 gene by using the known DNA sequence to synthesize an oligonucleotide probe. Labeled oligonucleotides having a sequence complementary to that of the gene of the present invention are used to screen a library of human cDNA, genomic DNA or mRNA to determine which members of the library the probe hybridizes to.
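The probe-based screen described above can be sketched in Python as follows. This is a simplified illustration only: the function names and the sequences are hypothetical, and exact complementarity is used as a stand-in for hybridization, whereas real hybridization tolerates mismatches to a degree that depends on stringency.

```python
# Hypothetical sketch: design a complementary probe and find library members it matches.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_probe(known_sequence: str, start: int = 0, length: int = 30) -> str:
    """Return a probe complementary to a segment of the known sequence.

    The 30-base minimum follows the preferred probe length stated in the text.
    """
    if length < 30:
        raise ValueError("probe should have at least 30 bases")
    segment = known_sequence[start:start + length]
    if len(segment) < length:
        raise ValueError("known sequence is too short for the requested probe")
    return reverse_complement(segment)


def hybridizing_members(probe: str, library: dict[str, str]) -> list[str]:
    """Names of library members containing a site exactly complementary to the probe."""
    target = reverse_complement(probe).upper()
    return [name for name, seq in library.items() if target in seq.upper()]


if __name__ == "__main__":
    known = "ATGGCCAAGCTTGACCGTTCAGGATCCAGTACGTTAGCAAT"  # made-up sequence
    probe = design_probe(known)
    library = {"clone_a": "GGG" + known + "CCC", "clone_b": "T" * 40}
    print(hybridizing_members(probe, library))
```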
The present invention is directed to polynucleotides having at least a 70% identity, preferably at least 90% identity, and more preferably at least a 95% identity over at least 90 bases, preferably over at least 120 bases, more preferably over at least 160 bases to a polynucleotide which encodes a polypeptide of the present invention, as well as fragments thereof, which fragments have at least 30 bases and preferably at least 50 bases and to polypeptides encoded by such polynucleotides.
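As an illustration of how such identity thresholds might be checked, the following Python sketch computes the best percent identity over a fixed-length window of two pre-aligned, equal-length sequences. The function names are hypothetical, the alignment step itself is assumed to have been done elsewhere, and checking windows of exactly the minimum length is a simplification of "over at least" that many bases.

```python
# Hypothetical sketch: percent identity over a sliding window of pre-aligned sequences.


def best_window_identity(a: str, b: str, window: int) -> float:
    """Highest percent identity found in any window of the given length.

    Returns 0.0 when the aligned region is shorter than the window.
    Gap characters ('-') never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    if window <= 0 or len(a) < window:
        return 0.0
    flags = [1 if x == y and x != "-" else 0 for x, y in zip(a.upper(), b.upper())]
    current = sum(flags[:window])
    best = current
    # Slide the window one position at a time, updating the match count incrementally.
    for i in range(window, len(flags)):
        current += flags[i] - flags[i - window]
        best = max(best, current)
    return 100.0 * best / window


def meets_identity(a: str, b: str, min_identity: float = 70.0, min_bases: int = 90) -> bool:
    """True if some window of min_bases reaches min_identity percent."""
    return best_window_identity(a, b, min_bases) >= min_identity


if __name__ == "__main__":
    reference = "ACGT" * 30                       # made-up 120-base reference
    query = list(reference)
    for i in range(0, len(query), 10):            # introduce a mismatch every 10 bases
        query[i] = "T"
    query = "".join(query)
    for threshold in (70.0, 90.0, 95.0):
        print(threshold, meets_identity(query, reference, threshold))
```

The default arguments correspond to the broadest range recited above (70% identity over 90 bases); the stricter preferred ranges can be tested by passing, for example, `min_identity=95.0` and `min_bases=160`.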
The present invention also relates to vectors that include polynucleotides of the present invention as above described, host cells that are genetically engineered with vectors of the invention, and the production of polypeptides of the invention by recombinant techniques. Host cells may be genetically engineered (transduced or transformed or transfected) with the vectors of this invention which may be, for example, a cloning vector or an expression vector. The vector may be, for example, in the form of a plasmid, a viral particle, a phage, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants or amplifying the BTL.010 genes. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to the ordinarily skilled artisan. The polynucleotide of the present invention may be employed for producing a polypeptide by recombinant techniques.
Thus, for example, the polynucleotide sequence may be included in any one of a variety of expression vehicles, in particular vectors or plasmids for expressing a polypeptide. Such vectors include chromosomal, non-chromosomal and synthetic DNA sequences, e.g., derivatives of SV40; bacterial plasmids; phage DNA; yeast plasmids; vectors derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, adenovirus, fowl pox virus, and pseudorabies. However, any other vector or plasmid may be used as long as they are replicable and viable in the host.
The appropriate DNA sequence may be inserted into the vector by a variety of procedures. Such procedures and others are deemed to be within the scope of those skilled in the art. The DNA sequence in the expression vector is operatively linked to an appropriate expression control sequence(s) (promoter) to direct mRNA synthesis. As representative examples of such promoters, there may be mentioned: LTR or SV40 promoter, the E. coli lac or trp, the phage lambda PL promoter and other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The vector may also include appropriate sequences for amplifying expression. In addition, the expression vectors preferably contain a gene to provide a phenotypic trait for selection of transformed host cells such as dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or ampicillin resistance in E. coli. The vector containing the appropriate DNA sequence as herein above described, as well as an appropriate promoter or control sequence, may be employed to transform an appropriate host to permit the host to express the protein. As representative examples of appropriate hosts, there may be mentioned: bacterial cells, such as E. coli, Salmonella typhimurium, Streptomyces; fungal cells, such as yeast; insect cells, such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; plant cells, etc. The selection of an appropriate host is deemed to be within the scope of those skilled in the art from the teachings herein.
The present invention also includes recombinant constructs comprising one or more of the sequences as broadly described above. The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available. The following vectors are provided by way of example. Bacterial: pQE70, pQE60, pQE-9 (Qiagen); pBS, phagescript, psiX174, pBluescript SK, pBsKS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTRC99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia). However, any other plasmid or vector may be used as long as it is viable in the host. Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyl transferase) vectors or other vectors with selectable markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
The present invention also relates to host cells containing the above-described construct. The host cell can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the construct into the host cell can be effected by calcium phosphate transfection, DEAE-Dextran mediated transfection, or electroporation (70). The constructs in host cells can be used in a conventional manner to produce the gene product encoded by the recombinant sequence. Alternatively, the polypeptides of the invention can be synthetically produced by conventional peptide synthesizers.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook (71), the disclosure of which is hereby incorporated by reference.
Transcription of a DNA encoding the polypeptides of the present invention by higher eukaryotes is increased by inserting an enhancer sequence into the vector. Enhancers are cis-acting elements of DNA, usually from about 10 to 300 bp, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the replication origin (bp 100 to 270), a cytomegalovirus early promoter enhancer, a polyoma enhancer on the late side of the replication origin, and adenovirus enhancers. Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice. Useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1 (Promega Biotec, Madison, Wis.). These pBR322 “backbone” sections are combined with an appropriate promoter and the structural sequence to be expressed.
After transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter may be de-repressed, if necessary, by appropriate means (e.g., temperature shift or chemical induction) and the cells may be cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification. Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts (82) and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will generally comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcription termination sequences, and 5′ flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required non-transcribed genetic elements.
The polypeptide of the present invention may be recovered and purified from recombinant cell cultures by methods used heretofore, including ammonium sulfate or ethanol precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, affinity chromatography, hydroxyapatite chromatography, and lectin chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.
The polypeptide of the present invention may be a naturally purified product, or a product of chemical synthetic procedures, or produced by recombinant techniques from a prokaryotic or eukaryotic host (for example, by bacterial, yeast, higher plant, insect and mammalian cells in culture). Depending upon the host employed in a recombinant production procedure, the polypeptides of the present invention may be glycosylated with mammalian or other eukaryotic carbohydrates or may be non-glycosylated. Polypeptides of the invention may also include an initial methionine amino acid residue.