Protein transport is an essential process for both prokaryotic and eukaryotic cells. Transport of an individual protein usually occurs via an amino-terminal signal sequence which directs, or targets, the protein from its ribosomal assembly site to a particular cellular or extracellular location. Transport may involve any combination of the following steps: contact with a chaperone, unfolding, interaction with a receptor and/or a pore complex, addition of energy, and refolding. Moreover, an extracellular protein may be produced as an inactive precursor. Once the precursor has been exported, removal of the signal sequence by a signal peptidase activates the protein.
Although amino-terminal signal sequences vary substantially, many patterns and overall properties are shared. Recently, hidden Markov models (HMMs), statistical alternatives to the FASTA and Smith-Waterman algorithms, have been used to find shared patterns, specifically consensus sequences (Pearson, W. R. and D. J. Lipman (1988) Proc. Natl. Acad. Sci. 85:2444-2448; Smith, T. F. and M. S. Waterman (1981) J. Mol. Biol. 147:195-197). Although they were initially developed for speech recognition, HMMs have been used in biology to analyze protein and DNA sequences and to model protein structure (Krogh, A. et al. (1994) J. Mol. Biol. 235:1501-1531; Collin, M. et al. (1993) Protein Sci. 2:305-314). HMMs have a formal probabilistic basis and use position-specific scores for amino acids or nucleotides and for opening and extending an insertion or deletion. The algorithms are quite flexible in that they incorporate information from newly identified sequences to build even more successful patterns. To find signal sequences, multiple unaligned sequences are compared to identify those which encode a peptide of 20 to 50 amino acids with an N-terminal methionine.
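The idea of position-specific scoring mentioned above may be illustrated with a deliberately simplified sketch. It is not a full HMM: it assumes the training peptides are already aligned and of equal length, uses a uniform background frequency, and omits the insertion and deletion states entirely. The function names, the pseudocount value, and the toy peptides are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_position_scores(aligned, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column.

    Assumption: `aligned` holds equal-length, pre-aligned peptides and the
    background frequency of every residue is uniform (1/20).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all training peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    scores = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        scores.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return scores


def score_window(scores, window):
    """Sum the column scores for a window as wide as the model."""
    return sum(column[aa] for column, aa in zip(scores, window))


def best_window(scores, sequence):
    """Return (score, offset) of the highest-scoring window in `sequence`."""
    width = len(scores)
    return max(
        (score_window(scores, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

A window that resembles the training peptides scores higher than an unrelated one, so `best_window` can be used to locate the most model-like stretch of a longer sequence. Residues outside the twenty standard letters are not handled in this sketch.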
Some examples of the protein families which are known to have signal sequences are receptors (nuclear, 4 transmembrane, G protein coupled, and tyrosine kinase), cytokines (chemokines), hormones (growth and differentiation factors), neuropeptides and vasomediators, protein kinases, phosphatases, phospholipases, phosphodiesterases, nucleotide cyclases, matrix molecules (adhesion, cadherin, extracellular matrix molecules, integrin, and selectin), G proteins, ion channels (calcium, chloride, potassium, and sodium), proteases, transporter/pumps (amino acid, protein, sugar, metal and vitamin; calcium, phosphate, potassium, and sodium) and regulatory proteins. Descriptions of some of these proteins (receptors, kinases, and matrix proteins) and diseases associated with their dysfunction follow.
G-protein coupled receptors (GPCRs) are a large group of receptors which transduce extracellular signals. GPCRs include receptors for biogenic amines such as dopamine, epinephrine, histamine, glutamate (metabotropic effect), acetylcholine (muscarinic effect), and serotonin; for lipid mediators of inflammation such as prostaglandins, platelet activating factor, and leukotrienes; for peptide hormones such as calcitonin, C5a anaphylatoxin, follicle stimulating hormone, gonadotropin releasing hormone, neurokinin, oxytocin, and thrombin; and for sensory signal mediators such as retinal photopigments and olfactory stimulatory molecules. The structure of these highly conserved receptors consists of seven hydrophobic transmembrane regions, an extracellular N-terminus, and a cytoplasmic C-terminus. The N-terminus interacts with ligands, and the C-terminus interacts with intracellular G proteins to activate second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate, or ion channel proteins. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. The most conserved parts of these proteins are the transmembrane regions and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic triplet present in the second cytoplasmic loop may interact with the G proteins. The consensus pattern, [Gly Ser Thr Ala Leu Ile Val Met Tyr Trp Cys]-[Gly Ser Thr Ala Asn Cys Pro Asp Glu]-{Glu Asp Pro Lys Arg His}-Xaa(2)-[Leu Ile Val Met Asn Gln Gly Ala]-Xaa(2)-[Leu Ile Val Met Phe Thr]-[Gly Ser Thr Ala Asn Cys]-[Leu Ile Val Met Phe Tyr Trp Ser Thr Ala Cys]-[Asp Glu Asn His]-Arg-[Phe Tyr Trp Cys Ser His]-Xaa(2)-[Leu Ile Val Met], is characteristic of most proteins belonging to this group (Bolander, F. F. (1994) Molecular Endocrinology, Academic Press, San Diego, Calif.; Strosberg, A. D. (1991) Eur. J. Biochem. 196:1-10).
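A consensus pattern of this kind can be applied to a sequence by translating it into a regular expression. The following is a minimal sketch, assuming the pattern is rewritten in one-letter codes using the common convention that square brackets list allowed residues, braces list excluded residues, and x(n) denotes n arbitrary residues. The function name, the one-letter transcription of the pattern, and the test peptide are illustrative assumptions; the peptide is synthetic and is not a real receptor sequence.

```python
import re

# One-letter transcription (an assumption) of the consensus pattern as it
# appears in the text above.
GPCR_PATTERN = (
    "[GSTALIVMYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-"
    "[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]"
)

_ELEMENT = re.compile(
    r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?"
)


def pattern_to_regex(pattern):
    """Translate a hyphen-separated consensus pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            token = core                      # allowed set or single residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motif(sequence, pattern=GPCR_PATTERN):
    """Return the start offset of the first match, or None if absent."""
    hit = re.search(pattern_to_regex(pattern), sequence)
    return hit.start() if hit else None
```

For the synthetic peptide `MKASLAALAALSIDRYAAVGG`, `find_motif` reports a match beginning at offset 2; replacing the invariant Arg with Lys removes the match.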
The kinases comprise the largest known group of proteins, a superfamily of enzymes with widely varied functions and specificities. Kinases regulate many different cell proliferation, differentiation, and signaling processes by adding phosphate groups to proteins. Receptor-mediated extracellular events trigger the transfer of these high-energy phosphate groups and activate intracellular signaling cascades. Activation is roughly analogous to turning on a molecular switch and, in cases where signaling is uncontrolled, may be associated with or produce inflammation and cancer.
Kinases are usually named after their substrate, their regulatory molecule, or some aspect of a mutant phenotype. Almost all kinases contain a similar 250-300 amino acid catalytic domain. The N-terminal domain, which contains subdomains I-IV, generally folds into a two-lobed structure which binds and orients the ATP (or GTP) donor molecule. The larger C-terminal lobe, which contains subdomains VIA-XI, binds the protein substrate and carries out the transfer of the gamma phosphate from ATP to the hydroxyl group of a serine, threonine, or tyrosine residue. Subdomain V spans the two lobes.
The kinases may be categorized into families by the different amino acid sequences (between 5 and 100 residues) located on either side of, or inserted into loops of, the kinase domain. These amino acid sequences allow the regulation of each kinase as it recognizes and interacts with its target protein. The primary structure of the kinase domain is conserved and contains specific residues and identifiable motifs or patterns of amino acids. The serine/threonine kinases represent one family which preferentially phosphorylates serine or threonine residues. Many serine/threonine kinases, including those from human, rabbit, rat, mouse, and chicken cells and tissues, have been described (Hardie, G. and Hanks, S. (1995) The Protein Kinase Facts Books, Vol I:7-20 Academic Press, San Diego, Calif.).
The matrix proteins (MPs) provide structural support, cell and tissue identity, and autocrine, paracrine and juxtacrine properties for most eukaryotic cells (McGowan, S. E. (1992) FASEB J. 6:2895-2904). MPs include adhesion molecules, integrins and selectins, cadherins, lectins, lipocalins, and extracellular matrix proteins (ECMs). MPs possess many different domains which interact with soluble, extracellular molecules. These domains include collagen-like domains, EGF-like domains, immunoglobulin-like domains, fibronectin-like domains, type A domain of von Willebrand factor (vWFA)-like modules, ankyrin repeat modules, RGD or RGD-like sequences, carbohydrate-binding domains, and calcium ion-binding domains.
For example, multidomain or mosaic proteins play an important role in the diverse functions of the ECMs (Engel, J. et al. (1994) Development S35-42). ECM proteins (ECMPs) are frequently characterized by the presence of one or more domains which may contain a number of potential intracellular disulphide bridge motifs. For example, domains which match the epidermal growth factor tandem repeat consensus are present within several known extracellular proteins that promote cell growth, development, and cell signaling. Other domains share internal homology and a regular distribution of single cysteines and cysteine doublets. In the serum albumin family, cysteine arrangement generates the characteristic "double-loop" structure (Soltysik-Espanola, M. et al. (1994) Dev. Biol. 165:73-85) important for ligand binding (Kragh-Hansen, U. (1990) Danish Med. Bull. 37:57-84). Other ECMPs are members of the vWFA-like module superfamily, a diverse group of proteins with a module sharing high sequence similarity. The vWFA-like module is found not only in plasma proteins but also in plasma membrane and ECMPs (Colombatti, A. and Bonaldo, P. (1991) Blood 77:2305-2315). Crystal structure analysis of an integrin vWFA-like module shows a classic "Rossmann" fold and suggests a metal ion-dependent adhesion site for binding protein ligands (Lee, J.-O. et al. (1995) Cell 80:631-638).
The diversity, distribution, and biochemistry of MPs are indicative of their many overlapping roles in cell proliferation and cell signaling. MPs function in the formation, growth, remodeling, and maintenance of bone, and in the mediation and regulation of inflammation. Biochemical changes that result from congenital, epigenetic, or infectious diseases affect the expression and balance of MPs. This balance, in turn, affects the activation, proliferation, differentiation, and migration of leukocytes and determines whether the immune response is appropriate or self-destructive (Roman, J. (1996) Immunol. Res. 15:163-178).
Adenylyl cyclases (ACs) are a group of second messenger molecules which actively participate in cell signaling processes. There are at least eight types of mammalian ACs which show regions of conserved sequence and are responsive to different stimuli. For example, the neural-specific type I AC is a Ca2+-stimulated enzyme, whereas the human type VII is unresponsive to Ca2+ and responds to prostaglandin E1 and isoproterenol. Characterization of these ACs, their tissue distribution, and the activators and inhibitors of the different types of ACs is the subject of various investigations (Nielsen, M. D. et al. (1996) J. Biol. Chem. 271:33308-16; Hellevuo, K. et al. (1995) J. Biol. Chem. 270:11581-9). AC interactions with kinases and G proteins in the intracellular signaling pathways of all tissues make them interesting candidate molecules for pharmaceutical research.
ATP diphosphohydrolase (ATPDase) is an enzyme expressed and secreted by quiescent endothelial cells and involved in vasomediation. The physiological role of ATPDase is to convert ATP and ADP to AMP. When this conversion occurs in the blood vessels during an inflammatory response, it prevents extracellular ATP from causing vascular injury by inhibiting platelet activation and modulating vascular thrombosis (Robson, S. C. et al. (1997) J. Exp. Med. 185:153-63).
The discovery of new signal peptide-containing proteins and the polynucleotides encoding these molecules satisfies a need in the art by providing new compositions useful in the diagnosis, treatment, and prevention of diseases associated with cell proliferation and cell signaling, particularly cancer, immune response and neuronal disorders.