The present invention relates to syndecan interacting protein(s) and the use thereof.
The invention relates to the identification of a protein that binds to the cytoplasmic domain of the syndecan. This protein, now called syntenin, contains a tandem repeat of PDZ-domains that reacts with the FYA C-terminal amino acid sequence of the syndecans (Trends Biochem.Sci., 20, p. 350, 1995: Origin of PDZ {DHR, GLGF}, by M. B. Kennedy). Endogenous syntenin appears to be localized to the cytoskeleton.
GFP-syntenin fusion-proteins decorate the plasma membrane and intracellular vesicles, where they co-localize and -segregate with syndecan cytoplasmic domains. Syntenin, therefore, is an unexpected candidate for connecting the cytoskeleton to heparan sulfate-assisted signal transduction pathways.
The syndecan cytoplasmic link (sycl) protein, or syndecan interacting protein, is referred to by the name “syntenin”, meaning “putting tension on the syndecans.”
Heparan sulfate proteoglycans (HSPG) are proteins which are mostly associated with the cell membrane. The most characterizing feature of these proteins is that they are substituted with different heparan sulfate sugar derivatives which strongly determine the function of said protein. Basically there are two classes of membrane heparan sulfate proteoglycans. First the best characterized membrane proteoglycans are the syndecans, which have a membrane spanning core protein. The second class is the glypican family which are located at the cell surface and anchored in the cell membrane through GPI (glycosyl phosphatidyl inositol).
The existence of two distinct, highly conserved multigene families of cell surface proteoglycans, the syndecan and glypican families respectively, suggests two distinctive cellular and/or subcellular pathways in which these proteins function. Ramifications within these pathways, specified by variations on a basic structural theme, are realized by the various members of these families. The aforementioned HSPGs are involved in several physiological processes and play a definite role in the transmission of signals from outside a cell into the cell itself. Their activities are characterized by the specific binding of heparan sulfate molecules to proteinase inhibitors, cell adhesion molecules or growth factors.
If the structure of HSPG is disturbed, abnormal cell growth and abnormal morphogenesis occur. Therefore it is of utmost importance to know and understand the structure-function relation of the proteoglycans and the way they transmit outside signals through the cell membrane into the cell, the so-called signal transduction cascade.
As mentioned above syndecans are transmembrane proteoglycans that place structurally heterogeneous heparan sulfate chains at the cell surface and a highly conserved polypeptide in the cytoplasm. Their versatile heparan sulfate moieties support various processes of molecular recognition, signaling and trafficking.
The cell surface heparan sulfate proteoglycans are at the cross-section of several different pathways. Their heparan sulfate moieties bind various differentiation-, growth-, and scatter factors, facilitate the occupation and activation of the corresponding signal-transducing receptors, and are involved in the internalization and clearance of the signaling complexes from the cell surface. They also assist receptors that are involved in cell-cell and cell-matrix adhesion, and assist scavenging receptors that are involved in the endocytosis and transcytosis of lipoproteins and lipases. They also bind and activate serine proteinase inhibitors and accelerate the reactions of these inhibitors with their targets. Proteolysis, lipolysis, mesoderm-induction, gastrulation, angiogenesis, neuritogenesis all appear to be regulated by or to depend on heparan sulfate, because this glycosaminoglycan is needed for the allosteric activation, approximation and compartmentalization of the reactants that are engaged in these processes.
In most cells syndecans represent the major source of cell surface heparan sulfate. The four known syndecans are small type I membrane proteins, with similar and simple domain organizations: a single ectodomain, membrane span, and cytoplasmic domain. Except for the presence of three or four consensus sites for heparan sulfate attachment near the amino-termini of the proteins, and a dibasic, presumably protease-sensitive site at the junctions with the membrane-spanning segments, the ectodomains of the different syndecans have little in common. The structures of these ectodomains have also not been evolutionarily conserved, except for these shared structural elements. The membrane-spanning and the small cytoplasmic domains of the syndecans, in contrast, show extensive structural similarity (60% sequence identity) and have been highly conserved during evolution. All four vertebrate syndecans and the single Drosophila syndecan share the amino acid sequence RM(K/R)KKDEGSY (in one-letter code) in the membrane-proximal segments of their cytoplasmic domains, and the amino acid sequence EFYA (in one-letter code) at their C-termini. This suggests that the extracellular heparan sulfate moieties, the cytoplasmic protein moieties, and the contiguity of these moieties are essential for syndecan function. The syndecans may provide a transmembrane link between extracellular heparin-steered processes and intracellular structural or regulatory proteins, and mediate outside-in or inside-out effects on signaling or effector systems.
In order to understand above mechanism/cascade a search for Syndecan cytoplasmic links (sycls) was initiated using the cytoplasmic domains of four different syndecans as baits in a yeast two-hybrid screening assay. The insert of one apparent truly positive clone (yielding HIS+ and LacZ+ phenotypes in combination with all four syndecan/Gal4 DNA-binding domain fusion constructs, but not with Gal4 DNA-binding domain alone or with p53 fused to the Gal4 DNA-binding domain) was sequenced, and used as a starting point to obtain a cDNA coding for the corresponding full length sycl protein.
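The selection criteria applied to the candidate clone can be summarized as a simple filter. The following Python sketch is illustrative only: the bait labels, the data structure and the function name are assumptions introduced here, and the rule that a control bait must activate neither reporter is a conservative reading of the text rather than a stated protocol.

```python
# Illustrative sketch of the "truly positive" criterion. All names are
# hypothetical. A clone passes only if it is HIS+ and LacZ+ with every
# syndecan bait, and activates neither reporter with the control baits.

SYNDECAN_BAITS = ("syndecan-1", "syndecan-2", "syndecan-3", "syndecan-4")
CONTROL_BAITS = ("Gal4-DBD alone", "p53")


def is_truly_positive(phenotypes):
    """phenotypes maps a bait name to a (his_plus, lacz_plus) pair of booleans."""
    positive_with_all_syndecans = all(
        phenotypes[bait][0] and phenotypes[bait][1] for bait in SYNDECAN_BAITS
    )
    silent_with_controls = not any(
        phenotypes[bait][0] or phenotypes[bait][1] for bait in CONTROL_BAITS
    )
    return positive_with_all_syndecans and silent_with_controls


# Made-up readout for one hypothetical clone (not experimental data).
example_clone = {
    "syndecan-1": (True, True),
    "syndecan-2": (True, True),
    "syndecan-3": (True, True),
    "syndecan-4": (True, True),
    "Gal4-DBD alone": (False, False),
    "p53": (False, False),
}
print(is_truly_positive(example_clone))
```

A clone that activates a reporter with p53 or with the Gal4 DNA-binding domain alone would be rejected by this filter, which mirrors the purpose of those two controls in the screen.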
The present invention concerns syndecan interacting protein(s) obtainable by a two-hybrid screening assay in which a cytoplasmic domain comprising the amino acid sequence FYA as C-terminal sequence, as occurring in syndecan, is used as bait and a cDNA library is used as prey. For the purpose of the invention the cDNA library can be any suitable cDNA library, but is preferably a mammalian, more preferably a human, and most preferably a human liver cDNA library. The full length sycl thus obtained consists of 298 amino acids, and can be divided into three or four parts. The first amino-terminal region (aa 1-109) shows no striking homology to any known structural motif. It is relatively rich in proline and contains five tyrosines, while the remainder of the protein is free of tyrosine. Based on sequence alignments, the second (aa 101-193) and third (aa 194-274) regions of sycl appear to correspond to a tandem repeat of two PDZ domains. The sequence coding for the putative second PDZ domain is extended by 24 amino acids (aa 275-298), which may still be part of the second PDZ domain or compose a fourth, separate C-terminal domain. PDZ domains have recently been recognized as one of the conserved modular structures that support protein-protein interactions and networking. PDZ domains mediate protein-protein interactions by binding to the carboxy-terminal ends of target proteins, and often occur in association with other functional modules, such as SH3 domains, protein tyrosine phosphatase domains, domains related to guanylate kinase (GUK) or to band 4.1 protein, leucine zipper motifs, and additional PDZ domains. PDZ domains have now been discovered in a variety of proteins, and shown to bind to membrane channels, receptors (e.g. wingless and Notch), tumor suppressor proteins (APC), GAPs and GEFs.
These interactions appear to be involved in the formation of multimeric protein complexes that influence receptor positioning and clustering, and the connections of receptors and receptor-associated molecules to cytoskeletal proteins and downstream signal effectors. Further tests were aimed at investigating the involvement of the sycl PDZ domains in the syndecan-sycl interaction, and potential further links of sycl to the intracellular cytoskeleton.
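The region boundaries quoted for sycl can be kept in a small lookup table, which makes it easy to see what an N-terminally truncated construct retains. This is a minimal sketch: the labels "PDZ1" and "PDZ2", the function names and the example start position are shorthand introduced here, and the residue ranges are copied as quoted, including the overlap between the first two regions.

```python
# Residue ranges (1-based, inclusive) exactly as quoted in the description.
# The quoted N-terminal (1-109) and first PDZ (101-193) ranges overlap; the
# sketch keeps them as stated rather than resolving the overlap.
REGIONS = {
    "N-terminal region": (1, 109),
    "PDZ1": (101, 193),
    "PDZ2": (194, 274),
    "C-terminal extension": (275, 298),
}


def region_length(name):
    """Number of residues in a named region."""
    first, last = REGIONS[name]
    return last - first + 1


def retained_range(name, construct_start):
    """Part of a region kept by a construct that starts at construct_start.

    Returns a (first, last) tuple, or None if the region is lost entirely.
    """
    first, last = REGIONS[name]
    if construct_start > last:
        return None
    return max(first, construct_start), last


# Hypothetical use: a construct that begins at residue 92.
for name in REGIONS:
    print(name, region_length(name), retained_range(name, 92))
```

The same helper can be called with any other start position to compare truncations against the quoted boundaries.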
The Genbank accession number for this sycl/syntenin cDNA and protein sequence according to the invention is AF000652.
The yeast two-hybrid system (Chien et al, 1991; P.N.A.S., 88, 9578-9582) was used to identify the domains of sycl and the residues in the syndecan cytoplasmic domains that are responsible for the syndecan-sycl interaction. Different parts of the original clone (3p11) and of the sycl cDNA were subcloned in the pGAD10 vector (Clontech Laboratories), to code for fusions between the activating domain of Gal4 (Gal4 AD) and various full length, truncated, deleted, point-mutated, or epitope-tagged versions of sycl. Full length sycl (with or without a C-terminal myc-epitope tag) and the original isolate (missing the first 91 aa of sycl) were able to interact with the full length syndecan (2 or 3) cytoplasmic domains, as revealed by growth on His- plates and β-gal activity. Even the fragment missing the first 112 amino acids of sycl is able to interact with the full length syndecan cytoplasmic domain. The N-terminal domain alone and either of the two PDZ domains alone were all inactive, suggesting that the paired PDZ domains are required for the binding interaction. Deletion of the complete N-terminal region (aa position 1-113) also abolished the interaction, indicating that the two PDZ domains and part of the N-terminal sequence are required (to allow the correct folding of the PDZ domains or for the interaction itself). It is not clear why the PDZ domains of sycl are inactive or less active separately, but a requirement for paired PDZ domains (for instance two PDZ1 or two PDZ2 domains) has also been observed for the interactions of the GRIP adaptor protein with AMPA receptors 12 and the interaction of hDlg with the cytoplasmic domain of the Shaker-type K+ channel.
The sycl binding sites in the syndecan cytoplasmic domain were deduced from the testing of a series of deletion and alanine-substitution mutants of the syndecan-2 cytoplasmic domain. These mutants were cloned in pAS2, as Gal4 DNA-binding domain fusion proteins, and expressed as partners for the Gal4 activation domain-full length sycl fusion protein. All the C-terminal deletions and the substitutions of alanine for the F-residue at −2 and the Y-residue at −1, but not the alanine substitution for the E-residue at −3, abolished the binding interaction. Deletion of all but the last four residues from the cytoplasmic domain only partially abolished the interaction (His-, but gal+). This localizes the binding site to the C-terminal FYA sequence of the syndecans, and appears consistent with the concept that PDZ proteins mediate protein-protein interactions by binding to the C-termini of proteins and discriminate among these sequences.
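The outcome of the alanine scan can be restated as a rule on the C-terminal residues. The sketch below is a deliberately simplified model, not a predictor: it only checks whether a sequence still ends in FYA, using the EFYA terminus quoted in the description as a toy input. The function names and variant labels are assumptions introduced here, and the partial result reported for the four-residue construct shows that real binding is not a simple yes/no property.

```python
def alanine_substitute(sequence, offset):
    """Replace the residue at `offset` from the C-terminal residue with alanine.

    Offset 0 is the last residue, -1 the one before it, and so on, matching
    the -1/-2/-3 numbering used in the text.
    """
    index = len(sequence) - 1 + offset
    if not 0 <= index < len(sequence):
        raise ValueError("offset lies outside the sequence")
    return sequence[:index] + "A" + sequence[index + 1:]


def has_fya_terminus(sequence):
    """True if the sequence still ends in the FYA motif."""
    return sequence.endswith("FYA")


# Toy input: only the conserved C-terminal residues quoted in the text.
WILD_TYPE_END = "EFYA"

variants = {
    "wild type": WILD_TYPE_END,
    "E(-3)A": alanine_substitute(WILD_TYPE_END, -3),
    "F(-2)A": alanine_substitute(WILD_TYPE_END, -2),
    "Y(-1)A": alanine_substitute(WILD_TYPE_END, -1),
    "C-terminal deletion": WILD_TYPE_END[:-1],
}

for label, seq in variants.items():
    print(label, seq, has_fya_terminus(seq))
```

Under this rule only the wild type and the E(−3)A variant keep the motif, which matches the pattern of mutants reported to retain the interaction.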
To examine the interaction between the syndecan cytoplasmic domain and sycl in systems other than yeast, the various sycl and syndecan constructs were also expressed as GST-fusion proteins (Glutathione S-Transferase), and used in ligand blot assays and surface plasmon resonance experiments. In ligand blots, GST-sycl full length (FL) and GST-sycl M92 (sycl sequence starting at M92) fusion proteins failed to bind to GST itself, to themselves, or to fusion proteins composed of GST and the C-terminal deletion, F(C30)A or Y(C31)A mutants (substitutions at positions 30 and 31, respectively) of the syndecan-2 cytoplasmic domain. The same GST-sycl constructs, in contrast, bound to fusion proteins composed of GST and any of the other syndecan-2 mutants or four different wild type syndecan cytoplasmic domains. In biosensor experiments, a streptavidin-immobilized biotinylated synthetic peptide, representing the 32 cytoplasmic amino acids of syndecan-2, bound the GST-sycl FL and GST-sycl M92 proteins, but not fusion proteins composed of GST and parts of sycl containing only one of the two PDZ domains.
Unlike the yeast two-hybrid assay, the biosensor experiment showed some binding of sycl that lacked all of the N-terminal domain but retained both PDZ domains. A calculation of the kon and koff values for the binding of sycl, from curves obtained at different concentrations of the GST-sycl FL or M92 fusion proteins and using the BIAcore2000 software, proved less relevant since the data were not consistent with a simple binding interaction, but the data suggest rapid dissociation (koff = 10^−3, if one would accept an A + B <=> AB interaction model). The sycl moiety itself, isolated from the fusion protein with the help of factor Xa, also bound to the synthetic peptide, indicating that the GST moiety itself was not needed. No sycl-peptide binding interaction could be observed, however, when the GST-sycl proteins were immobilized on a biosensor chip coated with an anti-GST antibody and the peptide was perfused over the immobilized sycl. These data suggest that the sycl-syndecan interaction may be very transient and depend on cooperative syndecan interactions. This implies that sycl-syndecan interactions may require the clustering of the syndecans.
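For orientation, a dissociation rate constant can be converted into a half-life of the complex under the simple one-to-one model mentioned in the text. The sketch below is hedged in two ways: the units of koff are not stated in the description and are assumed here to be reciprocal seconds, and the description itself notes that the data did not fit a simple binding model, so the numbers are illustrative only. The function names are introduced here.

```python
import math


def half_life(k_off):
    """Half-life of a complex that decays by first-order dissociation."""
    return math.log(2) / k_off


def fraction_remaining(k_off, time):
    """Fraction of complex still intact after `time`, with no rebinding."""
    return math.exp(-k_off * time)


# Value quoted in the text; the unit (per second) is an assumption.
K_OFF = 1e-3

print(f"half-life: {half_life(K_OFF):.0f} s")
print(f"fraction left after 60 s: {fraction_remaining(K_OFF, 60):.3f}")
```

Both functions ignore rebinding, so they describe the dissociation phase of a sensorgram rather than the equilibrium state.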
The subcellular expression of sycl and its potential colocalization with syndecan was investigated with the help of monoclonal antibodies raised against sycl fusion proteins. Association was investigated by immunohistochemistry, and through the overexpression of syndecans and Green Fluorescent Protein (GFP) fusion proteins in CHO cells. In non-transfected CHO cells most of the sycl immunoreactivity was associated with cytoplasmic fibrils, identifying sycl as a cytoskeleton-associated protein. These sycl fibrils or sycl-decorated fibrils ran criss-cross through the cytoplasm, with a perinuclear rather than a subcortical distribution. Little or no membrane staining was observed in these cells, indicating sycl was scarce at this site or undetectable with the current tools. To investigate possible colocalizations and interactions of syndecans and sycl, CHO cells stably transfected with an expression plasmid coding for human syndecan-1 (chosen because of the availability of a monoclonal antibody for the syndecan-1 cytoplasmic domain) or syndecan-2 were transiently supertransfected with plasmids coding for eGFP (enhanced green fluorescent protein) or an eGFP-sycl fusion protein. CHO cells overexpressing eGFP-sycl showed a striking change in morphology in comparison to non-transfected cells and cells expressing eGFP (see FIG. 1 and 2, respectively). These cells were larger and flatter, and were decorated with numerous membrane projections or filopodia. Often, these cells were also multinucleated. The eGFP-sycl protein autofluorescence was localized to the cytoplasm, to small intracellular vesicular structures, and to the plasma membrane, and partially colocalized with F-actin filaments, as revealed by staining with phalloidin-Texas Red-X. Green fluorescent eGFP-sycl transfectants were intensely stained by anti-sycl monoclonal antibody, indicating marked overexpression of sycl protein in these cells.
Staining of the syndecan-1 double transfectants with antibody directed towards the cytoplasmic domain of syndecan-1 and Texas Red-X-labeled secondary antibody showed partial colocalization of this syndecan domain and eGFP-sycl. No colocalization of green and red fluorescence was observed in control cells transfected with eGFP. Capping of the syndecan-1 cytoplasmic domain could be induced by mild and brief trypsinization of the cells at 4 °C, followed by a 10-30 min reincubation of the cells at 37 °C. Massive co-capping of the red (syndecan-1 cytoplasmic domain) and green fluorescence (GFP) was observed in eGFP-sycl expressing cells, but not in eGFP expressing cells, indicating colocalization of sycl and this syndecan domain during the capping process. Similar, but less perfect, colocalizations were observed with a syndecan-1 ectodomain antibody in the syndecan-1 transfectants, and with a syndecan-2 ectodomain antibody in the syndecan-2 transfectants. These data show that sycl associates with the cytoskeleton and with the cytoplasmic domains of the syndecans, and functions as an adaptor molecule that links syndecan-supported recognition processes to intracellular signal-transduction, control, and effector systems.
The current invention thus concerns an isolated nucleic acid sequence comprising the nucleotide sequence as provided in SEQ ID NO. 2 coding for a syndecan interacting protein or a functional fragment thereof. “Functional fragment” in this context means that the fragment has substantially the same activity as the subject to which it relates, although the form, length or structure may vary.
Furthermore a recombinant expression vector comprising said isolated nucleic acid sequence (in sense or anti-sense orientation) operably linked to a suitable control sequence belongs to the present invention and cells transfected or transduced with said recombinant expression vector also belong to the scope of the invention.
The current invention is not limited to the exact isolated nucleic acid sequence comprising the nucleotide sequence as mentioned in SEQ ID NO. 2, but also covers a nucleic acid sequence that hybridizes to said nucleotide sequence as provided in SEQ ID NO. 2, or to a functional part thereof, and that encodes a syndecan interacting protein or a functional fragment thereof.
As used herein, by hybridization is meant conventional hybridization conditions known to the skilled person, preferably appropriate stringent hybridization conditions. Hybridization techniques for determining the complementarity of nucleic acid sequences are known in the art. The stringency of hybridization is determined by a number of factors during hybridization, including temperature, ionic strength, length of time and composition of the hybridization buffer. These factors are outlined in, for example, Maniatis et al. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Press, Cold Spring Harbor, N.Y.).
Another aspect of the invention is a polypeptide comprising the amino acid sequence according to SEQ ID NO. 1 or a functional fragment thereof. Specifically, the invention includes the polypeptide fragments with the amino acid sequence located between positions 92 and 298, and more specifically between positions 113 and 298.
The scope of the present invention also includes variants or homologues of the polypeptide in which amino acids are substituted by other amino acids, in a manner obvious to a person skilled in the art, without loss of activity. The polypeptide or functional fragments of the present invention are not necessarily translated from the nucleic acid sequence according to the invention but may be generated in any manner, including for example, chemical synthesis, or expression of a recombinant expression system. A pharmaceutical composition comprising the above-mentioned nucleic acid(s) or a pharmaceutical composition comprising said polypeptide(s) also belongs to the current invention. The nucleic acid and/or polypeptide according to the invention can optionally be used for appropriate gene therapy purposes. In addition, a method for diagnosing, prognosis and/or follow-up of a disease by using the nucleic acid(s) according to the invention or by using the polypeptide(s) of the current invention also belongs to the invention.
A method of screening for components which affect the interaction between syndecan and syndecan interacting protein can be developed on the basis of the current knowledge of these syndecan interacting factors according to the invention. A diagnostic kit comprising the nucleic acid(s) and/or polypeptide(s) according to the invention for performing the above-mentioned method for diagnosing, prognosis and/or follow-up of a disease belongs to the invention as well. Some diseases in this respect are, for instance, Alzheimer disease or inflammatory diseases. Screening may also be carried out for cell malignancies and the activity of cytostatic agents thereupon.
A transgenic animal harbouring the nucleic acid(s) according to the invention in its genome also belongs to the scope of this invention. By transgenic animal is meant a non-human animal which has incorporated a foreign gene (called a transgene) into its genome; because this gene is present in germ line tissues, it is passed from parent to offspring, establishing lines of transgenic animals from a first founder animal. It will be appreciated that when a nucleic acid construct is introduced into an animal to make it transgenic, the nucleic acid may not necessarily remain in the form as introduced. By “offspring” is meant any product of the mating of the transgenic animal, whether or not with another transgenic animal, provided that the offspring carries the transgene. In order to clarify what is meant in this description by some terms, a further explanation is given hereunder.
The polypeptides of the present invention are not necessarily translated from a designated nucleic acid sequence; the polypeptides may be generated in any manner, including for example, chemical synthesis, or expression of a recombinant expression system, or isolation from a suitable viral system. The polypeptides may include one or more analogs of amino acids, phosphorylated amino acids or unnatural amino acids. Methods of inserting analogs of amino acids into a sequence are known in the art. The polypeptides may also include one or more labels, which are known to those skilled in the art.
The terms “gene(s),” “polynucleotide,” “nucleic acid sequence,” “nucleotide sequence,” “DNA sequence” or “nucleic acid molecule(s)” as used herein refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA, and RNA. It also includes known types of modifications, for example, methylation, “caps”, and substitution of one or more of the naturally occurring nucleotides with an analog.
An “expression vector” is a construct that can be used to transform a selected host cell and provides for expression of a coding sequence in the selected host.
A “coding sequence” is a nucleotide sequence which is transcribed into mRNA and/or translated into a polypeptide when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the 5′-terminus and a translation stop codon at the 3′-terminus. A coding sequence can include, but is not limited to, mRNA, cDNA, recombinant nucleotide sequences or genomic DNA, while introns may be present as well under certain circumstances.
“Control sequence” refers to regulatory DNA sequences which are necessary to effect the expression of coding sequences to which they are ligated. The nature of such control sequences differs depending upon the host organism. In prokaryotes, control sequences generally include promoter, ribosomal binding site, and terminators. In eukaryotes, control sequences generally include promoters, terminators and, in some instances, enhancers, transactivators or transcription factors. The term “control sequence” is intended to include, at a minimum, all components the presence of which is necessary for expression, and may also include additional advantageous components.
“Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. A control sequence “operably linked” to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences. In case the control sequence is a promoter, it is obvious for a skilled person that double-stranded nucleic acid is used.
The terms “protein” and “polypeptide” used in this application are interchangeable. “Polypeptide” refers to a polymer of amino acids (amino acid sequence) and does not refer to a specific length of the molecule unless otherwise specified in the description. Thus peptides and oligopeptides are included within the definition of polypeptide. This term also refers to or includes post-translational modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like. Included within the definition are, for example, polypeptides containing one or more analogs of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. “Fragment of a sequence” or “part of a sequence” means a truncated sequence of the original sequence referred to. The truncated sequence (nucleic acid or protein sequence) can vary widely in length; the minimum size being a sequence of sufficient size to provide a sequence with at least a comparable function and/or activity of the original sequence referred to, while the maximum size is not critical. In some applications, the maximum size usually is not substantially greater than that required to provide the desired activity and/or function(s) of the original sequence.
The term “antibody” includes, without limitation, chimeric antibodies, altered antibodies, univalent antibodies, bi-specific antibodies, Fab proteins or single-domain antibodies. In many cases, the binding of antibodies to antigens is equivalent to other ligand/anti-ligand binding. The antibody can be a monoclonal or a polyclonal antibody.
“Sense strand” refers to the strand of a double-stranded DNA molecule that is homologous to an mRNA transcript thereof. The “anti-sense strand” contains an inverted sequence which is complementary to that of the “sense strand”.
“Expression” means the production of a protein or nucleotide sequence in the cell itself or in a cell-free system. It includes transcription into an RNA product, post-transcriptional modification and/or translation to a protein product or polypeptide from a DNA encoding that product, as well as possible post-translational modifications.
“Foreign” with regard to a DNA sequence means that such a DNA is not in the same genomic environment in a cell, transformed with such a DNA in accordance with this invention, as is such DNA when it is naturally found in a cell from which such a DNA originates.
In the description of the current invention reference is made to the following sequences of the Sequence Listing and Figures.
SEQ ID NO. 1: amino acid sequence (position 1-298) for syndecan interacting protein (one letter code)
SEQ ID NO. 2: cDNA sequence (position 1-2193) encoding a polypeptide of 298 amino acids; ATG start codon at position 149
SEQ ID NO. 3: cDNA sequence together with amino acid sequence encoding syndecan interacting protein
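The figures given for SEQ ID NO. 2 allow the expected position of the coding region within the cDNA to be worked out. The following sketch is simple arithmetic under stated assumptions: positions are 1-based, the open reading frame is uninterrupted, and a single stop codon immediately follows the codon for residue 298. None of these assumptions is confirmed by the listing above, and the function and variable names are introduced here.

```python
CDNA_LENGTH = 2193   # length of the cDNA, from the sequence listing summary
START_CODON = 149    # position of the A of the ATG start codon
N_RESIDUES = 298     # length of the encoded polypeptide


def cds_coordinates(start, n_residues, include_stop=True):
    """Return (first, last) nucleotide positions of the coding region.

    Assumes 1-based numbering and a contiguous open reading frame.
    """
    codons = n_residues + (1 if include_stop else 0)
    last = start + 3 * codons - 1
    return start, last


cds_start, cds_end = cds_coordinates(START_CODON, N_RESIDUES)
five_prime_utr = START_CODON - 1
three_prime_utr = CDNA_LENGTH - cds_end

print("coding region incl. stop codon:", cds_start, "-", cds_end)
print("5' untranslated length:", five_prime_utr)
print("3' untranslated length:", three_prime_utr)
```

Comparing these computed positions against the actual sequence is a quick consistency check on the start codon position and the stated polypeptide length.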