The present invention relates to novel nucleic acid sequences which code for polypeptides belonging to a group named amelins, which polypeptide sequences comprise tetrapeptide domains implicated in cell surface recognition. Possible applications of the amelin sequence concern the diagnosis of disorders of hard tissue formation, and the production of the amelin protein or fragments thereof, which may then serve as matrix constituents or cell recognition tags in the formation of biomaterials. The invention also relates to expression vectors containing the nucleic acid sequences according to the invention for production of the protein, organisms containing said expression vector, methods for producing the polypeptide, compositions comprising the polypeptides, and methods for treating various hard tissue diseases or disorders.
In bone, dentin and other tissues, collagen type I or similar proteins assemble into a fibrillar matrix, which in some instances serves as a scaffold for the incorporation of mineral crystals. The adjacent cells establish specific contacts to the matrix, which are mediated by interactions between domains in extracellular proteins such as collagen and receptors of the cell surface, for instance integrins. Peptide domains which are involved in these contacts have been identified in several extracellular proteins (Yamada and Kleinman, 1992). In enamel, a structural network which is comparable to the collagen fibres of bone, cartilage and dentin has not been found. Also, no sequence segments which could mediate anchoring of the matrix to cell adhesion molecules have been identified in the enamel matrix proteins. The enamel proteins amelogenin and enamelin do not contain such protein domains. The mineral content of newly deposited enamel is around 15% of the total mass and increases later, under degradation of the proteins, to 95% (Robinson et al., 1988).
Two predominant groups of proteins have been identified in enamel: enamelins and amelogenins (Termine et al., 1980). Protein fragments in mature enamel are similar to one of the enamelins, tuftelin, which has been located by antibodies between the enamel prisms. The cDNA sequence corresponding to tuftelin has been determined, and it has been speculated that this protein might have a function in the mineralization of enamel (Deutsch et al., 1991). The significance for enamel formation of the remaining enamelins described so far may be disputed, because the main protein species are identical to proteins from the bloodstream (Strawich and Glimcher, 1990). It is still debated whether amelogenin, the most abundant enamel protein, provides a scaffold for the enamel matrix (Simmer et al., 1994).
Partial sequences of randomly selected cDNA clones from a rat in situ library have previously been compiled (Matsuki et al., 1995), some of which show homology to sequences of the invention. No reading frame was suggested from the partial sequences. It was not stated whether polypeptides are encoded by these sequences, and no suggestion as to a possible function of such polypeptides was given.
Non-amelogenin proteins have been identified in porcine immature enamel (Uchida et al., 1995). A 15 kDa protein had an N-terminal amino acid sequence (Val-Pro-Ala-Phe-Pro-Arg-Gln-Pro-Gly-Thr-His-Gly-Val-Ala-Ser-Leu (SEQ ID NO:7)) with no homology to previously known enamel proteins. It was proposed that the non-amelogenins comprise a new family of enamel proteins but their function was not suggested. The proteins have not been sequenced completely and their genes are not known.
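For comparison against sequence databases, the N-terminal sequence quoted above is more conveniently handled in one-letter code. The following is a minimal sketch in Python, assuming only the standard IUPAC three-letter to one-letter amino acid mapping; the names `THREE_TO_ONE` and `to_one_letter` are illustrative and are not taken from the cited work.

```python
# Standard IUPAC three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(hyphenated: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter code.

    Raises KeyError if a residue name is not one of the 20 standard codes.
    """
    return "".join(THREE_TO_ONE[residue] for residue in hyphenated.split("-"))


# N-terminal sequence of the 15 kDa protein as quoted in the text (SEQ ID NO:7).
N_TERMINAL = "Val-Pro-Ala-Phe-Pro-Arg-Gln-Pro-Gly-Thr-His-Gly-Val-Ala-Ser-Leu"
n_terminal_one_letter = to_one_letter(N_TERMINAL)
print(n_terminal_one_letter)
```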
WO89/08441 relates to a composition for use in inducing binding between parts of living mineralized tissue in which the active constituent originates from a precursor to dental enamel, the so-called enamel matrix. The composition induces binding by facilitating regeneration of mineralized tissue. The active constituent is part of a protein fraction and is characterized by having a molecular weight of up to about 40.000 kDa, but no single protein is identified.
Although proteins of mineralized matrices are often produced in high amounts, their poor solubility prevents a direct analysis. In the tooth enamel, a physiological degradation of matrix proteins occurs in the course of mineral acquisition during the maturation phase and constitutes an additional difficulty for the analysis of the matrix proteins. The present invention is based upon the consideration that since the matrix forming cells synthesize the corresponding proteins in high amounts, they should contain a high copy number of the mRNAs. Accordingly, sequence analysis of the predominant mRNA species of the matrix forming cells may circumvent part of the problems and help to investigate certain protein constituents of the matrix.
These considerations initiated the approach taken which led to the discovery of the new amelin mRNA sequences, the basis for the present invention. Briefly, a genetic library was constructed containing sequences of the mRNA species of developing teeth. Individual sequences were obtained from single bacterial clones and used for in situ hybridization experiments of histological sections through developing teeth. Sequences which were detected in cells forming hard tissue matrix, e.g. ameloblasts, were determined and used to query sequence databases. Most of the thus selected sequences were represented in the databases but two sequences now termed the amelin sequences were not. These two variants of a new mRNA sequence are expressed at high levels in rat ameloblasts during the formation of the enamel matrix. The sequences contain open reading frames for 407 and 324 amino acid residues, respectively. The encoded proteins, which were named amelins, are rich in proline, leucine and glycine residues and contain the peptide domain Asp-Gly-Glu-Ala, an integrin recognition sequence, in combination with other domains interacting with cell surfaces. The sequences coding for the C-terminal 305 amino acid residues, i.e. amino acids 102-407 in SEQ ID NO:2 and amino acids 19-324 in SEQ ID NO:4, the 3′ non-translated part and a microsatellite repeat at the non-translated 5′ region are identical in both mRNA variants. The remaining 5′ regions contain 338 nucleotides unique to the long variant (nucleotides 13-350 in SEQ ID NO:1), 54 common nucleotides and 46 nucleotides present only in the short variant (nucleotides 66-111 in SEQ ID NO:3). Fourteen nucleotides have the potential to code for 5 amino acids of both proteins in different reading frames (nucleotides 391-404 in SEQ ID NO:1 and 52-65 in SEQ ID NO:3). The reading frame of the longer variant includes codons for a typical N-terminal signal peptide.
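The coordinate statements above can be checked with simple arithmetic, and the tetrapeptide Asp-Gly-Glu-Ala (DGEA in one-letter code) can be located in a protein sequence by a plain substring search. The following Python sketch illustrates both under stated assumptions: the ranges are read as 1-based and inclusive, the helper names `span_length` and `find_motif` are illustrative, and the toy sequence is hypothetical and is not an amelin sequence.

```python
def span_length(first: int, last: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return last - first + 1


# Nucleotide ranges quoted in the text (assumed 1-based and inclusive).
long_unique = span_length(13, 350)       # unique to the long variant, SEQ ID NO:1
short_unique = span_length(66, 111)      # present only in the short variant, SEQ ID NO:3
dual_frame_long = span_length(391, 404)  # dual-frame stretch in SEQ ID NO:1
dual_frame_short = span_length(52, 65)   # dual-frame stretch in SEQ ID NO:3

# The shared C-terminal region starts and ends at residue numbers that
# differ by the same offset in the two proteins (SEQ ID NO:2 vs. NO:4).
offset_at_start = 102 - 19
offset_at_end = 407 - 324


def find_motif(protein: str, motif: str = "DGEA") -> list[int]:
    """Return 1-based start positions of every occurrence of the motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy sequence for illustration only, NOT an amelin sequence.
toy_protein = "MPLGDGEAPLLG"

print(long_unique, short_unique, dual_frame_long, dual_frame_short)
print(offset_at_start, offset_at_end)
print(find_motif(toy_protein))
```

With the actual sequences from the sequence listing substituted for the toy string, the same search would report where the Asp-Gly-Glu-Ala domain lies in each variant.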
The properties of the amelin mRNA sequences indicate that the amelins are components of the enamel matrix and are the only proteins which have so far been implicated in binding interactions between the ameloblast surface and its extracellular matrix.
It is contemplated that the amelin peptides or parts thereof may be synthesized, either chemically or by translation with the help of expression vectors, by using the sequence information described herein. It is further contemplated that these peptides may contribute to the design of medical devices for the repair of teeth or bones. The peptides may also be combined with artificial implant material for the purpose of improving the biocompatibility of the material. Human amelin mRNA or gene sequences may help in the diagnosis of genetically inherited disorders in hard tissue formation.