The collagen genes are an important family of genes whose products provide the extracellular framework for virtually all multicellular organisms (Bornstein et al., 1980, Ann. Rev. Biochem. 49:957-1003). More than nineteen distinct types of collagen have been described (Ramirez et al., 1985, Ann. New York Acad. Sci. 460:117-129; Vuorio et al., 1990, Annu. Rev. Biochem. 59:837-872; Chu et al., 1993, In: Connective Tissue and Its Heritable Disorders, Royce et al., eds., Wiley-Liss, New York, pp. 149-165; Prockop et al., 1995, Annu. Rev. Biochem. 64:403-434). The biosynthesis of collagen has been described (Prockop et al., 1979, N. Engl. J. Med. 301:13-23).
Large collagen structures form by nucleated growth of collagen chains into triple-helical collagen subunits. Collagen fibrils, in turn, form by nucleated growth of collagen subunits, each fibril comprising a quarter-staggered array of subunits (Gross et al., 1958, Annu. Rev. Cell Biol. 2:421-457; Wood et al., 1960, Biochem. J. 75:588-598; Prockop et al., 1984, N. Engl. J. Med. 311:376-386; Kadler et al., 1987, J. Biol. Chem. 262:15696-15701; Na et al., 1989, Biochem. 28:7153-7161; Kadler et al., 1990, Biochem. J. 268:339-343; Prockop et al., 1989, Biophysics (Eng. Transl. Biofizika) 3:81-89). During nucleated growth, the collagen protein chains fold into the triple-helical conformation that is a unique and characteristic feature of all collagens (Engel, 1987, Adv. Meat Res. 4:145-158; Engel et al., 1991, Annu. Rev. Biophys. Biophys. Chem. 20:137-152; Piez, 1984, In: Extracellular Matrix Biochemistry, Piez et al., eds., Elsevier Science Pub. Co. Inc., New York, pp. 1-40).
Each of the three α chains in a collagen subunit comprises a repeating tripeptide sequence having the general amino acid sequence Gly-X-Y. The presence of glycine, the smallest amino acid, in every third position is critical, because the residue in this position must fit into the restricted space where the three chains come together at the center of the triple helix. The X- and Y-position residues are frequently proline and 4-hydroxyproline (Hyp), respectively. Because the highly flexible glycine bonds flank the relatively inflexible peptide bonds of proline and Hyp, individual α chains do not fold independently into any defined three-dimensional structure. Instead, the chains fold into a defined structure only by forming hydrogen bonds and water bridges that link the Gly-X-Y sequences in one α chain to the equivalent Gly-X-Y sequences in the other two α chains.
Proper conformation of the collagen molecule requires that the three α chains be in register, in the sense that each Gly-X-Y tripeptide unit in one chain is hydrogen-bonded to the corresponding tripeptide units in the other two α chains. Otherwise, the chains have exposed ends or internal loops of non-triple-helical tripeptide units. Substitution of one or more amino acids of the Gly-X-Y tripeptide sequence with other amino acids, particularly substitution of Gly with an amino acid having a relatively bulky side chain, can produce a structurally abnormal but partially functional collagen subunit. A large number of such substitution mutations have been described (e.g., Kuivaniemi et al., 1991, FASEB J. 5:2052-2060). Numerous diseases and disorders are associated with mutations in one or more of the Type I or Type IX collagen genes, including, but not limited to, osteoporosis, osteoarthritis, chondrodysplasia, multiple epiphyseal dysplasia, osteogenesis imperfecta, short stature, scoliosis, low bone density, and degenerative joint disease.
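Because every third residue of a triple-helical domain should be glycine, a glycine substitution of the kind described above can be located by scanning the sequence in frame. The sketch below is an illustration only, not a method taken from this document: it assumes the input is a one-letter protein sequence that begins at the Gly of the first Gly-X-Y triplet, and it reports each triplet whose first residue is not glycine. The one-letter code does not distinguish 4-hydroxyproline from proline, so Hyp is written as P. The function name `find_glycine_substitutions` and the example sequence are hypothetical.

```python
def find_glycine_substitutions(domain: str, start: int = 1) -> list[tuple[int, str]]:
    """Return (position, residue) for each Gly-X-Y triplet whose first residue is not Gly.

    Assumes `domain` is a one-letter protein sequence starting at the Gly of the
    first Gly-X-Y triplet; `start` is the residue number assigned to that Gly.
    An incomplete trailing triplet is ignored.
    """
    hits = []
    # Step through the domain one complete triplet at a time, staying in frame.
    for offset in range(0, len(domain) - 2, 3):
        residue = domain[offset].upper()
        if residue != "G":
            hits.append((start + offset, residue))
    return hits


# Hypothetical example: the third triplet carries Cys in place of Gly.
print(find_glycine_substitutions("GPPGAPCPPGPP"))  # [(7, 'C')]
```

Checking only the first position of each triplet mirrors the point made above: substitutions at X or Y are tolerated far more readily than substitutions of the central glycine.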
Type I Collagen
Type I collagen accounts for about 80 to 90% of the protein found in bone. It is also found in large amounts in tissues such as skin, ligaments, and tendons. In many tissues, the Type I collagen fibrils are associated with other types of collagen and with other components of the extracellular matrix.
Type I collagen is synthesized as a precursor, Type I procollagen, which comprises two proα1(I) chains and one proα2(I) chain. Each proα chain comprises three separate domains: an N-propeptide domain, a central domain, and a C-propeptide domain.
The N-propeptide domain, located at the amino-terminal end of each proα chain, comprises a globular subdomain, a short triple-helical subdomain, and another short subdomain that forms part of the cleavage site at which the N-propeptide is separated from the mature collagen molecule.
The central domain of each proα chain, denoted the α-chain domain, comprises about 1,000 amino acid residues; with the exception of short sequences at its ends, every third amino acid in this domain is glycine. The α-chain domain therefore consists largely of the repeating Gly-X-Y tripeptide unit.
The globular C-propeptide domain, located at the carboxyl-terminal end of each proα chain, is responsible for association of the proα chains during collagen biosynthesis. Hydrophobic and electrostatic interactions among the C-propeptide domains of the three proα chains direct the inclusion of two proα1(I) chains and one proα2(I) chain in the procollagen molecule. Formation of interchain disulfide bonds among the proα chains further stabilizes the structure of the procollagen molecule, establishes the correct registration of the Gly-X-Y tripeptide units of the three chains, and forms a triple-helical nucleus of Gly-X-Y units. After the triple-helical nucleus forms, triple-helical association of the Gly-X-Y units of the three chains proceeds in a zipper-like fashion from the carboxyl- toward the amino-terminal portions of the chains.
Biosynthesis of the procollagen molecule involves a large number of post-translational modifications, requiring at least eight procollagen-specific enzymes and several non-specific enzymes. Over a hundred amino acids in each α chain are modified post-translationally. After procollagen is assembled, it is secreted from the cell. Extracellularly, the N-propeptide is cleaved from the procollagen molecule by one enzyme and the C-propeptide by a second enzyme, yielding a mature collagen subunit. The solubility of the collagen subunit is about two thousand times lower than that of the corresponding procollagen subunit, and this low solubility drives spontaneous polymerization of collagen subunits into collagen fibrils. Indeed, in vitro assembly of fibrils from collagen subunits generated by enzymatic cleavage of procollagen has been demonstrated (Prockop et al., 1989, In: Cytoskeletal and Extracellular Proteins, Aebi et al., eds., Springer Series in Biophysics, Vol. 3, pp. 81-89; Kadler et al., 1990, Biochem. J. 268:339-343).
Human proα1(I) is encoded by the COL1A1 gene, which is located on chromosome 17q21.3-q22, and human proα2(I) is encoded by the COL1A2 gene, which is located on chromosome 7q21.3-q22. Oligonucleotide primers useful for amplifying and sequencing cDNA encoding the human proα1(I) chain of Type I procollagen have been described (Labhard et al., 1990, Matrix 10:124-130).
The complete cDNA sequence corresponding to the COL1A1 gene has been reported (Chu et al., 1984, Nature 310:337-340; Tromp et al., 1988, Biochem. J. 253:919-922; Bernard et al., 1983, Biochem. 22:5213-5223). Furthermore, the nucleotide sequences of approximately 400 base pairs of the 5'-untranslated region, introns 1-26, and twenty-six nucleotides at the 5'-end of intron 27 of COL1A1 have been reported (Chu et al., 1985, J. Biol. Chem. 260:2315-2320; D'Alessio et al., 1988, Gene 67:105-113; Barsh et al., 1985, Proc. Natl. Acad. Sci. USA 82:2870-2874).
The complete cDNA sequence corresponding to the COL1A2 gene has been reported (Bernard et al., 1983, Biochem. 22:1139-1145; de Wet et al., 1987, J. Biol. Chem. 262:16032-16036; Kuivaniemi et al., 1988, Biochem. J. 252:633-640). Furthermore, the nucleotide sequences of certain non-coding regions of the COL1A2 gene have been reported, including the following sequences:
(i) 75 nucleotides located within intron 1;
(ii) 318 nucleotides at the 3'-end of intron 5;
(iii) 298 nucleotides at the 5'-end of intron 6;
(iv) 30 nucleotides at the 3'-end of intron 26;
(v) intron 27;
(vi) intron 28;
(vii) 25 nucleotides at the 5'-end of intron 29; and
(viii) intron 33
(Myers et al., 1983, J. Biol. Chem. 258:10128-10135; Myers et al., 1984, J. Biol. Chem. 259:12941-12944; Dickson et al., 1984, Proc. Natl. Acad. Sci. USA 81:4524-4528; Tromp et al., 1988, Proc. Natl. Acad. Sci. USA 85:5254-5258; Sherwood et al., 1990, Gene 89:238-244; Vasan et al., 1991, Am. J. Hum. Genet. 48:305-317; Ganguly et al., 1991, J. Biol. Chem. 266:12035-12040).
A method of detecting the presence of an alteration in a Type I or Type IX collagen gene of a subject comprises:
(i) obtaining from the subject a sample nucleic acid comprising at least a portion of the gene, wherein the portion comprises at least one intronic nucleotide, a first site, and a second site;
(ii) determining the nucleotide sequence of the portion of the gene; and
(iii) comparing the nucleotide sequence of the portion of the gene with a consensus nucleotide sequence of the gene.
A difference between the nucleotide sequence and the consensus nucleotide sequence is indicative of the presence in the subject of the alteration in the gene. The portion of the gene is selected from the group consisting of: the segment of the COL1A1 gene extending in the 5'- to 3'-direction from and including the 78 nucleotides of intron 27 located adjacent to exon 28 through the 3'-end of the COL1A1 gene; the segment of the COL1A2 gene extending in the 5'- to 3'-direction from and including the 5'-end thereof through intron 4; the segment of the COL1A2 gene extending in the 5'- to 3'-direction from and including the 2600 nucleotides at the 3'-end of intron 26 through the 340 nucleotides at the 5'-end of intron 26; the segment of the COL1A2 gene extending in the 5'- to 3'-direction from and including the 775 nucleotides at the 3'-end of intron 29 through intron 32; the segment of the COL1A2 gene extending in the 5'- to 3'-direction from and including intron 34 through the 3'-end of the COL1A2 gene; the COL9A1 gene; the COL9A2 gene; and the COL9A3 gene.
The consensus nucleotide sequence of the COL1A1 gene is SEQ ID NO: 1; the consensus nucleotide sequence of the COL1A2 gene is SEQ ID NO: 2; that of the COL9A1 gene is SEQ ID NO: 3; that of the COL9A2 gene is SEQ ID NO: 4; and that of the COL9A3 gene comprises SEQ ID NO: 5 and SEQ ID NO: 640.
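The sequence-determination and comparison steps of the method reduce, at their core, to a position-by-position comparison of a sample sequence with a consensus sequence. The sketch below is a minimal illustration under stated assumptions, not the procedure of this document: it assumes both sequences have already been obtained and aligned to the same coordinates (sequencing and alignment are outside its scope), and it treats a "-" in either sequence as an alignment gap that is reported like any other difference. The function name `compare_to_consensus` and the example fragment are hypothetical.

```python
def compare_to_consensus(sample: str, consensus: str, first_position: int = 1) -> list[tuple[int, str, str]]:
    """List (position, consensus_base, sample_base) wherever two aligned sequences differ.

    Assumes both sequences are pre-aligned and of equal length; a '-' marks an
    alignment gap (a possible insertion or deletion) and counts as a difference.
    `first_position` is the coordinate assigned to the first aligned base.
    """
    if len(sample) != len(consensus):
        raise ValueError("sample and consensus must be aligned to the same length")
    differences = []
    for index, (expected, observed) in enumerate(zip(consensus.upper(), sample.upper())):
        if expected != observed:
            differences.append((first_position + index, expected, observed))
    return differences


# Hypothetical intronic fragment with a single A-to-G difference at position 4.
print(compare_to_consensus("GTAGGTCC", "GTAAGTCC"))  # [(4, 'A', 'G')]
```

A non-empty result corresponds to the "difference between the nucleotide sequence and the consensus nucleotide sequence" described above; an empty list means the examined portion matches the consensus.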
Alterations in the coding region of either the COL1A1 gene or the COL1A2 gene, and gene alterations that decrease expression of either proα1(I) or proα2(I), have been associated with osteogenesis imperfecta, a genetic disease of children characterized by bone brittleness. Many, but not all, children afflicted with osteogenesis imperfecta also exhibit blue sclerae, poor dentition, and thin skin. These symptoms are thought to be associated with a decrease in the amount of Type I collagen in the corresponding tissues or with formation of abnormal Type I collagen fibrils therein. Bone brittleness associated with osteogenesis imperfecta is usually apparent early in childhood because the patients sustain numerous fractures from relatively minor trauma. It is thought that bone brittleness associated with decreased or abnormal Type I collagen expression can be mistaken for signs of battered child syndrome. Many patients afflicted with mild osteogenesis imperfecta become fracture-free after the growth spurt associated with puberty, but develop a marked susceptibility to bone fracture later in life.
Alterations in Type I procollagen genes have been found in patients afflicted with some forms of Ehlers-Danlos syndrome (EDS; Weil et al., 1989, EMBO J. 8:1705; Weil et al., 1988, J. Biol. Chem. 263:8561; Weil et al., 1989, J. Biol. Chem. 264:16804; Vasan et al., 1991, Am. J. Hum. Genet. 48:305; Weil et al., 1990, J. Biol. Chem. 265:16007). Some patients afflicted with osteoporosis have alterations in one or more of their Type I procollagen genes (Constantinou et al., 1990, Cytogenet. Cell Genet. 51:979; Nicholls et al., 1984, J. Med. Genet. 21:257-262).
Fibroblasts obtained from a patient afflicted with osteopenia and ankylosing spondylitis synthesized Type I procollagen having decreased thermal stability, an observation which suggests that an altered procollagen protein was involved in the patient's symptoms (Constantinou et al., 1990, Cytogenet. Cell Genet. 51:979). In another case, a structural defect in the proα2(I) chain was found in a family afflicted with osteoporosis and idiopathic scoliosis (Shapiro et al., 1989, Connect. Tissue Res. 21:117-123). In a third case, a single base mutation that converted the codon encoding glycine-661 of proα2(I) to a codon encoding serine was reported in a woman afflicted with postmenopausal osteoporosis (Spotila et al., 1990, Am. J. Hum. Genet. 47:A237). In yet another case, a mutation that converted the codon encoding glycine-19 of proα1(I) to a codon encoding cysteine was reported in a patient afflicted with osteoporosis and joint hypermobility (Nicholls et al., 1984, J. Med. Genet. 21:257-262). Furthermore, an eleven-base-pair deletion was detected in the gene encoding proα2(I) in another patient afflicted with osteoporosis and joint hypermobility (Nicholls et al., 1984, J. Med. Genet. 21:257-262). Functional and structural abnormalities of Type I procollagen are also known to cause a number of clinically distinct inherited disorders that affect the strength of bone, ligaments, tendons, and other connective tissues (Prockop, 1990, Arth. Rheumat. 31:1-8). The significance of mutations affecting Type I procollagen structure or function likely remains unrecognized in numerous diseases and disorders affecting tissues that comprise Type I collagen.
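Two of the cases above describe a glycine codon converted into a serine or a cysteine codon. As an illustration based only on the standard genetic code (not on the sequences reported in the cited studies), the sketch below enumerates which glycine codons can undergo such a change through a single base substitution. Codons are written on the coding (sense) DNA strand, and the helper name `single_base_changes` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids listed for codons enumerated in TCAG order
# at the first, second, and third positions; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_changes(from_aa: str, to_aa: str) -> list[tuple[str, str]]:
    """Return (original_codon, mutant_codon) pairs in which one base substitution
    turns a codon for `from_aa` into a codon for `to_aa` (one-letter codes)."""
    changes = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        # Try every alternative base at each of the three codon positions.
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    changes.append((codon, mutant))
    return changes


print(single_base_changes("G", "S"))  # [('GGT', 'AGT'), ('GGC', 'AGC')]
print(single_base_changes("G", "C"))  # [('GGT', 'TGT'), ('GGC', 'TGC')]
```

In this sketch, only the glycine codons GGT and GGC can reach serine or cysteine in one step, in each case through a substitution at the first codon position (G to A for serine, G to T for cysteine).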
Type IX Collagen
Type IX collagen is a component of hyaline cartilage and of the vitreous body of the eye. The Type IX collagen molecule is a heterotrimer comprising three distinct gene products, α1(IX), α2(IX), and α3(IX), which are encoded by the COL9A1 gene, the COL9A2 gene, and the COL9A3 gene, respectively (van der Rest et al., 1987, In: Structure and Function of Collagen Types, Mayne et al., eds., Academic Press, Orlando, pp. 195-221; Shaw et al., 1991, Trends Biochem. Sci. 16:191-194). The COL9A1 gene is located on chromosome 6q12-q14, and the COL9A2 gene is located on chromosome 1p32. The COL9A3 gene is located on chromosome 20q13.3 (Brewton et al., 1995, Genomics 30:329-336).
Each α chain comprises three collagenous domains, designated COL1, COL2, and COL3, numbered in the direction from the carboxyl- to the amino-terminus of the chain. The three collagenous domains are flanked by four small non-collagenous domains, designated NC1, NC2, NC3, and NC4 (van der Rest et al., 1988, J. Biol. Chem. 263:1615-1618; Vasios et al., 1988, J. Biol. Chem. 263:2324-2329; Vaughan et al., 1988, J. Cell. Biol. 106:991-997; Ninomiya et al., 1990, In: Extracellular Matrix Genes, Sandell et al., eds., Academic Press, San Diego, pp. 79-114; Brewton et al., 1995, Genomics 30:329-336).
The 339-amino-acid COL2 domain and the 137-amino-acid COL3 domain are identical in length in each of the three α chains. The 115-amino-acid COL1 domains of α1(IX) and α2(IX) are nearly identical in length to the COL1 domain of α3(IX), which is 112 amino acids long. Because the collagenous regions of the three α chains are so similar in length, the chains are able to fold into a triple helix like the (proα1(I))₂proα2(I) triple helix of Type I collagen described herein, in which the Gly-X-Y tripeptide sequences of the three α chains are in register.
The non-collagenous domains vary in size among the three α chains of Type IX collagen. For example, the NC3 domain consists of twelve amino acids in the α1(IX) chain, seventeen in the α2(IX) chain, and fifteen in the α3(IX) chain. These differences in size among the non-collagenous domains are thought to impart flexibility to the Type IX collagen molecule.
Type IX collagen is attached to the surface of Type II collagen fibers by lysine-derived covalent cross-links between the COL2 domain of α3(IX) and the C-telopeptide of Type II collagen, and between the N-terminal end of the COL2 domains of all three α chains and the N-telopeptide of Type II collagen (Eyre et al., 1987, FEBS Lett. 220:337-341; van der Rest et al., 1988, J. Biol. Chem. 263:1615-1618; Wu et al., 1992, J. Biol. Chem. 267:23007-23014; Diab et al., 1996, Biochem. J. 314:327-332). Type IX collagen is thus a fibril-associated collagen having interrupted triple helices and, as such, belongs to the FACIT subgroup of collagens (Gordon et al., 1990, Curr. Op. Cell Biol. 2:833-838).
When a triple-helical domain of a Type IX collagen molecule is anchored to a Type II collagen fibril, the NC3 domain functions as a hinge, allowing the COL3 and NC4 domains to project away from the surface of the fibril. Thus, the COL3 and NC4 domains of Type IX collagen are capable of mediating interactions between Type II collagen fibrils in cartilage and non-collagenous proteins (van der Rest et al., 1988, J. Biol. Chem. 263:1615-1618; Vasios et al., 1988, J. Biol. Chem. 263:2324-2329; Vaughan et al., 1988, J. Cell. Biol. 106:991-997). The NC4 domain of the α1(IX) chain is unique in that it occurs in two variant forms: in cartilaginous tissue it has a longer sequence, and in ocular tissue it has a shorter sequence.
Type IX collagen is a proteoglycan. The NC3 domain of the α2(IX) chain comprises an attachment site for a glycosaminoglycan side chain (Bruckner et al., 1985, Proc. Natl. Acad. Sci. USA 82:2608-2612). Results from a recent study indicate that the NC1 domains of the three α chains of Type IX collagen encode all of the information necessary for glycosaminoglycan side chain selection and assembly (Mechling et al., 1996, J. Biol. Chem. 271:13781-13785).
Complete cDNA sequences of the chicken, human, and murine COL9A1 genes have been reported (Ninomiya et al., 1984, Proc. Natl. Acad. Sci. USA 81:3014-3018; Vasios et al., 1988, J. Biol. Chem. 263:2324-2329; Ninomiya et al., 1990, In: Extracellular Matrix Genes, Sandell et al., eds., Academic Press, San Diego, pp. 79-114; Muragaki et al., 1990, Eur. J. Biochem. 192:703-708; Rokos et al., 1994, Matrix Biol. 14:1-8). Portions of the genomic structure of the chicken, human, murine, and rat COL9A1 genes have been reported (Lozano et al., 1985, Proc. Natl. Acad. Sci. USA 82:4050-4054; Ninomiya et al., 1990, In: Extracellular Matrix Genes, Sandell et al., eds., Academic Press, San Diego, pp. 79-114; Muragaki et al., 1990, Proc. Natl. Acad. Sci. USA 87:2400-2404; Ting et al., 1993, J. Bone Min. Res. 8:1377-1387).
Complete cDNA sequences of the chicken, human, and murine COL9A2 genes have been reported (Ninomiya et al., 1985, Biochem. 24:4223-4229; Perala et al., 1993, FEBS Lett. 319:177-180; Perala et al., 1994, J. Biol. Chem. 269:5064-5071). The complete genomic structures of the chicken and murine COL9A2 genes have been reported (Ninomiya et al., 1990, In: Extracellular Matrix Genes, Sandell et al., eds., Academic Press, San Diego, pp. 79-114; Perala et al., 1994, J. Biol. Chem. 269:5064-5071).
Complete cDNA sequences of the chicken and human COL9A3 genes have been reported (Brewton et al., 1992, Eur. J. Biochem. 205:443-449; Har-El et al., 1992, J. Biol. Chem. 267:10070-10076; Brewton et al., 1995, Genomics 30:329-336). The genomic structure of the COL9A3 gene has not been reported in any species to date.
Transgenic mice expressing a cDNA construct comprising the coding region of the COL9A1 gene with a large in-frame deletion in the COL2-domain-encoding region develop abnormalities in cartilage collagen fiber structure and exhibit a phenotype similar to human osteoarthritis and mild chondrodysplasia (Nakata et al., 1993, Proc. Natl. Acad. Sci. USA 90:2870-2874). Degenerative joint disease was also exhibited by mice homozygous for an inactivated COL9A1 gene (Fassler et al., 1994, Proc. Natl. Acad. Sci. USA 91:5070-5074), by transgenic mice that overexpressed the isolated NC4 domain of the α1(IX) chain (Haimes et al., 1996, Inflam. Res. 44(Suppl. 2):S127-S128), and by transgenic mice that expressed a truncated COL9A2 gene having an in-frame deletion of a region encoding 38 amino acids in the COL2 domain of α2(IX) (Perala et al., 1994, J. Biol. Chem. 269:5064-5071). These findings indicate that Type IX collagen is not essential for cartilage development but is required for maintaining the integrity of cartilage structures.
Until the present invention, it has been possible to identify a mutation in a human COL1 or COL9 gene associated with a pathological condition only if the mutation was located within the coding sequence of one of the COL1A1, COL1A2, COL9A1, COL9A2, and COL9A3 genes, within one of introns 1-26 of the COL1A1 gene, within the 26 nucleotides located at the 5'-end of intron 27 of the COL1A1 gene, or within the approximately 350 nucleotides adjacent to exon 1 of the COL1A1 gene in the 5'-untranslated region thereof. Hence, a serious unmet need exists for methods and compositions that are useful for identifying mutations that are located in non-coding regions of the genes encoding the chains of Type I and Type IX collagen and that are associated with a pathological condition.