The present invention relates to a method for isolating and identifying the nucleotide sequence of the human gene for the type IV collagen α5(IV) chain. The invention is directed to the determination of the nucleotide sequence of the gene for the α5(IV) collagen chain in individuals by any method known in the art, e.g. cloning from genomic DNA libraries or amplifying gene regions with the polymerase chain reaction (PCR), and studying their physical properties or nucleotide sequences. In addition, the invention is directed to the use of the nucleotide sequences of the α5(IV) gene to amplify or identify the nucleotide sequences of the α5(IV) gene.
Basement membranes (BM) are specialized extracellular, sheet-like structures that separate the cells of organs from the underlying connective tissues. They form flexible boundaries that provide the physical support and biological signals required for maintenance of morphology and orderly development of distinct tissue patterns. The BM protein components can have different subunits and molecular compositions that provide the functional elements needed by the tissues concerned. This has become more apparent as new chains with restricted tissue distributions have been found, e.g. for type IV collagen and laminin. Basement membranes also have an important role in the correct regeneration of tissues following injury, such as during post-wound reformation of skin and nerves. Basement membranes also function as macromolecular filters, e.g. in the kidneys, where the glomerular basement membrane is the sole filtration barrier between the capillary lumen and the urinary space, hindering the leakage of macromolecules and blood cells from blood to urine.
Basement membranes are composed of several specific components that include type IV collagen, laminin, entactin (nidogen) and proteoglycans. Type IV collagen is the major structural component of basement membranes and it forms the framework of these extracellular structures. In addition, basement membranes contain SPARC (BM-40), fibronectin and type VII collagen, which are also present in other extracellular structures. The exact molecular compositions of basement membranes in different tissues are not well known, but there is growing evidence that even ubiquitous basement membrane components such as type IV collagen and laminin have different chain compositions in different tissues. Additionally, there are some proteins, such as pemphigoid antigen, that are present only in the basement membranes of skin.
Type IV collagen is the major structural component of basement membranes and it can provide up to 60 % of the structure. Like all collagens, the type IV collagen molecule is formed by three α chains coiled around each other to form the collagen triple helix in regions containing the repeated Gly-X-Y triplet amino acid sequence. The molecule has a triple-helical, 400 nm long collagenous part and a C-terminal globule, the noncollagenous (NC) domain, with a diameter of about 15 nm. The collagenous domain has several interruptions in the otherwise continuous Gly-X-Y repeat sequence. These interruptions give flexibility to the type IV collagen molecules, as opposed to the rigid rod-like molecules of fibrillar collagens with uninterrupted helices. The triple-helical type IV collagen molecules can form dimers by the attachment of two NC domains, and tetramers by cross-linking of four molecules through a 30 nm overlap at their amino-terminal ends (Timpl, Eur. J. Biochem. 180: 487-502, 1989).
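The notion of an interruption in the Gly-X-Y repeat can be illustrated with a minimal Python sketch. It assumes a one-letter amino acid string that starts in register with a glycine, and it uses a simple greedy scan; the function name and the toy sequences are hypothetical and are not taken from any collagen chain. Published interruption maps rest on sequence alignment, so this only shows the idea.

```python
def find_gly_xy_interruptions(seq):
    """Return (start, end) spans, 0-based and end-exclusive, of residues
    that break the Gly-X-Y repeat in a one-letter amino acid string.

    Greedy scan: whenever a glycine is found at the current position,
    the next three residues are accepted as one Gly-X-Y triplet.
    Any run of residues found where a glycine was expected is reported
    as one interruption.
    """
    spans = []
    i = 0
    while i < len(seq):
        if seq[i] == "G":
            i += 3  # accept one Gly-X-Y triplet
        else:
            start = i
            # extend the interruption until the next glycine
            while i < len(seq) and seq[i] != "G":
                i += 1
            spans.append((start, i))
    return spans


# Hypothetical toy sequences, not real collagen data.
uninterrupted = "GPPGAPGEK"
interrupted = "GPPGPPAVLGPP"  # "AVL" breaks the repeat
print(find_gly_xy_interruptions(uninterrupted))
print(find_gly_xy_interruptions(interrupted))
```

The greedy scan cannot tell a glycine inside an interruption from the start of a new triplet, which is one reason real annotations compare aligned chains rather than scanning a single sequence.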
The major form of the molecule consists of two α1(IV) chains and one α2(IV) chain. The applicants have determined the entire amino acid sequence of these two human chains by cloning and sequencing cDNA clones covering the coding region (Soininen et al., FEBS Lett., 225: 188-194, 1987; Hostikka and Tryggvason, J. Biol. Chem., 263: 19488-19493, 1988). The results showed that the α1(IV) chain is synthesized as a 1969 amino acid residue polypeptide, as compared to 1712 residues in the α2(IV) chain. The carboxyl-terminal NC domains of the two chains are very similar, with 63 % identical amino acid residues. The sequences are much less conserved in the triple-helical region, with only 49 % identity, and only 22 % of the X and Y residues in the collagenous Gly-X-Y repeat sequence are conserved. Two other distinct type IV collagen α chains, referred to as α3(IV) and α4(IV), have been described (Butkowski et al., J. Biol. Chem., 262: 7874-7877, 1987; Saus et al., J. Biol. Chem., 263: 13374-13380, 1988; Gunwar et al., J. Biol. Chem., 265: 5466-5469, 1990).
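Identity percentages of the kind quoted above are typically computed over aligned positions. The following Python sketch shows one common convention, assuming two pre-aligned sequences of equal length with "-" marking gaps and with gapped columns excluded from the count. The function name and the toy peptides are hypothetical; the cited papers may have used a different convention.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned, ungapped columns.

    Both inputs must already be aligned and of equal length.
    Columns where either sequence has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped column, not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy peptides, not real chain sequences.
print(round(percent_identity("GPPGEK", "GPAGDK"), 1))
```

Restricting the loop to the X and Y columns of a Gly-X-Y alignment would give a figure comparable in kind to the 22 % quoted for those positions.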
Of importance with respect to the present invention is our recent discovery of yet another novel human type IV collagen chain, the α5(IV) chain, by cDNA cloning (Hostikka et al., Proc. Natl. Acad. Sci. USA, 87: 1606-1610, 1990 and the parent U.S. patent application “Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, now U.S. Pat. No. 5,114,840). Amino acid sequence comparison with the α1(IV) and α2(IV) chains and the data available for the α3(IV) and α4(IV) chains demonstrated that the α5(IV) chain is a distinct gene product which is closely related to the α1(IV) chain. In the NC domain the identity between the deduced amino acid sequences is 83 % with the α1(IV) chain and 63 % with the α2(IV) chain, whereas in the collagenous domain the identities are 58 % and 46 %, respectively. Furthermore, all the interruptions in the collagenous Gly-X-Y repeat sequence of the α5(IV) chain coincide with those in the α1(IV) chain but only partially with those in the α2(IV) chain (Hostikka et al., Proc. Natl. Acad. Sci. USA, 87: 1606-1610, 1990 and the parent U.S. patent application “Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, now U.S. Pat. No. 5,144,840).
With α5(IV)-specific peptide antibodies, the chain was shown to be present almost exclusively in the glomerular basement membrane (GBM) of the kidney (Hostikka et al., Proc. Natl. Acad. Sci. USA, 87: 1606-1610, 1990; the parent U.S. patent application “Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, now U.S. Pat. No. 5,144,840; and the continuation-in-part application “Immunological methods for the detection of the human type IV collagen α5 chain”, filed Dec. 20, 1990, U.S. application Ser. No. 630,563), whereas the well characterized α1(IV) and α2(IV) chains are believed to be ubiquitous components present in all BMs.
Using cDNA probes and both somatic cell hybrids and in situ hybridization, the gene for the human type IV collagen α5 chain, COL4A5, was localized to the q22 region on the long arm of chromosome X (Hostikka et al., Proc. Natl. Acad. Sci. USA, 87: 1606-1610, 1990 and the parent U.S. patent application “A Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, U.S. Pat. No. 5,144,840). This location differs from that of the human genes COL4A1 and COL4A2, coding for the α1(IV) and α2(IV) chains, which are both located at the terminal end of the long arm of chromosome 13 (Boyd et al., Hum. Genet., 74: 121-125, 1986; Griffin et al., Proc. Natl. Acad. Sci. USA, 84: 512-516, 1987). The genes for the α1(IV) and α2(IV) chains are transcribed from opposite DNA strands, in opposite directions, from a common bidirectional promoter, so that the transcription initiation sites are separated by only 42-127 bp (Pöschl et al., EMBO J., 7: 2687-2695, 1988; Soininen et al., J. Biol. Chem., 263: 17217-17220, 1988).
The applicants have determined the complete structure of the human α1(IV) gene (Soininen et al., J. Biol. Chem., 264: 13565-13571, 1989) and the partial structure of the human α2(IV) gene (Hostikka and Tryggvason, FEBS Lett., 224: 297-305). The α1(IV) gene contains 52 exons spread over at least 100 kb of genomic DNA. The sizes of translated exons vary from 27 to 213 bp. The collagenous domain is encoded by 47 exons with sizes varying between 27 and 192 bp. About half of them begin with complete codons, whereas the other half of the gene has mainly split codons, usually beginning with the second base of a glycine codon; two exons also begin with the third base of a codon. Thus, the exon size pattern of this gene is very different from the highly conserved structure of the genes for the fibrillar collagens. The largest exon coding for a translated sequence is the junction exon, which codes for the carboxyl-terminal part of the collagenous domain and the amino-terminal part of the NC domain. Four more exons code for the NC domain, the last of them containing the 3′ untranslated region (Soininen et al., J. Biol. Chem., 264: 13565-13571, 1989).
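Whether an exon begins with a complete or a split codon follows from the number of coding bases that precede it. The Python sketch below illustrates that arithmetic under the assumption that the list holds only the translated bases of each exon, in 5′ to 3′ order, starting at the first base of the reading frame. The function name and the exon sizes are hypothetical and are not the sizes of the α1(IV) gene.

```python
def exon_start_codon_positions(coding_exon_sizes):
    """For each exon, return the codon position (1, 2 or 3) of its first base.

    Position 1 means the exon begins with a complete codon.
    Positions 2 and 3 mean the exon begins inside a split codon,
    at its second or third base respectively.
    """
    positions = []
    preceding_bases = 0
    for size in coding_exon_sizes:
        positions.append(preceding_bases % 3 + 1)
        preceding_bases += size
    return positions


# Hypothetical exon sizes in bp, chosen only to show all three cases.
sizes = [27, 46, 35, 54]
print(exon_start_codon_positions(sizes))
```

An exon whose size is a multiple of 3, such as the 54 bp unit typical of fibrillar collagen genes, leaves the position of the following exon unchanged, so a gene built only of such exons keeps every exon starting on a complete codon.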
The region characterized from the human α2(IV) gene shows a different pattern. The NC domain is encoded by three exons, as compared to five in the α1(IV) gene. The similarity of the two genes is demonstrated by the fact that, although the exon sizes differ, the locations of the introns are exactly the same when compared against the aligned amino acid sequences of the chains. On the other hand, the exons in the region of the α2(IV) gene coding for the collagenous domain are different: only one intron location seems to coincide, all the exon sizes differ, and the exons do not follow the 54 bp pattern of the fibrillar collagen genes. The exons in the part coding for the collagenous region begin with split glycine codons (Hostikka and Tryggvason, FEBS Lett., 224: 297-305).
Due to the wide distribution of basement membranes in the body, they are frequently affected in local and systemic diseases, and in many instances the consequent pathological changes lead to severe clinical complications. These diseases may be acquired, i.e. complications of a disease that does not primarily involve the basement membrane, or they can be genetically determined, inherited diseases that are due to gene mutations leading to abnormal structure and function of the basement membrane. The best known example of an acquired disease is diabetes mellitus, where the basement membrane structure is affected in almost all tissues in the body, resulting in dysfunction of small blood vessels (microangiopathy), kidneys (nephropathy) and nerves (neuropathy). The biochemical alterations leading to these malfunctions are still poorly understood. Autoimmune diseases, such as Goodpasture syndrome, also affect the basement membranes. In that syndrome, the antibody binds to the glomerular basement membrane and triggers its destruction by complement binding and phagocytosis.
Examples of inherited diseases are: (1) the congenital nephrotic syndrome, which is characterized by extensive leakage of blood proteins through the renal glomerular basement membrane into the urine (proteinuria); and (2) the Alport syndrome, where malfunction of the glomerular basement membrane leads to the passage of blood cells into urine (hematuria), eye lesions and hearing loss. The actual gene defect leading to the congenital nephrotic syndrome is still completely unknown. The Alport syndrome is primarily an X-linked inherited kidney disease that has been linked by chromosomal markers to chromosome X region q22-26 (Atkin et al., Am. J. Hum. Genet., 42: 249-255, 1988; Flinter et al., Genomics 4: 335-338, 1989). It leads to malfunction of the kidneys and it can be treated only by dialysis or renal transplantation. The disease has been shown to be associated with progressive ultrastructural abnormalities, such as patchy thickening and thinning of the glomerular basement membrane and splitting of the lamina densa. These results and more recent immunological studies have suggested that the cause of the Alport syndrome may be an abnormal or absent type IV collagen α chain (Spear, Clin. Nephrol. 1: 336-337, 1973; Kashtan et al., J. Clin. Invest., 78: 1035-1044, 1986).
Of importance with respect to the present invention is our recent discovery that a mutation in the type IV collagen α5(IV) gene that changes the structure of the produced polypeptide chain causes Alport syndrome. Eighteen Alport kindreds were studied, and in three of them an abnormal fragment pattern was shown by restriction fragment length polymorphism (RFLP) analysis of the α5(IV) gene (Barker et al., Science 248: 1224-1227, 1990; the U.S. patent application “Method for detection of Alport syndrome”, Ser. No. 07/534,786, filed on Jun. 7, 1990, now abandoned; and the parent U.S. patent application “Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, now U.S. Pat. No. 5,144,840). In kindred EP, there was a deletion of about 15 kb of the gene, containing exons 5 through 10 as counted from the 3′ end. In kindred P, there was a point mutation that changed a codon for a conserved cysteine residue to a codon for serine and created restriction sites for Psil and Bgll restriction endonucleases (Barker et al., Science 248: 1224-1227, 1990; the U.S. patent application “Method for detection of Alport syndrome”, Ser. No. 07/534,786, filed on Jun. 7, 1990, now abandoned; the parent U.S. patent application “A Method for determining the nucleotide sequence of a novel α5(IV) chain of human type IV collagen”, Ser. No. 377,238, filed on Jul. 7, 1989, now U.S. Pat. No. 5,144,840; and Zhou et al., 1991a, Genomics, in press). Later studies have shown that in about 10 % of Alport patients, a gene rearrangement can be observed with the α5(IV) cDNA clones in RFLP analysis.
The present invention provides a method for isolating and partially characterizing the nucleotide sequence of the gene coding for the human type IV collagen α5(IV) chain. The invention provides for the use of the identified nucleotide sequence (or DNA fragments thereof) to detect mutation(s) in individual genes specific for the α5(IV) chain which can, directly or indirectly, produce human diseases. The invention relates to the use of noncoding intervening sequences (introns) between and flanking the α5(IV) polypeptide chain coding regions (exons) of the α5(IV) gene to amplify and determine the physical properties and nucleotide sequence of the said gene for both protein coding regions and noncoding regions. The invention also relates to the use of gene fragments generated through amplification from human genomic or cloned DNA for the detection and analysis of the gene, such as in the detection of mutations. Additionally, the invention provides for the use of the identified recombinant DNA cloning vectors and of transformed hosts containing a vector into which a fragment of the α5(IV) gene has been inserted.
The invention further relates to the detection of variations in an individual's COL4A5 gene in comparison with the known normal COL4A5 gene. In one embodiment this is done by RFLP analysis, wherein an individual's genomic DNA is cut by a restriction enzyme, size-fractionated and tested for the sizes of DNA fragments of the α5(IV) gene. The invention relates to nucleotide sequences flanking the polypeptide chain coding regions (intron sequences) that can be used to amplify coding regions (exons), together with the flanking consensus sequences needed for the proper splicing of the pre-mRNA to mRNA, by cloning, by the polymerase chain reaction (PCR) or by any other method known in the art; and to determining the differences between those sequences and the normal COL4A5 gene by various techniques, such as single-strand conformation polymorphism analysis, denaturing gradient gel electrophoresis, S1 nuclease mapping, nucleotide sequencing or any other method known in the art.
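The logic of the RFLP comparison can be sketched in silico: a change in a recognition site alters the set of fragment sizes produced by digestion. The minimal Python example below assumes a linear DNA string and a single enzyme with a palindromic recognition site. EcoRI (GAATTC, cutting after the first G) is used purely as an illustration and is not an enzyme named in this specification; the function name and both toy sequences are hypothetical.

```python
def digest_fragment_sizes(dna, site="GAATTC", cut_offset=1):
    """Return fragment lengths, in 5' to 3' order, after cutting a linear
    DNA string at every occurrence of a recognition site.

    cut_offset is the number of bases from the start of the site to the
    cut position on the given strand (1 for EcoRI, G^AATTC).
    """
    cuts = []
    pos = dna.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = dna.find(site, pos + 1)
    bounds = [0] + cuts + [len(dna)]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical toy sequences: the variant carries one base change
# (GAATTC -> GAGTTC) that destroys the second recognition site.
normal = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
variant = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TT"
print(digest_fragment_sizes(normal))
print(digest_fragment_sizes(variant))
```

In a laboratory RFLP analysis only the fragments that hybridize to the probe are seen, so a fuller model would also filter fragments by overlap with the probed region.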
Finally, the invention relates to fragments of the normal human α5(IV) gene that may be used to correct a defective gene by gene therapy.