1. Field of the Invention
The present invention relates to nucleotide sequences and their use in methods for the detection, diagnosis and therapy of genetically inherited disorders. In particular the nucleotide sequences of the invention may be used for the detection of cystic fibrosis alleles. The invention also relates to nucleotide sequences coding for inherited disease associated genes such as cystic fibrosis and to RNA, such as mRNA, and polypeptides such as proteins, derived therefrom. Diagnostic kits are also provided for use in the diagnostic methods of the present invention.
2. Description of the Related Art
Available methods for the detection of cystic fibrosis are based on linkage studies. In general these comprise the use of labelled probes to detect restriction fragment length polymorphisms in sample genomic DNA. The distinguishing power of the genetic loci detected by such probes is limited by the observed degree of polymorphism at those loci. A given probe may therefore identify the same restriction fragment in many individuals, in which case it is not possible to distinguish between normal and cystic fibrosis alleles in those individuals. A need therefore exists for further and more informative methods of detection and diagnosis.
Cystic fibrosis (CF) is the most common lethal autosomal recessive disease in the Western world with a carrier frequency of approximately 1/20 and an incidence of 1/1600 live births. The disease is extremely rare in African and Asian populations, although cases have been reported in Japan. Affected patients exhibit elevated sodium chloride secretion in sweat and suffer from a variety of symptoms including bronchiectasis, respiratory failure and pancreatic insufficiency. The nature of the defect causing CF is unknown although it has been shown that sweat gland cells and respiratory epithelial cells from affected patients show a diminished permeability to chloride ions and a defective response to beta adrenergic agents (M. J. Stutts et al, 1985, PNAS, 82, 6677-6681). More recently, it has been demonstrated that the chloride channel can be activated in CF cells and that in CF patients it is the regulation of the chloride channel that is defective (R. A. Frizzell et al, 1986, Science, 233, 558-560; M. J. Welsh and C. M. Liedtke, 1986, Nature, 322, 467-470).
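By way of illustration only, the quoted incidence and carrier frequency can be related to one another by a simple Hardy-Weinberg calculation. The sketch below assumes a single disease allele at equilibrium in a randomly mating population, which is a simplification not stated in the references above; the function name is hypothetical.

```python
import math


def carrier_frequency_from_incidence(incidence):
    """Estimate allele and carrier frequency for a recessive disorder.

    Assumes Hardy-Weinberg equilibrium with a single disease allele
    (an illustrative simplification, not a claim from the cited work).
    """
    q = math.sqrt(incidence)        # disease allele frequency, since incidence = q^2
    p = 1.0 - q                     # normal allele frequency
    carrier_frequency = 2.0 * p * q # heterozygous (carrier) fraction
    return q, carrier_frequency


q, carriers = carrier_frequency_from_incidence(1 / 1600)
print(f"allele frequency q = {q:.4f}")
print(f"carrier frequency 2pq = {carriers:.4f} (about 1 in {1 / carriers:.1f})")
```

Under these assumptions an incidence of 1/1600 corresponds to an allele frequency of 1/40 and a carrier frequency slightly below 1/20, consistent with the approximate figures given above.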
The classical approach to the analysis of genetic disease has relied on a knowledge of the affected protein as in sickle cell anaemia and the thalassaemias. Where the nature of the defective protein is unknown, reverse genetics must be used as exemplified in the analysis of chronic granulomatous disease (B. Royer-Pokora et al, 1986, Nature, 322, 32-38, S. H. Orkin, 1986, Cell, 47, 845-850). In this approach, the chromosomal localisation of the mutant gene is determined by karyotyping or linkage studies. Subsequent cloning and examination of the DNA sequences in the region allows the isolation of candidate genes which can be tested for their involvement in the disease.
Little progress was made in the analysis of CF until 1985 when linkage between CF and the enzyme paraoxonase was reported (Eiberg et al, 1985, Clin. Genet., 28, 265-271). Shortly afterwards, linkage to the probe DOCRI 917 was reported at a distance of 15 centiMorgans (Tsui et al, 1985, Science, 230, 1054-1057) and the probe was shown to map to chromosome 7 by hybridisation to a panel of mouse/human hybrids (Knowlton et al, 1985, Nature, 318, 381-382). Three other RFLP markers for chromosome 7 were found to be linked to CF at a much closer genetic distance of approximately 1 centiMorgan. Two of these markers were derived from the met oncogene locus (Dean et al, 1985, Nature, 318, 385-388; White et al, 1985, Nature, 318, 382-384). The third marker, J3.11, was an anonymous chromosome 7 marker (Wainwright et al, 1985, Nature, 318, 384-385). The discovery of tightly linked markers opened the possibility of DNA based prenatal diagnosis and carrier testing for the disorder, in families with a history of the disease. For this purpose, it was necessary to determine the recombination frequency between met, J3.11 and CF. This was accomplished in a collaborative study of over 200 families. The study confirmed that both met and J3.11 were within 1 cM of the CF gene and gave strong support for the order met-CF-J3.11 (Beaudet et al, 1986, Amer. J. Hum. Genet., 39, 681-693; Lathrop et al, 1988, Amer. J. Hum. Genet., 42, 38-44). However, there is no direct relationship between genetic distance and physical distance. In fact, recombination frequency differs between males and females, and there are areas of the genome where the recombination frequency is very much higher than average (Barker et al, 1987, PNAS, 84, 8006-8010).
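For orientation, a genetic distance in centiMorgans may be converted to an expected recombination fraction using a mapping function. The sketch below uses Haldane's mapping function, which assumes no crossover interference; the references above do not state which mapping function, if any, was applied, so this choice and the function name are assumptions made purely for illustration.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Convert a genetic distance in centiMorgans to a recombination fraction.

    Uses Haldane's mapping function r = (1 - exp(-2d)) / 2 with d in Morgans.
    Assumes no crossover interference (an illustrative assumption).
    """
    d = distance_cm / 100.0  # centiMorgans to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))


for cm in (1, 15):
    r = haldane_recombination_fraction(cm)
    print(f"{cm} cM -> recombination fraction {r:.4f}")
```

On this model a marker at 1 cM recombines with the disease locus in roughly 1% of meioses, whereas a marker at 15 cM recombines in roughly 13%, which indicates why the more tightly linked markers are preferred for diagnosis. The conversion says nothing about physical distance in kilobases.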
Additional markers showing linkage to CF have been isolated such as the COL1A2 collagen gene (Scambler et al, 1985, Lancet ii, 1241-1242) and the anonymous probes 7C22 (Scambler et al, 1986, Nucleic Acids Research, 14, 1951-1956) and B79 (Estivill et al, 1986, Hum. Genet., 74, 320-322). While these probes were sometimes useful in prenatal diagnosis, they were too remote from the CF locus to be useful in localising the gene. Systematic screening of a chromosome 7 library resulted in the isolation of a further 63 RFLP markers in linkage to the CF locus (Barker et al, 1987, PNAS, 84, 8006-8010). Twelve of these markers were within 15 cM of the CF locus, but none of them mapped to the interval between met and J3.11. None of these probes have been made publicly available.
In an attempt to isolate markers closer to the CF gene, Collins et al (1987, Science, 235, 1046-1049) constructed a human chromosome jumping library which enabled them to jump from a Not 1 site in the met G gene to a Not 1 site located 100 kb 3' to the starting point, providing a probe CF63. Similar approaches have been described by Michiels et al, 1987, Science, 236, 1305-1308 and Poustka et al, 1987, Nature, 325, 353-355. In this context the J3.11 locus is regarded as lying in the 3' direction from the Met locus as illustrated in FIG. 6 of this application.
A second strategy has been to search for HTF islands. HTF islands (Hpa II Tiny Fragments) are regions of DNA that contain a large number of unmethylated CG dinucleotide pairs including many cleavage sites for rare cutting restriction enzymes. HTF islands are associated with the 5' end of many but not all mammalian gene sequences (Bird, 1986, Nature, 321, 209-213; Lindsay and Bird, 1987, Nature, 327, 336-338). Williamson et al have used chromosome mediated gene transfer to produce a cell line which only contains a section of human chromosome 7 adjacent to the met oncogene (Scambler et al, 1987, Nucleic Acids Research, 14, 7159-7174). A potential disadvantage to this approach is that the activated met oncogene is known to contain sequences from chromosome 1 (Park et al, 1987, Cold Spring Harbor Symposium Quantitative Biology, 51, 967-975). A cosmid library which is not publicly available was prepared from this cell line and a cosmid containing an HTF island was identified (Estivill et al, 1987, Nature, 326, 840-845). Three markers, XV2C, CS7 and KM19, were subcloned from the cosmids and were found by chance to be in strong linkage disequilibrium with CF. The observed linkage disequilibrium was sufficiently strong to allow partial prediction of carrier status from haplotype analysis. For example, 85% of CF chromosomes in Northern Europe possess the ++ haplotype with the KM19 Pst1 polymorphism (Estivill et al, 1987, Genomics, 1, 257-263). The observed haplotype frequencies are different in Southern European populations, suggesting that more than one mutation may be responsible for CF (Estivill et al, 1988, Am. J. Hum. Genet., 43, 23-28). Diagnoses were originally performed by Southern blot analysis but the amplification of the CS7 and KM19 loci by PCR has been described recently (Williams et al, 1988, Lancet ii, 102-103; Feldman et al, 1988, Lancet ii, 102). Only the sequences of the amplification primers were disclosed in these publications.
The complete sequence of CS7 has been disclosed in UK Patent Application GB 2 203 742 A and in Wainwright et al, 1987, EMBO J, 7, 1743-1748. Analysis of recombinant families indicates that the gene lies between KM19 and J3.11 (Farrall et al, 1988, Am. J. Hum. Genet., 43, 471-475). Further screening of the cosmid library has identified an additional marker, D9, which is in linkage disequilibrium with CF and has been claimed to be situated approximately 160 kb from KM19 towards J3.11 (Estivill et al, 1989, Am. J. Hum. Genet., 44, 704-710). No details of the sequence of D9 have been published and it is furthermore believed that the teaching and experimental detail contained in the above relevant references does not enable the skilled man to derive any further information concerning the D9 locus.
Rommens et al (1988, Am. J. Hum. Genet., 44, 645-663) have isolated a large number of RFLP markers from a chromosome 7 specific library. A total of 258 chromosome 7 specific single copy segments were identified of which 53 were localised to the 7q31-32 region. Two of these markers, D7S122 and D7S340, are in close linkage disequilibrium with CF and map between Met and J3.11. Subsequent analysis showed that D7S340 is located very close to the HTF island detected by CS7. No further details of D7S122 and D7S340 have been disclosed and they are not available to the general public.
Iannuzzi et al have described the use of a 100 kb general jumping library to isolate additional markers (Iannuzzi et al, 1989, Am. J. Hum. Genet., 44, 695-703). A jump of approximately 100 kb from J3.11 towards met has been described. The clone (W32) detects a Sac II polymorphism but is not in linkage disequilibrium with CF. Again this probe is not publicly available and no further useful characterisation has been published. Additional walks from W32 and D7S340 have since been described (Collins, April 1989, Cold Spring Harbor Meeting on Genome Mapping and Sequencing, Abstract 1349). Four jumps (J16, J17, J44, J18) cover a region of approximately 280 kb from D7S340 and four jumps (J32, J35, J46, J30) cover a distance of approximately 400 kb from J3.11. Yet again none of this series of markers has been made publicly available.
Conventional gel electrophoresis cannot resolve DNA fragments greater than 50 kb. Recent developments in Pulsed Field Gel Electrophoresis (PFGE) (Anand, 1986, Trends in Genetics, 2, 278-283; Southern et al, 1987, Nucleic Acids Research, 15, 5925-5943; Carle and Olson, 1984, Nucleic Acids Research, 12, 5647-5664) have permitted the analysis and resolution of DNA fragments of greater than 1 megabase. Combined with the availability of infrequently cutting restriction enzymes such as Not 1 and BssH II, this provides a potential method of relating the genetic map to physical distance. Several groups have prepared maps of the CF locus (Poustka et al, 1988, Genomics, 2, 337-345; Drumm et al, 1988, Genomics, 2, 346-354; Fulton et al, 1989, Nucleic Acids Research, 17, 271-284). The maps of the three groups agree only approximately, and there are inherent difficulties in constructing a map or locating a gene by this method. The methylation state of various cell lines or blood cells will result in different restriction patterns. The mobility of DNA fragments is dependent on sample loading and electrophoresis conditions, rendering comparisons between experiments difficult. Thus, the CF gene has been localised to the region between the markers CS7 and J3.11. Estimates of the distance between the two markers vary from 700 to 1350 kb (Poustka et al, 1988, Genomics, 2, 337-345), reflecting the inherent inconsistencies of the method.
It will be appreciated that long range mapping by PFGE is unlikely to give results which are reproducible even by the man skilled in the art when starting from published experiments. Thus although chromosome jumps to J16, J17, J44, J18, J32, J35, J46 and J30 have been documented as described above, it is not believed to be possible to localise the resultant markers with any precision. Given the inherent variability of jumping libraries and the inconsistencies of PFGE it would not be possible for the skilled man to reproduce the experiments of, for example, Iannuzzi et al with a view to independently isolating the series of markers described.
A limitation of the PFGE technique has been that the information obtained by PFGE could not be verified since large DNA fragments could not be cloned directly. Although techniques were available for cloning large tracts of DNA as many overlapping segments, the process was time-consuming and prone to error. The recent development of Yeast Artificial Chromosomes (YACs) has provided a means of cloning large (100-1000 kb) fragments of DNA in a stable form (Burke et al, 1987, Science, 236, 806-812; Anand et al, 1989, Nucleic Acids Research, 17, 3425-3433; Brownstein et al, 1989, Science, 244, 1348-1351). However, there remain several technical difficulties in the making and screening of YAC libraries which have prevented the general application of the technique (Iannuzzi et al, 1989, Am. J. Hum. Genet., 44, 695-703).