The present invention relates to the determination of di- and trinucleotide repeat mutations involved in an increasing number of genetically inherited diseases characterized by the expansion or amplification of a core di- or trinucleotide sequence.
More particularly, the present invention concerns a sensitive method, an automated instrument and kits for unequivocally quantifying the exact number of di- and trinucleotide repeats in preselected genetic loci.
The present method, instrument and kits are also useful for determining the number of head-to-tail repeats of a core sequence of two or more nucleotide bases, provided that the core sequence consists of no more than three types of nucleotide bases, for example adenine (A), guanine (G) and cytosine (C) in a core sequence composed of (AAGCGCA)n (n ≥ 1).
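As a string-level illustration of what "the number of head-to-tail repeats" means, the minimal Python sketch below counts the longest uninterrupted run of a core sequence and rejects cores built from all four base types, mirroring the condition stated above. The function name `count_tandem_repeats` and the example sequences are hypothetical; this only illustrates the quantity being measured, not the method of the invention.

```python
def count_tandem_repeats(sequence: str, core: str) -> int:
    """Return the longest run of consecutive head-to-tail copies of core."""
    if not core:
        raise ValueError("core sequence must not be empty")
    if len(set(core)) > 3:
        # Condition stated in the text: at most three of the four base types.
        raise ValueError("core sequence uses all four base types")
    k = len(core)
    best = 0
    # Try every start position and extend while the next k bases match the core.
    for start in range(len(sequence) - k + 1):
        count, pos = 0, start
        while sequence.startswith(core, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


print(count_tandem_repeats("TTCGGCGGCGGAA", "CGG"))  # 3
```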
In recent years, an increasing number of genetically inherited diseases have been found to be associated with mutations termed unstable trinucleotide repeat mutations, in which a core sequence of three nucleotide bases is expanded or amplified such that affected individuals, and in some cases carriers, contain more repeats than apparently healthy individuals at the particular DNA locus implicated in the disease.
More recently, a new cancer gene was discovered and shown to cause segments of DNA to be abnormally repeated in pairs and/or triplets in tumor cells of individuals carrying the gene. In this case, hundreds of thousands of short units of DNA are copied over and over again, presumably destroying the tumor cells' ability to control their growth. This phenomenon differs from the expansion of unstable trinucleotide repeats in hereditary diseases in that it occurs at many DNA loci within the same cell. Nevertheless, there is good reason to believe that both phenomena are propagated by similar mechanisms. See, Kolata G. Health/Science section, The National Herald Tribune, Thursday, May 13, 1993.
So far, seven genetically transmitted diseases, each involving a unique genetic locus, have been associated with trinucleotide repeat mutations. These are: fragile XA (A site, Martin-Bell) syndrome (FRAXA); spinal and bulbar muscular atrophy (SBMA, Kennedy disease); myotonic dystrophy (Curschmann-Steinert disease, DM); Huntington's disease (HD); spinocerebellar ataxia type 1 (SCA1); fragile XE (E site) mental retardation (FRAXE MR); and dentatorubral pallidoluysian atrophy (DRPLA). Since gene-bank searches have revealed di- and trinucleotide repeats within or close to a number of additional human genes, it is conceivable that di- and trinucleotide amplifications are involved in causing other genetic diseases as well.
Fragile XA syndrome is an X-linked recessive disorder with incomplete penetrance. It is characterized by moderate to severe mental retardation and other phenotypic features and, with an estimated incidence of 1 in 1250 males and a corresponding 1 in 2500 females (heterozygotes), it is one of the most common human genetic diseases and the most common form of familial mental retardation. Fragile X chromosomes display their characteristic phenotype when leukocytes carrying them are grown in culture under folate starvation. As mentioned, fragile X syndrome shows incomplete penetrance; hence (1) some males, referred to as normal transmitting males (NTMs), are clinically normal but are inferred to carry the genetic defect because their position in the pedigree makes them obligate carriers; and (2) one third of female carriers show evidence of mild mental impairment. Genetic linkage studies based on restriction fragment length polymorphism analysis of informative pedigrees, together with somatic cell hybrid studies of hamster chromosomes carrying translocated segments of human fragile X chromosomes in cells grown in culture under folate starvation, enabled the localization of the fragile X gene to chromosomal band q27.3 of the X chromosome (Xq27.3). Eventually the defective fragile X gene, designated fragile X mental retardation 1 (FMR1), was isolated by positional cloning, which led to the discovery of a highly polymorphic (CGG)n sequence within its 5' untranslated region. Screening of the general population and of fragile X patients revealed that healthy individuals carry low numbers of the (CGG)n trinucleotide repeat (n=6-52), carriers carry medium numbers (n=50-200), and affected individuals carry high numbers (n=230-1000).
When the (CGG)n trinucleotide repeat of the FMR1 gene exceeds approximately 230 copies, the DNA of the entire 5' region of the gene becomes abnormally methylated. This methylation extends upstream into and beyond the promoter region and results in transcriptional suppression of the FMR1 gene, leading to cessation of FMR1 protein production, which is probably the cause of the phenotype. See Verkerk A. J. M. H. et al. (1991) Cell, 65:905-914; Pieretti M. et al. (1991) Cell, 66:817-822; Caskey T. C. et al. (1992) Science, 256:784-788.
Spinal and bulbar muscular atrophy (SBMA), like fragile X syndrome, is a rare X-linked recessive genetic disorder. It is characterized by adult onset of progressive muscular weakness of the upper and lower extremities secondary to neuronal degeneration. Affected males have reduced fertility and excessive development of the male mammary glands (gynecomastia); female carriers have few or no symptoms. Genetic linkage analysis of informative pedigrees localized SBMA to chromosome Xq11-12, the region to which the gene encoding the androgen receptor (AR) had previously been mapped, making this gene a candidate for SBMA. Studies of the AR gene revealed a highly polymorphic (CAG)n trinucleotide repeat, situated in exon 1, that encodes a variable polyglutamine stretch in the AR protein. Further studies of the AR gene from normal and SBMA-affected individuals revealed that low numbers of the (CAG)n trinucleotide repeat (n=12-34) characterize apparently healthy individuals and high numbers (n=40-62) characterize SBMA-affected individuals, while no intermediate, carrier-range repeat number is yet known for this mutation. The effect of the expanded polyglutamine tract on the AR protein is not yet established; nevertheless, a gain of function leading to the SBMA phenotype is suspected. See, La Spada A. R. et al. (1991) Nature, 352:77-79; Caskey T. C. et al. (1992) Science, 256:784-788.
Myotonic dystrophy (DM) is an autosomal dominant disease characterized by myotonia, cardiac arrhythmias, cataracts, male balding, male infertility (hypogonadism) and other associated endocrinopathies. The rare congenital form of DM is associated with profound neonatal hypotonia and mental retardation. DM has a prevalence of 2.5-5.5 affected individuals per 100,000. DM was mapped by genetic linkage to chromosome 19q13.3, and the DM gene, designated myotonin protein kinase (MT-PK), was isolated by positional cloning and other molecular methods. Further studies revealed a polymorphic (GCT)n trinucleotide repeat situated in the 3' untranslated region of the MT-PK gene. Analyses of the MT-PK gene from normal and DM-affected individuals revealed that low numbers of the (GCT)n trinucleotide repeat (n=5-37) characterize apparently healthy individuals, high numbers (n=100 to >1000) characterize DM-affected individuals, and medium numbers (n≈50-100) characterize the carrier state. Further studies have revealed that expansion of the (GCT)n trinucleotide repeat leads to increased MT-PK mRNA stability and therefore to the production of more MT-PK protein, which is suspected to lead, directly or indirectly, to the DM phenotype. See, Fu Y. H. et al. (1992) Science, 255:1256-1258; Caskey T. C. et al. (1992) Science, 256:784-788.
Huntington's disease (HD) is a devastating late-onset autosomal dominant neurodegenerative disorder characterized by progressive neurodegeneration with personality disturbance, involuntary movements, cognitive loss and an inexorable progression to death 15-20 years after onset. HD occurs with a frequency of 1 in 10,000 individuals in most populations of Caucasian descent. The HD gene was localized to chromosome 4p16.3 by genetic linkage analysis with polymorphic DNA markers. Recently, after 10 years of extensive research, the defective gene causing HD, designated IT15, was isolated, and a polymorphic (CAG)n trinucleotide repeat encoding a polyglutamine stretch was discovered in exon 1 of the gene. It was further found that the (CAG)n trinucleotide repeat is expanded in HD chromosomes (n=42-100) as compared with normal chromosomes (n=11-36), presumably conferring on the IT15 protein a gain of function that is suspected to lead to the HD phenotype. See, The Huntington's Disease Collaborative Research Group (1993) Cell, 72:971-983; Zuhlke C. et al. (1993) Hum. Molec. Genet. 2:1467-1469.
Spinocerebellar ataxia type 1 (SCA1) is a progressive late-onset autosomal dominant disorder characterized by ataxia, ophthalmoparesis and a variable degree of motor weakness due to neurodegeneration of the cerebellum, spinal cord and brain stem, leading to complete disability and death 10-20 years after onset. The SCA1 gene was localized to chromosome 6p22-p23 on the basis of strong genetic linkage with the highly polymorphic HLA locus and other polymorphic DNA markers. The defective gene causing SCA1 was isolated in a yeast artificial chromosome contig and subcloned into cosmids, and a polymorphic (CAG)n trinucleotide repeat encoding a polyglutamine stretch was discovered in exon 1 of the SCA1 gene. It was further found that the (CAG)n trinucleotide repeat is expanded in SCA1 chromosomes (n=43-81) as compared with normal chromosomes (n=19-36), presumably conferring on the SCA1 protein a gain of function that is suspected to lead to the SCA1 phenotype. See, Orr H. T. et al. (1993) Nature Genetics, 4:221-226.
Fragile XE mental retardation (FRAXE MR), like FRAXA, is an X-linked recessive disorder with incomplete penetrance. It is characterized by moderate to severe mental retardation and other phenotypic features. Like FRAXA chromosomes, FRAXE chromosomes display their characteristic phenotype when leukocytes carrying them are grown in culture under folate starvation. Genetic linkage studies localized the FRAXE gene to chromosome Xq28. Eventually the FRAXE gene was isolated by positional cloning, leading to the discovery of a highly polymorphic (GCC)n trinucleotide repeat that segregates with the disease. Screening of the general population and of FRAXE patients revealed that healthy individuals carry low numbers of the (GCC)n trinucleotide repeat (n=6-25), carriers carry medium numbers (n=116-133), and affected individuals carry high numbers (n=200-850). When the (GCC)n trinucleotide repeat of the FRAXE gene exceeds approximately 200 copies, a CpG island located in the vicinity of the repeat becomes abnormally methylated, presumably leading to cessation of FRAXE protein production, which is probably the cause of the phenotype. See Knight S. J. L. et al. (1993) Cell, 74:127-134.
Dentatorubral pallidoluysian atrophy (DRPLA) is a late-onset autosomal dominant neurodegenerative disorder, prevalent in Japan, characterized by varying combinations of progressive myoclonus, epilepsy, ataxia, choreoathetosis and dementia. Neuropathological changes consist of combined degeneration of the dentatorubral and pallidoluysian systems of the central nervous system. The disease is further characterized by variable penetrance, even within a single family. Linkage analysis in DRPLA families made it possible to localize the DRPLA gene to chromosome 12p12-13. The DRPLA gene was isolated by screening for unstable (CAG)n trinucleotide repeats; the repeat was found in exon 1 of the gene, where it encodes a variable polyglutamine stretch in the DRPLA protein. It was further found that the (CAG)n trinucleotide repeat is expanded in DRPLA chromosomes (n=49-75) as compared with normal chromosomes (n=7-23), presumably conferring on the DRPLA protein a gain of function that is suspected to lead to the DRPLA phenotype. See, Nagafuchi S. et al. (1994) Nature Genetics, 6:14-18; Koide R. et al. (1994) Nature Genetics, 6:9-13.
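The repeat-number ranges quoted above for the seven loci can be collected into a lookup table. The minimal Python sketch below assumes the ranges exactly as reported in this section; they are published observations, not validated clinical cut-offs. Some of them overlap (FRAXA normal and carrier at 50-52; DM carrier and affected at 100, where the DM carrier range is itself only approximate), and some leave gaps (for example FRAXA 201-229). The names `REPORTED_RANGES` and `classify` are hypothetical. Returning every matching category, or none at all, keeps these ambiguities visible instead of silently picking one.

```python
from typing import Dict, List, Optional, Tuple

# (low, high) repeat numbers as quoted in this section; high=None means open-ended.
Range = Tuple[int, Optional[int]]

REPORTED_RANGES: Dict[str, Dict[str, Range]] = {
    "FRAXA": {"normal": (6, 52), "carrier": (50, 200), "affected": (230, 1000)},  # (CGG)n, FMR1
    "SBMA": {"normal": (12, 34), "affected": (40, 62)},                           # (CAG)n, AR
    "DM": {"normal": (5, 37), "carrier": (50, 100), "affected": (100, None)},     # (GCT)n, MT-PK
    "HD": {"normal": (11, 36), "affected": (42, 100)},                            # (CAG)n, IT15
    "SCA1": {"normal": (19, 36), "affected": (43, 81)},                           # (CAG)n
    "FRAXE": {"normal": (6, 25), "carrier": (116, 133), "affected": (200, 850)},  # (GCC)n
    "DRPLA": {"normal": (7, 23), "affected": (49, 75)},                           # (CAG)n
}


def classify(locus: str, n: int) -> List[str]:
    """Return every reported category whose range contains n; [] if none does."""
    matches = []
    for category, (low, high) in REPORTED_RANGES[locus].items():
        if n >= low and (high is None or n <= high):
            matches.append(category)
    return matches


print(classify("FRAXA", 51))   # ['normal', 'carrier']
print(classify("FRAXA", 215))  # []
print(classify("HD", 45))      # ['affected']
```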
Because of the high frequency, variable penetrance and instability of fragile XA syndrome and other genetically inherited disorders associated with trinucleotide repeat expansion, there is a widely recognized need for, and it would be highly advantageous to have, a low-cost method that can be performed by non-skilled personnel and that enables efficient and accurate determination of the number of repeats in various genes.
Common gene mutations (e.g., the cystic fibrosis ΔF508 mutation) are stable, that is, they are transmitted unchanged through the generations of a pedigree. Trinucleotide repeat mutations, by contrast, are characterized by instability: once the number of repeats exceeds a threshold value, these mutations tend to expand to a greater number of repeats (1) when transmitted vertically from parents to children; and (2) when transmitted somatically to daughter cells within an individual, a phenomenon termed somatic instability, which yields mosaicism.
The two types of instability characterizing trinucleotide repeat mutations will be exemplified herein for the fragile XA syndrome.
Fragile XA unstable alleles are observed in normal transmitting males (NTMs), their asymptomatic daughters and their symptomatic grandsons. When the number of trinucleotide repeats in such alleles was determined, it was found to increase across generations; in one example, from 82 in the NTM father to 83 in an asymptomatic daughter (90 in a second asymptomatic daughter) to >200 in the affected grandchildren. Alleles containing 82, 83 and 90 repeats are referred to as premutation alleles. It was the study of numerous families of this type that permitted the phenomenon of anticipation (earlier age of onset or greater severity in successive generations) to be correlated with the molecular events of (CGG)n expansion. NTMs carry numbers of CGG repeats outside the normal range and below those found in affected males. Such males transmit the repeats to their progeny with relatively small changes in repeat number. Females who carry similar premutation alleles, on the other hand, are prone to bear progeny (male or female) with large expansions of the repeat region. Thus, the large CGG amplification associated with fragile XA syndrome appears to be predominantly a female meiotic event. See, Caskey T. C. et al. (1992) Science, 256:784-788.
Many individuals affected with fragile XA syndrome were found to be mosaic with respect to the number of CGG trinucleotide repeats in different cells of their body, a phenomenon indicating somatic instability of expanded repeats.
Instability, characterized by expansion of trinucleotide repeats, is also observed in DM, HD, FRAXE, DRPLA and SCA1 pedigrees. Unlike FRAXA high-risk alleles, DM and FRAXE high-risk alleles can expand to a similar extent through both male and female meiosis, and, to the best of our knowledge, somatic mosaicism has not yet been observed in DM and FRAXE patients. High-risk alleles have not yet been found for HD and DRPLA; that is, alleles at these loci either carry the disease mutation or do not. Nevertheless, HD repeats are unstable in more than 80% of meiotic transmissions, in which the number of repeats may either increase or decrease, with the largest increases occurring in paternal transmission (Duyao M. et al. (1993) Nature Genetics, 4:387-392), whereas DRPLA alleles tend to increase in size over generations. See, Nagafuchi S. et al. (1994) Nature Genetics, 6:14-18; Koide R. et al. (1994) Nature Genetics, 6:9-13.
Attempts to correlate the size of trinucleotide repeat mutations and the severity of the associated genetic diseases were made for fragile XA syndrome, myotonic dystrophy, dentatorubral pallidoluysian atrophy and spinocerebellar ataxia type 1.
For fragile XA, as expected, the median IQ score was significantly lower for females carrying a fully expanded mutation (above 230 repeats) than for females carrying a premutation (50-200 repeats) on one of their X chromosomes. On the other hand, no significant relationship was found between IQ score and the number of CGG repeats; see Taylor A. K. et al. (1994) JAMA, 271:507-514. Nevertheless, it was found that prenatal DNA studies of the number of CTG trinucleotide repeats in myotonic dystrophy alleles can improve the estimation of clinical severity, and that the number of CAG trinucleotide repeats in spinocerebellar ataxia type 1 and dentatorubral pallidoluysian atrophy correlates with more rapid progression of the disease (Nagafuchi S. et al. (1994) Nature Genetics, 6:14-18; Koide R. et al. (1994) Nature Genetics, 6:9-13; Orr H. T. et al. (1993) Nature Genetics, 4:221-226).
Attempts to correlate the size of trinucleotide repeat mutations with the age of onset of Huntington's disease revealed an inverse correlation confined to the upper range of trinucleotide repeat numbers (ca. 60-100 repeats); see Andrew S. E. et al. (1993) Nature Genetics, 4:398-403. Furthermore, for spinocerebellar ataxia type 1 and dentatorubral pallidoluysian atrophy (Nagafuchi S. et al. (1994) Nature Genetics, 6:14-18; Koide R. et al. (1994) Nature Genetics, 6:9-13), larger (CAG)n trinucleotide repeat expansions were found to correlate with an earlier age of onset.
Collectively, these data call for the development of a reliable, accurate and easy-to-operate method for quantifying di- and trinucleotide repeats, aimed at postnatal and prenatal diagnosis and prognosis.
Three basic methods are currently used to determine the number of di- and trinucleotide repeats at a particular locus: (1) "Southern" blot analysis; (2) in vitro amplification by the Polymerase Chain Reaction (PCR) followed by PCR fragment size determination; and (3) DNA sequencing (usually of PCR-amplified fragments).
"Southern" blot analysis for the quantification of di- and trinucleotide repeats is a method based upon: (1) enzymatic cleavage of genomic DNA obtained from the examined individual with sequence-specific restriction enzymes, which cleave the DNA at many sites, including the flanking regions both 5' and 3' to the DNA region containing the examined di- or trinucleotide repeats; (2) gel electrophoresis to separate by size the DNA fragments obtained in step (1); (3) blotting, i.e., transferring the cleaved and size-separated DNA fragments to a test surface; (4) preparing a labeled probe capable of specific hybridization with the blotted DNA fragment containing the repeats; (5) hybridizing the labeled probe with the blotted DNA fragments; (6) washing off excess probe to retain specific hybridization and reduce non-specific and background signals; (7) detecting positive signals by means dependent upon the probe labeling technique employed in step (4); (8) interpreting the results by determining the size of the fragment hybridized to the labeled probe; and finally (9) calculating the number of di- or trinucleotide repeats.
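Steps (1) and (8) can be illustrated in silico: given a reference sequence, locate the restriction fragment that contains the repeat region and report its boundaries. The Python sketch below uses hypothetical helpers (`cut_positions`, `repeat_fragment`) and assumes a palindromic recognition site, so that scanning one strand finds every site, and a cut at a fixed offset within the site. The EcoRI site G^AATTC (cut after the first base) serves only as a familiar example, and the test sequence is synthetic. The helper returns `None` when no site flanks the repeat on one side or when a site falls inside it, reflecting the method's dependence on suitable flanking restriction sites.

```python
from typing import List, Optional, Tuple


def cut_positions(sequence: str, site: str, cut_offset: int) -> List[int]:
    """Return 0-based cut positions (between bases) for every occurrence of site.

    Assumes a palindromic recognition site, so scanning one strand suffices.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    return cuts


def repeat_fragment(sequence: str, site: str, cut_offset: int,
                    repeat_start: int, repeat_end: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the fragment holding sequence[repeat_start:repeat_end].

    Returns None if the enzyme cuts inside the repeat region, or if there is no
    cut on one side of it within the analysed sequence.
    """
    cuts = cut_positions(sequence, site, cut_offset)
    if any(repeat_start < c < repeat_end for c in cuts):
        return None
    left = [c for c in cuts if c <= repeat_start]
    right = [c for c in cuts if c >= repeat_end]
    if not left or not right:
        return None
    return max(left), min(right)


# Synthetic example: EcoRI-like sites (G^AATTC) flanking 20 CGG repeats.
seq = "AAAA" + "GAATTC" + "T" * 10 + "CGG" * 20 + "T" * 10 + "GAATTC" + "AAAA"
start, end = repeat_fragment(seq, "GAATTC", 1, 20, 80)
print(start, end, end - start)  # 5 91 86
```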
"Southern" blot analysis for the quantification of di- and trinucleotide repeats has major drawbacks: (1) the method depends primarily on the existence of suitable restriction enzyme recognition sites in the immediate 5' and 3' flanking regions of the repeat region; (2) gel electrophoresis as employed in "Southern" blot analysis has low resolving power for small size variations, so this method is not suitable for monitoring small variations in the number of di- or trinucleotide repeats; (3) "Southern" blot analysis cannot distinguish between size variations due to di- or trinucleotide repeat expansion/de-expansion and those due to other molecular events, such as the loss or formation of restriction enzyme recognition sites through point mutations, deletions or insertions, which yield a length polymorphism independent of repeat expansion/de-expansion; (4) "Southern" blot analysis demands accurate execution of a multistep procedure, most of whose steps, especially gel electrophoresis, blotting and hybridization, are themselves complex, time consuming and require highly skilled personnel for routine execution; (5) highly skilled personnel are also needed to interpret the results and calculate the number of di- or trinucleotide repeats; (6) gel electrophoresis, hybridization and washing conditions may vary considerably depending upon fragment size and sequence, so "Southern" blot analysis requires separate calibration of the procedure for each disease; and last but not least (7) being a multistep procedure, "Southern" blot analysis is not easily amenable to full automation.
In vitro amplification by the Polymerase Chain Reaction (PCR) followed by PCR fragment size determination is easier to perform than "Southern" blot analysis and involves fewer steps: (1) PCR amplification of the di- or trinucleotide repeat region using PCR primers from the 5' and 3' flanking regions of the repeats; (2) size determination of the PCR fragments thus obtained by high resolution gel electrophoresis; and (3) calculation of the number of di- or trinucleotide repeats. See, Erster S. H. (1992) Hum. Genet., 90:55-61.
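Step (3) is simple arithmetic once the length of non-repeat sequence in the amplicon (both primers plus the sequence between each primer and the repeat) is known from the reference sequence: n = (product length - flank length) / repeat-unit length. A minimal Python sketch follows; `repeats_from_product` and the 310 bp / 220 bp figures are hypothetical. A non-integer result is reported as an error because it can signal a sizing error or a flank insertion or deletion. An indel whose length is a multiple of the unit length, however, passes unnoticed, which is why size-based counting cannot separate such indels from true changes in repeat number.

```python
def repeats_from_product(product_bp: int, flank_bp: int, unit_bp: int = 3) -> int:
    """Infer repeat number from PCR product size.

    flank_bp: total non-repeat length of the amplicon (both primers and the
    sequence between each primer and the repeat), taken from the reference.
    """
    repeat_bp = product_bp - flank_bp
    if repeat_bp < 0:
        raise ValueError("product is shorter than its non-repeat flanks")
    n, remainder = divmod(repeat_bp, unit_bp)
    if remainder:
        # Not a whole number of units: a sizing error or a flank indel.
        raise ValueError(f"{repeat_bp} bp is not a whole number of {unit_bp}-bp units")
    return n


print(repeats_from_product(310, 220))  # 30
```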
Although this approach is simpler and therefore easier to execute routinely, it shares some of the drawbacks described for "Southern" blot analysis: (1) the PCR approach cannot distinguish between size variations due to di- or trinucleotide repeat expansion/de-expansion and those due to other molecular events, such as the loss or gain of sequences through deletions or insertions in the 5' and/or 3' flanking regions of the repeats, which yield a PCR fragment length polymorphism independent of repeat expansion/de-expansion; (2) high resolution gel electrophoresis is required to resolve small size variations in the PCR fragments, which calls for highly skilled personnel and is therefore not suitable as a routine diagnostic procedure; (3) highly skilled personnel are also needed to interpret the results and calculate the number of di- or trinucleotide repeats. In addition, (4) the PCR approach is not suitable for quantifying highly expanded di- or trinucleotide repeats, since its amplifying capacity is limited to relatively small fragments; when the fragment to be amplified exceeds a certain size limit, the PCR reaction fails to yield a specific product; and (5) some di- and trinucleotide repeats form highly GC-rich stretches of DNA that are not easily amplified by standard PCR protocols.
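Drawback (1) can be made concrete with two synthetic amplicons: one allele with 31 CGG repeats and reference-length flanks, and another with 30 repeats plus a hypothetical 3-bp insertion in the 5' flank. Both products have the same length, so a size-based count reports 31 for each, while reading the sequence recovers 30 for the second allele. The homopolymer flanks and all function names are hypothetical simplifications chosen so the example stays short.

```python
import re

UNIT = "CGG"
# Hypothetical homopolymer flanks; real primer-to-repeat flanks differ.
REF_FLANK5 = "A" * 40
REF_FLANK3 = "T" * 40


def amplicon(n_repeats: int, flank5: str = REF_FLANK5, flank3: str = REF_FLANK3) -> str:
    """Build a synthetic PCR product: 5' flank + n copies of the unit + 3' flank."""
    return flank5 + UNIT * n_repeats + flank3


def size_based_count(product: str) -> int:
    """Infer repeats from length alone, assuming reference-length flanks."""
    return (len(product) - len(REF_FLANK5) - len(REF_FLANK3)) // len(UNIT)


def sequence_based_count(product: str) -> int:
    """Count the longest uninterrupted run of the unit in the product sequence."""
    runs = re.findall(f"(?:{UNIT})+", product)
    return max((len(run) for run in runs), default=0) // len(UNIT)


allele_a = amplicon(31)
allele_b = amplicon(30, flank5="A" * 20 + "TTT" + "A" * 20)  # 3-bp flank insertion

print(len(allele_a), len(allele_b))                                    # 173 173
print(size_based_count(allele_a), size_based_count(allele_b))          # 31 31
print(sequence_based_count(allele_a), sequence_based_count(allele_b))  # 31 30
```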
The most basic method for determining the number of di- or trinucleotide repeats is DNA sequencing. The most widely used sequencing method is based on the dideoxynucleotide chain termination procedure, in which a DNA polymerase incorporates dideoxynucleotides at the 3' end of an elongating DNA chain; once a dideoxynucleotide has been incorporated, further elongation of the chain is blocked. See, Sanger F. (1981), Science 214, 1205-1210. Recently, automated DNA sequencing techniques have been developed that provide faster and safer DNA sequencing. One such approach uses a set of four chain-terminating, fluorescently labeled dideoxynucleotides (see Chehab, F. F., et al. (1989), Proc. Natl. Acad. Sci. (USA) 86, 9178-9182; Prober, J. M., et al. (1987), Science 238, 336-341; Smith, L. M., et al. (1986), Nature 321, 674-678). In this method, succinyl fluorescein dyes are used, and each dideoxynucleotide carries a different dye with distinct absorption and emission characteristics, so that DNA molecules terminated with each of the different dideoxynucleotides can be distinguished from one another. Using these dideoxynucleotides, a DNA segment can be sequenced in a single reaction mixture containing all four differently labeled dideoxynucleotides; the resulting labeled oligonucleotide fragments are then resolved by polyacrylamide gel electrophoresis in a single lane of the gel. The gel is scanned by a fluorimeter capable of distinguishing the different fluorescent labels, and the order of the different labels along the lane is translated into the sequence of the tested DNA segment.
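The final translation step, from the order of labels along one lane to a sequence, can be sketched as follows. Each detected peak is modeled as a (fragment length, dye) pair; because a fragment of length L ends in the dideoxynucleotide incorporated at extension position L, sorting peaks by length reads the newly synthesized strand 5' to 3', which is the complement of the template. The dye names in `DYE_TO_BASE` and the function `call_sequence` are hypothetical. The sketch also flags a missing or duplicated peak, since a single gap shifts every downstream base and would miscount a repeat run.

```python
from typing import Dict, List, Tuple

# Hypothetical dye-to-terminator assignment; real dye sets and their names differ.
DYE_TO_BASE: Dict[str, str] = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}


def call_sequence(peaks: List[Tuple[int, str]]) -> str:
    """Translate (fragment length, dye) peaks from one lane into a base sequence.

    A chain-terminated fragment of length L ends with the dideoxynucleotide
    incorporated at extension position L, so ordering peaks by length reads
    the newly synthesized strand 5' to 3'.
    """
    if not peaks:
        return ""
    ordered = sorted(peaks)
    lengths = [length for length, _ in ordered]
    expected = list(range(lengths[0], lengths[0] + len(lengths)))
    if lengths != expected:
        # A gap or a duplicate means at least one position cannot be called.
        raise ValueError("missing or duplicated peak; sequence is ambiguous")
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Usage with a synthetic peak list (order of detection does not matter):
read = "TTCGGCGGCGGAA"
peaks = [(pos, "dye_" + base) for pos, base in enumerate(read, start=1)]
print(call_sequence(list(reversed(peaks))))  # TTCGGCGGCGGAA
```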
DNA sequencing as a method for quantifying di- or trinucleotide repeat numbers has several major drawbacks: (1) high voltage, high resolution gel electrophoresis is required to resolve the single-stranded nested DNA fragments obtained in the sequencing reaction, which differ from each other by only one nucleotide base; this calls for highly skilled personnel and is therefore not suitable as a routine diagnostic procedure; (2) some di- and trinucleotide repeats form highly GC-rich stretches of DNA that are not easily sequenced by standard sequencing protocols; and (3) the sequencing approach is not suitable for quantifying large di- or trinucleotide repeats, since it is limited by the resolving power of the sequencing gel.
It is an object of the present invention to provide a simple, reliable, rapid, highly accurate and easy-to-operate method for quantifying di- and trinucleotide repeats, aimed at postnatal and prenatal diagnosis and prognosis, which does not require electrophoresis or similar size-based separation as part of its methodology.
It is another object of the present invention to provide a diagnostic kit and an automated instrument to be used for carrying out the above method of the invention.