The present invention relates generally to the fields of biochemistry and medicine. In particular, the invention is directed to materials and methods useful in the diagnosis of genetic mutations of clinical relevance.
Short tandem repeats (STR) have been identified in a number of genes. It has been proposed that particular unstable triplet repeat oligonucleotides are correlated with a number of genetic diseases in humans, including Kennedy's disease [La Spada, A. et al., Nature 352, 77-79 (1991)], fragile-X syndrome [Verkerk, A. J. M. H. et al., Cell 65, 905-914 (1991)], myotonic dystrophy [Fu, Y. H. et al., Science 255, 1256-1258 (1992)], Huntington's disease [The Huntington's Disease Collaborative Research Group, Cell 72, 971-983 (1993)] and spinocerebellar ataxia type 1 [Orr, H. T. et al., Nature Genet. 4, 221-226 (1993)]. Similarly, doublet repeats have also been reported to be associated with particular disease states; for example, correlations have been proposed with cystic fibrosis [Chu, C.-S. et al., Nature Genetics 3, 151-156 (1993)] and colorectal cancer [Thibodeau, S. N. et al., Science 260, 816-819 (1993)]. Higher-order repeats, such as tetramers [see, e.g., Gen, M. W. et al., Genomics 17, 770-772 (1993)], have also been identified.
One gene which has been the subject of intense scrutiny is the Huntington's disease gene. The trinucleotide hybridization approach was recently utilized to map tandem repeats across a section of the gene. In this section, 51 triplet repeats spanning a 1.86 Mbp DNA segment were identified by Southern transfer of restriction enzyme digests of a specific cosmid and probing with ³²P-labelled oligonucleotide probes [Hummerich, et al., "Distribution of trinucleotide repeat sequences across a 2 Mbp region containing the Huntington's disease gene," Human Molecular Genetics 3, 73 (1994)].
DNA polymorphisms which arise from allelic differences in the number of repeats have been identified by such terminology as short tandem repeats (STR), variable number of tandem repeats (VNTR), minisatellites (tandem repeats of a short sequence, originally defined as 9-60 bp) and microsatellites (originally defined as 1-5 bp) [McBride, L. J. & O'Neill, M. D., American Laboratory, pp. 52-54 (November 1991)]; minisatellites and microsatellites would be considered subclasses of the VNTR. It is estimated that there are up to 500,000 microsatellite repeats distributed throughout the human genome, at an average spacing of 7000 bp. Therefore, it is apparent that most genes will contain VNTR regions and that these regions can be used as genetic markers. For example, VNTRs are currently being used as markers in studies concerned with the inheritance of certain mutations leading to various forms of cancer. Recently, it has been discovered that certain triplet repeat expansions are associated with a predisposition towards certain diseases; a large expansion is typically associated with the onset of the disease. For example, the (CGG) triplet repeat region associated with Fragile X contains 10-50 repeat units in the normal population, while in those afflicted with the disease the region is expanded to between 200 and 2000 repeats.
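By way of illustration only, the repeat-count ranges quoted above for the (CGG) region can be expressed as a short computational sketch. The function names `longest_tandem_run` and `classify_cgg_count` are hypothetical and do not appear in the cited references; the sketch operates on a sequence that is already known and is not the hybridization-based method with which this disclosure is concerned. Counts falling between the two quoted ranges are reported as unclassified, since the text above assigns them no meaning.

```python
import re


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of consecutive copies of `unit` in `sequence`.

    A lookahead is used so that runs starting at every position are examined,
    including runs that overlap one another in different reading frames.
    """
    pattern = re.compile(f"(?=((?:{re.escape(unit.upper())})+))")
    best = 0
    for match in pattern.finditer(sequence.upper()):
        best = max(best, len(match.group(1)) // len(unit))
    return best


def classify_cgg_count(count: int) -> str:
    """Compare a (CGG) repeat count with the ranges quoted in the text."""
    if 10 <= count <= 50:
        return "normal range (10-50 repeats)"
    if 200 <= count <= 2000:
        return "expanded range (200-2000 repeats)"
    return "outside the ranges quoted in the text"


if __name__ == "__main__":
    # Synthetic example sequences, not real patient data.
    normal_like = "AT" + "CGG" * 30 + "TA"
    expanded_like = "AT" + "CGG" * 250 + "TA"
    for seq in (normal_like, expanded_like):
        n = longest_tandem_run(seq, "CGG")
        print(n, classify_cgg_count(n))
```

The lookahead form is slower than a single left-to-right scan, but it avoids undercounting when a repeat unit can overlap itself (for example, a unit such as ATA).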
As it becomes possible to determine whether a particular genotype comprises an unstable repeat and/or is associated with a particular disease state, there is a considerable incentive to develop useful methods to characterize STRs. The heretofore available methods for initial scanning for STRs have generally required time-consuming sequential oligonucleotide hybridizations to filter-bound target DNAs to identify specific STRs [see, e.g., Litt, M. and Luty, J. A., Am. J. Hum. Genet. 44, 397-401 (1989); Weber, J. L. and May, P. E., Am. J. Hum. Genet. 44, 388-396 (1989); Fu et al., supra]. In particular, the analysis of oligonucleotide repeats is typically carried out at the present time by Southern blotting of restriction fragments followed by hybridization analysis using a specified repetitive sequence probe. Alternatively, it is possible to probe dot blots of the target DNA [Iizuka, et al., GATA 10:2-5 (1993)].
Both of these heretofore-known techniques are time-consuming and tedious for large sample populations. Moreover, multiple probings may be required to identify which repeat might be present. Further, it is often difficult to reproducibly spot or transfer equivalent amounts of DNA to these supports; thus, conventional dot blots and transfers show variation in signal intensity from batch to batch. In addition, any regions of DNA that might become cross-linked to the support (e.g., through UV light) would be inaccessible to probes.
It would be highly useful for clinical investigators to be able to screen the DNA of large patient populations in an effective manner. As additional STRs are identified and associated with particular conditions, the need for simple and effective screening methods becomes greater.
PCT published application No. WO 89/10977 describes methods and apparatus for analyzing polynucleotide sequences in which an array of the whole or a chosen part of a complete set of oligonucleotides is bound to a solid support. The different oligonucleotides occupy separate cells of the array and are capable of taking part in hybridization reactions. For studying differences between polynucleotide sequences, the array may comprise the whole or a chosen part of a complete set of oligonucleotides comprising the polynucleotide sequences. While it is suggested that a small array may be useful for many applications, such as the analysis of a gene for mutations, there is no teaching or suggestion of a specific array, or of a method for using same, which would permit the rapid and accurate screening of a wide range of biological materials for tandem repeats. Moreover, the arrays described in WO 89/10977 are designed specifically for use in sequencing by hybridization; the presence of long tandem nucleotide repeats can present a significant problem in attempts to sequence a sample using the methods described in WO 89/10977.
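One reason long tandem repeats are troublesome for sequencing by hybridization can be sketched as follows. The sketch is an idealized model and not the apparatus of WO 89/10977: it assumes perfect-match hybridization, considers a single strand only, and identifies each array cell by the target subsequence it would detect. The names `complete_oligo_set` and `hybridizing_cells` are hypothetical. Under these assumptions, two repeat tracts of different length light up exactly the same cells, so the pattern of positive cells alone cannot distinguish their repeat counts.

```python
from itertools import product


def complete_oligo_set(length: int) -> list[str]:
    """Enumerate every oligonucleotide of the given length (4**length cells)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def hybridizing_cells(target: str, length: int) -> set[str]:
    """Return the cells expected to give a signal for `target`.

    Idealized assumption: a cell is positive exactly when its sequence
    occurs as a contiguous subsequence of the target.
    """
    return {target[i:i + length] for i in range(len(target) - length + 1)}


if __name__ == "__main__":
    oligo_length = 4  # kept small for illustration
    array = complete_oligo_set(oligo_length)
    print(len(array))  # 4**4 cells

    short_tract = "CGG" * 20
    long_tract = "CGG" * 40
    cells_short = hybridizing_cells(short_tract, oligo_length)
    cells_long = hybridizing_cells(long_tract, oligo_length)
    print(sorted(cells_short))
    print(cells_short == cells_long)  # same positive cells for both tracts
```

In this model the number of repeat units is lost as soon as the tract is longer than the oligonucleotides on the array, which is consistent with the difficulty noted above.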
It is an object of the present invention to provide methods and apparatus for rapid and accurate identification of nucleotide tandem repeats in DNA and RNA sequences from a wide variety of sources.