The detection of mutations in genomic DNA plays a critical role in efforts to elucidate the genetic basis of human disease. For many types of genetic screening and analysis, knowledge of the presence of a mutated copy of a gene is essential. Such information may be used in prenatal and other genetic testing, as well as analysis of tumor cells and other somatic mutations. For many genes, there are a number of different mutations that can affect function.
Common diseases such as diabetes, heart disease and psychiatric disorders are caused in part by genetic variations in multiple genes. Genetic variations are not only involved in the genesis of diseases but they are also chief determinants of disease progression and response to treatment. Identification of the genetic variations involved in common diseases can greatly improve the diagnosis, prognosis, and treatment of such diseases.
One approach for identifying the potentially causative variations involved in common diseases is to screen patients and controls for genetic variations in a large number of candidate genes. Genetic coding sequences constitute less than 5% of the entire human genome, yet the vast majority of human diseases are caused by sequence variation in these coding sequences. Reagents for large scale screening of genes are already available, as a significant proportion of human gene sequences exists in the rapidly expanding public databases. Many DNA variation screening methods have been developed, e.g. single stranded conformational polymorphism (SSCP) and high performance liquid chromatography (HPLC). Since these methods are not designed to screen many genes simultaneously, their usefulness has been limited to testing a handful of candidate genes.
In the absence of high throughput technology capable of large scale screening of genes for the identification of variations involved in diseases, less straightforward approaches such as association and linkage mapping have been proposed. In these approaches, neutral genetic variations (polymorphic markers) are cataloged into a genetic map. These polymorphic markers are used in a genetic linkage or association analysis to approximate the chromosomal location of the disease genes.
Association studies are based on the probability that certain polymorphisms in close proximity to the ancestral disease-causing variation are still present in today's patient population. In linkage or association mapping one hopes that at least a single marker is sufficiently close to the disease-causing variation, and therefore would co-segregate with the disease in a family or in a population. The analysis assumes that a large proportion of the mutations had a single point of origin.
Linkage and association based approaches have been successful for mapping of simple Mendelian diseases. However, mapping of diseases with a complex mode of inheritance has been less successful. Identification of the variations that are involved in such diseases is widely believed to require the performance of association analysis using tens of thousands of markers. Because single nucleotide polymorphisms (SNPs) are the most prevalent polymorphisms, they are proposed to be the markers of choice for these association studies.
Multiple methods, such as chip hybridization and oligonucleotide ligation assay (OLA), have been developed for genotyping of SNPs. All these SNP genotyping methods operate on a common principle of genotyping a previously identified single base polymorphism. Polymorphic sites are first identified by sequencing multiple individuals, then compiled into a map. Finally, patients and controls are tested for the presence or absence of each polymorphism.
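The final step described above, testing patients and controls for a previously identified polymorphism, is commonly summarized as a comparison of allele counts between the two groups. The following is a minimal sketch of one such comparison, a Pearson chi-square test on a 2x2 table of allele counts. The function name `allele_association` and the counts are hypothetical and are not taken from any of the genotyping methods named above.

```python
import math


def allele_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are patients and controls; columns are variant and reference alleles.
    Returns (chi_square, p_value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins_product = (a + b) * (c + d) * (a + c) * (b + d)
    if margins_product == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi_square = n * (a * d - b * c) ** 2 / margins_product
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts: the variant allele is seen on 30 of 100 patient
# chromosomes and on 10 of 100 control chromosomes.
chi_square, p_value = allele_association(30, 70, 10, 90)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.2g}")
```

A study that scores tens of thousands of markers would repeat this test once per marker, so the resulting p-values would need correction for multiple testing.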
In view of the importance of genetic testing, methods whereby one can easily screen for genetic mismatches between two DNA molecules are of great interest. A simple method that determines whether two DNA molecules are identical or different, and that is capable of multiplex analysis, would be of great benefit in these analyses.
The identification of single nucleotide polymorphisms (SNPs) covering the entire genome will lead to numerous association studies of complex traits. Most scenarios for such studies assume a universal set of relatively frequent SNPs, distributed in all or most ethnic populations. One widely considered approach is to identify susceptibility alleles through direct association studies using SNPs located in coding or regulatory sequences. The main alternative strategy is to search for linkage disequilibrium (LD) between disease susceptibility alleles and SNPs from a dense genome-wide map. Either of the above approaches requires efficient genotyping to score for the presence or absence of previously identified SNPs. Both approaches, however, may be unrealistic when variant alleles, either those directly responsible for disease susceptibility or SNPs, are infrequent or are specific to a particular population. In such cases, identifying susceptibility alleles may require comprehensive sequence comparison between patients and controls. Accomplishing such sequence comparison requires a high throughput DNA variation scanning technology to identify all possible variations in the tested fragments. The Variant Detection Array (VDA) method is perhaps the only existing approach for DNA variant scanning with a high potential for parallel processing. However, VDA is expensive and may be sub-optimally specific and sensitive.
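The linkage disequilibrium mentioned above is usually quantified from the frequencies of two-site haplotypes. The following is a minimal sketch, assuming phased haplotype counts are available for a marker site (alleles A/a) and a second site (alleles B/b). It computes the standard coefficient D and the squared correlation r². The function name `linkage_disequilibrium` and the counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic sites from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at the first site
    p_B = (n_AB + n_aB) / total    # frequency of allele B at the second site
    d = p_AB - p_A * p_B           # deviation from independent assortment
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d, d * d / denominator


# Hypothetical sample in which allele A always travels with allele B.
d, r_squared = linkage_disequilibrium(50, 0, 0, 50)
print(d, r_squared)
```

An r² near 1 means the marker is a good proxy for the second site, while an r² near 0 means that genotyping the marker says little about it. This is why a marker-based strategy can miss infrequent or population-specific variants.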
Relevant Literature
Techniques for detection of conformational changes created by DNA sequence variation as alterations in electrophoretic mobility are described in Orita et al. (1989) P.N.A.S. 86:2766; Orita et al. (1989) Genomics 5:874; Myers et al. (1985) N.A.R. 13:3131; Sheffield et al. P.N.A.S. 86:231; Myers et al. Meth. Enzym. 155:501; Perry and Carrell (1992) Clin. Pathol. 45:158; White et al. (1992) Genomics 5:301.
Techniques that use chemicals or proteins to detect sites of sequence mismatch in heteroduplex DNA are described in Cotton et al. (1988) P.N.A.S. 85:4397; Myers et al. (1985) Science 230:1242; Marshal et al. (1995) Nature Genetics 9:177; Youil et al. (1995) P.N.A.S. 92:87. Chip hybridization is described in Wang et al. Science 280:1077–82.
Grompe (1993) Nature Genetics 5:111 reviews methods for screening large stretches of DNA. Mapping strategies may be found in Risch (1990) Am. J. Hum. Genet. 46:229–241; Lander and Botstein (1987) Science 236:1567–1570; and Bishop and Williamson (1990) Am. J. Hum. Genet. 46:254–265. Sandra and Ford (1986) Nucleic Acids Res. 14:7265–7282 and Casna et al. (1986) Nucleic Acids Res. 14:7285–7303 describe genomic analysis.
Several approaches are presently available to isolate large DNA fragments, including long range PCR with high fidelity enzymes, described in Nielson et al. (1995) Strategies 8:26; recA-assisted cleavage, described by Ferrin and Camerini-Otero (1991) Science 254:1494; and the use of a single set of oligonucleotide primers to PCR amplify multiple specific fragments simultaneously, described in Brookes et al. (1995) Human Molecular Genetics 3:2011.
The E. coli methyl-directed mismatch repair system is described in Wagner and Meselson (1976) P.N.A.S. 73:4135; Modrich (1991) Annu. Rev. Genet. 25:229; Parker and Marinus (1992) P.N.A.S. 89:1730; and Carraway and Marinus (1993) J. Bacteriology 175:3972. The normal function of the E. coli methyl-directed mismatch repair system is to correct errors in newly synthesized DNA resulting from imperfect DNA replication. The system distinguishes unreplicated from newly replicated DNA by taking advantage of the fact that methylation of adenine in the sequence GATC occurs in unreplicated DNA but not in newly synthesized DNA. Mismatch repair is initiated by the action of three proteins, MutS, MutL and MutH, which lead to nicking of the unmethylated, newly replicated strand at a hemimethylated GATC site. The unmethylated DNA strand is then digested and resynthesized using the methylated strand as a template. The methyl-directed mismatch repair system can repair single base mismatches and mismatches or loops of up to four nucleotides in length. Loops of five nucleotides and larger are not repaired.
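The rules stated above can be restated as a small decision sketch: where GATC sites lie in a sequence, which strand of a hemimethylated site would be nicked, and whether a mismatch or loop of a given size falls within the repairable range. This is only an illustration of the rules as described in this paragraph, not a model of the enzymology. The function names `gatc_sites`, `strand_to_nick` and `is_repairable` are hypothetical.

```python
def gatc_sites(sequence):
    """Return the 0-based start positions of every GATC site in a sequence."""
    seq = sequence.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites


def strand_to_nick(top_methylated, bottom_methylated):
    """Return the strand nicked at a GATC site, following the rule in the text.

    Only a hemimethylated site (exactly one strand methylated) directs nicking,
    and the nick falls on the unmethylated strand. Returns None otherwise.
    """
    if top_methylated and not bottom_methylated:
        return "bottom"
    if bottom_methylated and not top_methylated:
        return "top"
    return None


def is_repairable(unpaired_nucleotides):
    """Apply the size rule in the text: 1 to 4 nucleotides are repaired, 5 or more are not."""
    if unpaired_nucleotides < 1:
        raise ValueError("a mismatch or loop involves at least one nucleotide")
    return unpaired_nucleotides <= 4


print(gatc_sites("AAGATCTTGATCG"))
print(strand_to_nick(top_methylated=True, bottom_methylated=False))
print(is_repairable(1), is_repairable(4), is_repairable(5))
```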
The use of site specific recombinases in eukaryotic cells is described by Wahl et al., U.S. Pat. No. 5,654,182; and by Sauer, U.S. Pat. No. 4,959,317.