The detection of mutations in genomic DNA plays a critical role in efforts to elucidate the genetic basis of human disease. For many types of genetic screening and analysis, knowledge of the presence of a mutated copy of a gene is essential. Such information may be used in prenatal and other genetic testing, as well as analysis of tumor cells and other somatic mutations. For many genes, there are a number of different mutations that can affect function.
Common diseases such as diabetes, heart disease and psychiatric disorders are caused in part by genetic variations in multiple genes. Genetic variations are not only involved in the genesis of diseases but they are also chief determinants of disease progression and response to treatment. Identification of the genetic variations involved in common diseases can greatly improve the diagnosis, prognosis, and treatment of such diseases.
One approach for identifying the potentially causative variations involved in common diseases is to screen patients and controls for genetic variations in a large number of candidate genes. Genetic coding sequences constitute less than 5% of the entire human genome, yet the vast majority of human diseases are caused by sequence variation in these coding sequences. Reagents for large-scale screening of genes are already available, as a significant proportion of human gene sequences exists in the rapidly expanding public databases. Many DNA variation screening methods have been developed, e.g., single-strand conformation polymorphism (SSCP) analysis and high-performance liquid chromatography (HPLC). Because these methods are not designed to screen many genes simultaneously, their usefulness has been limited to testing a handful of candidate genes.
In the absence of high-throughput technology capable of large-scale screening of genes for disease-associated variations, less direct approaches such as linkage and association mapping have been proposed. In these approaches, neutral genetic variations (polymorphic markers) are cataloged into a genetic map. These polymorphic markers are then used in genetic linkage or association analysis to approximate the chromosomal location of the disease genes.
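Linkage analysis is commonly summarized with a LOD score. The Python sketch below is an illustration only, not a method from this document: it computes a two-point LOD score, assuming phase-known, fully informative meioses in which each meiosis is scored as recombinant or non-recombinant between a marker and the disease locus. The function names `lod_score` and `max_lod` are hypothetical.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score at recombination fraction theta.

    Compares the likelihood of the observed meioses under linkage (theta)
    with the likelihood under no linkage (theta = 0.5), on a log10 scale.
    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        # Any recombinant is impossible under complete linkage.
        return float("-inf")
    n = recombinants + non_recombinants
    linked = 0.0
    if recombinants:
        linked += recombinants * math.log10(theta)
    if non_recombinants:
        linked += non_recombinants * math.log10(1.0 - theta)
    unlinked = n * math.log10(0.5)
    return linked - unlinked


def max_lod(recombinants: int, non_recombinants: int, steps: int = 50):
    """Scan a grid of theta values in [0, 0.5]; return (best LOD, theta)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(recombinants, non_recombinants, t), t) for t in thetas)
```

For example, 10 non-recombinant meioses give a maximum LOD of about 3.01 at theta = 0, close to the conventional threshold of 3 for declaring linkage.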
Association studies are based on the probability that certain polymorphisms in close proximity to the ancestral disease-causing variation are still present in today's patient population. In linkage or association mapping, one hopes that at least one marker is sufficiently close to the disease-causing variation that it co-segregates with the disease in a family or in a population. The analysis assumes that a large proportion of the mutations had a single point of origin.
Linkage and association based approaches have been successful for mapping of simple Mendelian diseases. However, mapping of diseases with a complex mode of inheritance has been less successful. Identification of the variations that are involved in such diseases is widely believed to require the performance of association analysis using tens of thousands of markers. Because single nucleotide polymorphisms (SNPs) are the most prevalent polymorphisms, they are proposed to be the markers of choice for these association studies.
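An association test at a single marker can be sketched as a Pearson chi-square test on a 2x2 table of allele counts in patients versus controls. The sketch below is a hypothetical illustration, not the document's method; it also applies a Bonferroni correction, one simple way to account for testing tens of thousands of markers. The function names are assumptions.

```python
import math


def allelic_chi_square(case_alleles, control_alleles) -> float:
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    case_alleles / control_alleles: (count of allele 1, count of allele 2).
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty: no information about association
    return n * (a * d - b * c) ** 2 / denom


def chi_square_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-marker significance threshold for a family-wise error rate alpha."""
    return alpha / n_markers
```

With allele counts (60, 40) in cases and (40, 60) in controls, the statistic is 8.0 and p is roughly 0.0047, which would not pass a Bonferroni threshold of 5e-6 for 10,000 markers at alpha = 0.05.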
Multiple methods, such as chip hybridization and oligonucleotide ligation assay (OLA), have been developed for genotyping of SNPs. All these SNP genotyping methods operate on a common principle of genotyping a previously identified single base polymorphism. Polymorphic sites are first identified by sequencing multiple individuals, then compiled into a map. Finally, patients and controls are tested for the presence or absence of each polymorphism.
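The discover-then-genotype workflow can be sketched as follows. This is a simplified, hypothetical illustration: it assumes gap-free, aligned haploid sequences (real genotyping is diploid and assay-specific), and `find_polymorphic_sites` and `genotype` are illustrative names.

```python
def find_polymorphic_sites(aligned_seqs):
    """Discovery step: compare sequences from multiple individuals.

    Returns a map {position: sorted list of observed bases} for every
    column where more than one base is seen. Assumes equal-length,
    gap-free aligned sequences.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    snp_map = {}
    for pos in range(length):
        alleles = {s[pos] for s in aligned_seqs}
        if len(alleles) > 1:
            snp_map[pos] = sorted(alleles)
    return snp_map


def genotype(sample_seq, snp_map):
    """Genotyping step: read only the previously mapped positions."""
    return {pos: sample_seq[pos] for pos in snp_map}
```

Because `genotype` only reads positions already in the map, a sample carrying a new variant at an unmapped position (for example, a change at position 0 in `"GCGTAC"`) is reported exactly like a reference sample at the mapped sites; this is the limitation of screening only previously identified polymorphisms.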
In view of the importance of genetic testing, methods by which one can easily screen for genetic mismatches between two DNA molecules are of great interest. A simple method that determines whether two DNA molecules are identical or different, and that is capable of multiplex analysis, would be of great benefit in these analyses.
Techniques for detecting conformational changes created by DNA sequence variation as alterations in electrophoretic mobility are described in Orita et al. (1989) P.N.A.S. 86:2766; Orita et al. (1989) Genomics 5:874; Myers et al. (1985) N.A.R. 13:3131; Sheffield et al. P.N.A.S. 86:231; Myers et al. Meth. Enzym 155:501; Perry and Carrell (1992) Clin. Pathol. 45:158; White et al. (1992) Genomics 5:301. Techniques that use chemicals or proteins to detect sites of sequence mismatch in heteroduplex DNA are described in Cotton et al. (1988) P.N.A.S. 85:4397; Myers et al. (1985) Science 230:1242; Marshal et al. (1995) Nature Genetics 9:177; Youil et al. (1995) P.N.A.S. 92:87. Chip hybridization is described in Wang et al. Science 280:1077-82.
Grompe (1993) Nature Genetics 5:111 reviews methods for screening large stretches of DNA. Mapping strategies may be found in Risch (1990) Am. J. Hum. Genet. 46:229-241; Lander and Botstein (1987) Science 236:1567-1570; and Bishop and Williamson (1990) Am. J. Hum. Genet. 46:254-265. Sandra and Ford, (1986) Nucleic Acids Res. 14:7265-7282 and Casna, et al. (1986) Nucleic Acids Res. 14:7285-7303 describe genomic analysis.
Several approaches are presently available to isolate large DNA fragments, including long-range PCR with high-fidelity enzymes, described in Nielson et al. (1995) Strategies 8:26; recA-assisted cleavage, described by Ferrin and Camerini-Otero (1991) Science 254:1494; and the use of a single set of oligonucleotide primers to PCR-amplify multiple specific fragments simultaneously, described in Brookes et al. (1995) Human Molecular Genetics 3:2011.
The E. coli methyl-directed mismatch repair system is described in Wagner and Meselson (1976) P.N.A.S. 73:4135; Modrich (1991) Annu. Rev. Genet. 25:229; Parker and Marinus (1992) P.N.A.S. 89:1730; and Carraway and Marinus (1993) J. Bacteriology 175:3972. The normal function of this system is to correct errors in newly synthesized DNA resulting from imperfect DNA replication. The system distinguishes unreplicated from newly replicated DNA by exploiting adenine methylation in the sequence GATC: the adenine is methylated in unreplicated (parental) DNA but not yet in newly synthesized DNA. Mismatch repair is initiated by the action of three proteins, MutS, MutL and MutH, which lead to nicking of the unmethylated, newly replicated strand at a hemimethylated GATC site. The unmethylated DNA strand is then digested and resynthesized using the methylated strand as a template. The methyl-directed mismatch repair system can repair single-base mismatches as well as loops of up to four nucleotides in length; loops of five nucleotides or larger are not repaired.
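The strand-discrimination and size rules described above can be expressed as a small model. This is a hypothetical sketch, not a simulation of the actual MutS/MutL/MutH biochemistry: it locates GATC sites, treats the unmethylated strand at a hemimethylated site as the one to nick, and applies the stated repair limit of loops up to four nucleotides. All function names are illustrative.

```python
import re


def gatc_sites(seq: str) -> list[int]:
    """0-based start positions of GATC sites (the methylated adenine motif)."""
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def strand_to_nick(top_methylated: bool, bottom_methylated: bool):
    """Which strand the simplified model nicks at a GATC site.

    Only a hemimethylated site tells the system which strand is new:
    the unmethylated strand is treated as newly replicated and nicked.
    """
    if top_methylated and not bottom_methylated:
        return "bottom"
    if bottom_methylated and not top_methylated:
        return "top"
    return None  # fully methylated or fully unmethylated: not modeled here


def is_repaired(kind: str, loop_length: int = 0) -> bool:
    """Repair rule as stated in the text: single-base mismatches and loops
    of up to four nucleotides are repaired; loops of five or more are not."""
    if kind == "single_base":
        return True
    if kind == "loop":
        if loop_length < 1:
            raise ValueError("loop_length must be >= 1")
        return loop_length <= 4
    raise ValueError(f"unknown mismatch kind: {kind!r}")
```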
The use of site specific recombinases in eukaryotic cells is described by Wahl et al., U.S. Pat. No. 5,654,182; and by Sauer, U.S. Pat. No. 4,959,317.
Compositions and methods are provided for an in vivo bacterial assay, termed "Mismatch Repair Detection" (MRD). The method detects mismatches in a double-stranded DNA molecule in which the sequence of one strand differs from the sequence of the other strand by as little as a single nucleotide. The two strands of the DNA molecule are from different sources. One strand is unmethylated DNA carrying a detectable marker gene and the sequence being tested for mismatches. The other strand is methylated DNA carrying an inactivated copy of the marker gene, whose defect does not itself activate repair mechanisms, and another copy of the sequence to be tested. Heteroduplex dsDNA formed by hybridizing the two strands is transformed into a bacterial host with an active methyl-directed mismatch repair system (MMR host).
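The heteroduplex at the center of MRD can be illustrated by pairing two single strands antiparallel and listing positions where Watson-Crick pairing fails. This Python sketch assumes both strands are written 5' to 3' over the same region, have equal length, and differ only by substitutions; it ignores the marker gene and methylation, and the helper names are hypothetical.

```python
# Watson-Crick base pairs accepted as matched.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def heteroduplex_mismatches(strand_1: str, strand_2: str):
    """List (position, base on strand 1, opposing base) for unpaired sites.

    Both strands are given 5' to 3'; strand 2 is reversed so that it lines
    up antiparallel against strand 1. Positions are indexed along strand 1.
    """
    if len(strand_1) != len(strand_2):
        raise ValueError("this sketch models substitutions only")
    partner = strand_2[::-1]
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(strand_1, partner))
        if (x, y) not in PAIRS
    ]
```

Pairing the top strand of one source, `"ACGTAC"`, with the bottom strand of a second source whose top strand is `"ACGTTC"` (that is, `reverse_complement("ACGTTC")`) reports a single A/A mismatch at position 4; pairing it with its own complement reports none.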
The host repair system is activated by a mismatch in the sequence of interest and will then "co-repair" the marker gene to produce an inactive, double-stranded copy. When the two strands of the sequence of interest are a perfect match, the marker gene is not altered, and the transformed bacteria will produce active marker. Where a mismatch is present, the transformants are readily identified by the lack of active marker, and may then be isolated and grown for further analysis. MRD is a rapid method for analysis of numerous fragments simultaneously. It is useful as an assay for enumerating differences between various sources of DNA, and as a means of isolating DNA with variant sequences.
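The expected MRD readout for a pool of fragments can be sketched as an idealized classifier. Under the assumptions here (hypothetical names; substitutions only; every mismatch triggers co-repair of the marker with 100% efficiency, which real assays need not achieve), a fragment whose two sources differ yields transformants lacking active marker, and an identical fragment yields transformants with active marker.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    seq_a: str  # hypothetical: top-strand sequence from source A
    seq_b: str  # hypothetical: top-strand sequence from source B


def predicted_marker_active(frag: Fragment) -> bool:
    """Idealized MRD readout for one fragment.

    Any base substitution between the sources forms a mismatch in the
    heteroduplex; the model assumes repair then co-repairs the marker,
    so the marker is lost. Identical sources leave the marker active.
    """
    if len(frag.seq_a) != len(frag.seq_b):
        raise ValueError("this sketch models substitutions only")
    has_mismatch = any(a != b for a, b in zip(frag.seq_a, frag.seq_b))
    return not has_mismatch


def mrd_screen(fragments):
    """Split fragment names into (identical, variant) by predicted readout."""
    identical = [f.name for f in fragments if predicted_marker_active(f)]
    variants = [f.name for f in fragments if not predicted_marker_active(f)]
    return identical, variants
```

Because loops of five or more nucleotides are not repaired by the methyl-directed system, such differences would presumably also leave the marker active; the sketch sidesteps that case by rejecting unequal-length inputs.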