The invention relates generally to methods for identifying polymorphic DNA sequences, and more particularly, to a method of comparing a reference DNA population with a test DNA population for the purpose of identifying sequences that are different.
Genetic factors contribute to virtually every disease, conferring susceptibility or resistance, or influencing interaction with environmental factors, Collins et al, Science, 278: 1580-1581 (1997). As genome mapping and sequencing projects advance, more attention is being directed to the problem of sequence variability, both between genomes of the same species and, perhaps more importantly, between genetic regulatory elements and expressed genes of different individuals of the same species. In the area of human health, it is believed that a detailed understanding of the correlation between genotype and disease susceptibility, responsiveness to therapy, likelihood of side-effects, and other complex traits, will lead to improved therapies, to improved application of existing therapies, to better preventative measures, and to better diagnostic procedures, e.g. Caskey, Science, 236: 1223-1229 (1987); White and Caskey, Science, 240: 1483-1488 (1988); Lander et al, Science, 265: 2037-2048 (1994); Schafer et al, Nature Biotechnology, 16: 33-39 (1998); and Housman et al, Nature Biotechnology, 16: 492-493 (1998).
Many techniques are available for detecting the presence or absence of a suspected mutation or polymorphic sequence, including direct sequencing, ligation-based assays, restriction fragment length analysis, allele-specific polymerase chain reaction, assays based on differential electrophoretic mobilities, primer extension, mismatch repair enzymes, and specific hybridization, e.g. Taylor, Editor, Laboratory Methods for the Detection of Mutations and Polymorphisms in DNA (CRC Press, Boca Raton, 1997); Cotton, Mutation Detection (Oxford University Press, Oxford, 1997); Landegren et al, Science, 242: 229-237 (1988); Brown, Current Opinion in Genetics and Development, 4: 366-373 (1994); Shumaker et al, Human Mutation, 7: 346-354 (1996); Nikiforov et al, Nucleic Acids Research, 22: 4167-4175 (1994); Pastinen et al, Genome Research, 7: 606-614 (1997); Lisitsyn et al, Science, 259: 946-951 (1993); and the like. However, most of these techniques are not directed to large-scale identification, or surveying, of polymorphic sequences, either for whole genomes or for expressed genes, and several require that the polymorphism be known beforehand. This limitation is significant, as the frequency of polymorphism in unrelated individuals is estimated to average as high as once every several hundred basepairs, e.g. Cooper et al, Human Genetics, 69: 201-205 (1985). Thus, some disease conditions or susceptibilities could depend on the interaction of and/or contributions from large numbers of genetic loci.
It would be highly desirable if there were an approach available that was particularly well suited for large-scale identification, or surveying, of polymorphic or mutated sequences in an individual.
Accordingly, my invention includes providing methods and materials for achieving the following objectives: identifying multiple polymorphic sequences in a test DNA population, identifying genes or other sequences in a population carrying novel polymorphisms, identifying differences between two populations of DNA molecules, identifying genes having a polymorphic or mutated sequence, and determining the degree of genetic variation between a test DNA population and a reference DNA population.
My invention achieves these and other objectives by providing methods and materials for identifying member polynucleotides of a test DNA population whose nucleotide sequences differ from those of the corresponding polynucleotides of a reference DNA population. In accordance with the invention, heteroduplexes are formed between polynucleotides of the reference DNA population and those of the test DNA population. Heteroduplexes that contain mismatched base pairs are separated from those that form perfectly matched duplexes, preferably by enzymatically digesting the perfectly matched heteroduplexes and homoduplexes so that only partially double stranded mismatched heteroduplexes remain. The mismatched heteroduplexes are then used to generate amplicons which are sequenced to identify members of the test DNA population whose sequences differ from those of the corresponding members of the reference DNA population. The nature of the sequence difference between the test and reference DNAs is determined by complete sequencing of the test DNA fragment. Materials of the invention include cloning vectors which efficiently accept inserts comprising either reference DNA or test DNA, and kits including the cloning vectors of the invention.
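The selection logic summarized above can be illustrated with a purely computational analogy. The following minimal Python sketch is not the laboratory method of the invention: it assumes, for illustration only, that each test fragment has a corresponding reference fragment of equal length sharing the same identifier, and that differences are simple base substitutions. The names find_mismatches and select_mismatched, and the fragment identifiers, are hypothetical. Fragments whose test and reference sequences match perfectly are discarded, mirroring the removal of perfectly matched duplexes, and only fragments with at least one mismatch are retained and reported.

```python
from typing import Dict, List, Tuple

# (0-based position, reference base, test base)
Mismatch = Tuple[int, str, str]


def find_mismatches(reference: str, test: str) -> List[Mismatch]:
    """List the positions at which two pre-aligned fragments differ.

    Assumption of this sketch: both fragments have the same length, so
    insertions and deletions are out of scope.
    """
    if len(reference) != len(test):
        raise ValueError("sketch assumes equal-length, pre-aligned fragments")
    pairs = zip(reference.upper(), test.upper())
    return [(i, r, t) for i, (r, t) in enumerate(pairs) if r != t]


def select_mismatched(
    reference_pop: Dict[str, str],
    test_pop: Dict[str, str],
) -> Dict[str, List[Mismatch]]:
    """Keep only test fragments that differ from their reference counterpart.

    Perfectly matched pairs are dropped, by analogy with the removal of
    perfectly matched duplexes; test fragments with no reference
    counterpart are skipped because no comparison is possible.
    """
    selected: Dict[str, List[Mismatch]] = {}
    for frag_id, test_seq in test_pop.items():
        ref_seq = reference_pop.get(frag_id)
        if ref_seq is None:
            continue
        mismatches = find_mismatches(ref_seq, test_seq)
        if mismatches:
            selected[frag_id] = mismatches
    return selected


if __name__ == "__main__":
    # Hypothetical example populations keyed by fragment identifier.
    reference_pop = {"frag1": "ACGTACGT", "frag2": "GGCCTTAA"}
    test_pop = {"frag1": "ACGTACGT", "frag2": "GGCCTAAA"}
    for frag_id, diffs in select_mismatched(reference_pop, test_pop).items():
        print(frag_id, diffs)
```

In this sketch the positional comparison stands in for complete sequencing of the selected test fragment, which is what reveals the nature of each difference.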