Genetic variation exists between different individuals of a species. For some organisms, a single nucleotide polymorphism (SNP) may occur every 100 basepairs, while other species may have rates greater than one change in 1000 (Sachidanandam et al., Nature 409(6822): 928-33, 2001). Small (short) nucleotide insertions and deletions may occur at similar frequencies. While such polymorphisms can complicate some forms of genetic analysis, they can also be harnessed to map the inheritance of chromosomal regions. In model organisms, SNPs have been used to map the location of mutations from genetic screens in recombinant progeny (Berger et al., Nat Genet 29(4): 475-81, 2001; Martin et al., Genome Biol 2(9): RESEARCH 0036, E-pub Aug. 30, 2001; Wicks et al., Nat Genet 28(2): 160-4, 2001; Stickney et al., Genome Res 12(12): 1929-34, 2002), and to identify the location of phenotypic modifiers in quantitative trait locus (QTL) mapping. In humans, SNPs have been used to identify disease alleles and phenotypic modifiers in association studies (Bader, Pharmacogenomics 2(1): 11-24, 2001; Pharoah et al., Nat Rev Cancer 4(11): 850-60, 2004).
The power of using SNPs increases with the number of SNPs identified, and methods for genotyping individuals for the presence of particular SNPs have improved. In sequenced organisms, bioinformatic comparison of expressed sequence tag (EST) data has yielded a wealth of potential SNPs (Marth et al., Nat Genet 23(4): 452-6, 1999; Buetow et al., Proc Natl Acad Sci USA 98(2): 581-4, 2001; Hu et al., Pharmacogenomics J 2(4): 236-42, 2002). More recently, high-throughput approaches using high-density oligonucleotide arrays have been employed for SNP discovery (Matsuzaki et al., Genome Res 14(3): 414-25, 2004). However, these approaches are costly and can be used only to study organisms with a well-developed genomics infrastructure and prior knowledge of genome or EST sequence.
Likewise, high-resolution SNP maps have been generated by comparative genome sequencing of lab populations of interest, such as the common genetic screen lines FRT 82 and rucuca in Drosophila (Berger et al., Nat Genet 29(4): 475-81, 2001; Martin et al., Genome Biol 2(9): RESEARCH 0036, E-pub Aug. 30, 2001). These SNP maps are optimized for the lines tested, although some proportion of the SNPs from the tested populations is expected to be present in other fly lines as well. Many lines of different genetic backgrounds are needed for optimal isolation and recovery of mutations of interest. Nevertheless, the effort involved in creating these maps makes it unlikely that comparative sequencing will discover SNPs at high density in many additional lines of interest in the near future.
A frequent objective of previous SNP discovery screens was to identify SNPs that disrupted restriction endonuclease recognition sites. Disruption of such a site allowed for low-cost and rapid genotyping of the potential SNP from different individuals, as the read-out was the differential digestion of the SNP region. More recently, the capture and sequencing of genomic regions around restriction sites has been used to sample genomes, both to determine areas of DNA duplication in cancer and to study microbial population dynamics (Wang et al., Proc Natl Acad Sci USA 99(25): 16156-61, 2002; Zabarovska et al., Nucleic Acids Res 31(2): E5-5, 2003). In these approaches, SNPs have been confounding factors rather than the objective of the techniques, in that SNPs cause uncertainty in the assignment of the short sequence reads to their proper position in the genome. Other techniques have been used to distinguish the relatedness of individual organisms within a species (see, e.g., U.S. Pat. No. 5,713,258).
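The differential-digestion read-out described above, in which a SNP that disrupts a restriction recognition site changes the fragments produced, can be illustrated with a minimal Python sketch. The two allele sequences, the function name `fragment_lengths`, and the choice of EcoRI (recognition sequence GAATTC, cutting after the first G) are assumptions made for illustration only and are not taken from the cited studies. Because the EcoRI site is palindromic, scanning one strand is sufficient in this simplified model.

```python
# Illustrative sketch only: sequences and helper names are hypothetical.
ECORI_SITE = "GAATTC"   # EcoRI recognition sequence
ECORI_CUT_OFFSET = 1    # EcoRI cuts between G and AATTC on the scanned strand


def fragment_lengths(seq, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Return fragment lengths after complete digestion of a linear sequence."""
    seq = seq.upper()
    # Positions at which the enzyme cuts, one per recognition site found.
    cuts = [
        i + cut_offset
        for i in range(len(seq) - len(site) + 1)
        if seq[i:i + len(site)] == site
    ]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical alleles differing by a single A->G change inside the site.
allele_a = "ATGCCGAATTCTTAGGCA"  # site intact
allele_b = "ATGCCGAGTTCTTAGGCA"  # site disrupted by the SNP

print(fragment_lengths(allele_a))  # two fragments: the region is digested
print(fragment_lengths(allele_b))  # one fragment: the region stays uncut
```

In this simplified model, an individual carrying the intact site yields two fragments from the amplified region, while an individual carrying the disrupting SNP yields a single uncut fragment, which is the difference that would be scored on a gel.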
While the ability to detect nucleotide polymorphisms has improved rapidly, it is not routine to detect large numbers of polymorphisms between two individuals, particularly in organisms lacking thorough genomic and cDNA sequence information.