In recent years, genetic alterations that cause or contribute to many different diseases have been identified. A few of these diseases are genetically simple, being associated with a single genetic alteration. Once the genetic alteration associated with a genetically simple disease is identified, characterization and diagnosis of the disease are relatively straightforward. Most phenotypic traits and diseases, however, are genetically complex. The genetic complexity can arise from the interaction or disruption of multiple genes, incomplete penetrance, genetic heterogeneity, and/or environmental or random causes (phenocopy) (Lander, E. S. and Schork, N. J., Science, 265:2037-2048 (1994)). Mapping of complex traits or diseases requires that the entire genome be scanned in order to identify all genomic regions that potentially contribute to the development of that trait or disease. In general, genome-wide scans are performed using polymorphic DNA markers to determine which markers segregate with a complex trait of interest. The loci identified as contributing to a disease can then be mapped to specific genomic regions based on the known chromosomal locations of the markers segregating with, or “linked” to, that trait.
Several types of DNA polymorphisms, or markers, occur in the human genome and can be used in genome-wide scans. These include restriction fragment length polymorphisms (RFLPs), microsatellites or simple sequence length polymorphisms (SSLPs), and single nucleotide polymorphisms (SNPs).
RFLPs are single nucleotide changes (point changes or insertion/deletion changes) that alter a restriction site and thus the digestion pattern of a given segment of DNA. RFLPs were the first type of polymorphism identified and were used as a tool to construct early genetic linkage maps in humans. RFLPs are unsuitable for large-scale analysis of populations, however, because they are unreliable and not amenable to automation. RFLPs are unreliable when used to analyze genetically related individuals because RFLPs have only two alleles, one with the restriction site and one without, and related individuals generally have the same allele on both chromosomes. Additionally, RFLPs are not amenable to automation because RFLP detection requires the use of Southern blot techniques, which are not easily automated.
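The effect of a single nucleotide change on a digestion pattern can be illustrated with a minimal Python sketch. The two allele sequences and the `digest` helper are hypothetical and exist only for illustration; the recognition site GAATTC, cut after the first G, corresponds to the enzyme EcoRI. Only one strand is modeled, so this is a simplification rather than a description of any laboratory protocol.

```python
def digest(sequence: str, site: str, cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting `sequence` at each occurrence of `site`.

    `cut_offset` is the position within the site where the cut falls
    (1 for EcoRI, which cuts between G and AATTC).
    """
    cut_positions = []
    start = 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cut_positions.append(idx + cut_offset)
        start = idx + 1
    # Fragment lengths are the gaps between consecutive boundaries.
    boundaries = [0] + cut_positions + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


ECORI_SITE = "GAATTC"

# Hypothetical alleles that differ by a single base inside the site (A -> G).
allele_with_site = "ATCGATCG" + "GAATTC" + "TTAGGCCA"
allele_without_site = "ATCGATCG" + "GAGTTC" + "TTAGGCCA"

fragments_with = digest(allele_with_site, ECORI_SITE)        # two fragments
fragments_without = digest(allele_without_site, ECORI_SITE)  # one uncut fragment
```

Because the marker is either cut or not cut, only two fragment patterns are possible, which reflects the two-allele limitation described above.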
Microsatellite markers, or SSLPs, are sequences that are repeated in tandem, with the number of repeats resulting in multiple alleles of different lengths. Microsatellite markers are useful for identifying genes involved in traits that follow simple Mendelian, monogenic patterns of inheritance. Microsatellites, however, have proven to be unsuitable for studies involving traits that follow non-Mendelian, complex patterns of inheritance because microsatellites are not optimally abundant, occurring only once every few kilobases. Microsatellites also have a high mutation and recombination rate, which makes them genetically unstable. In addition, microsatellite markers are not amenable to high-throughput analysis because they can only be analyzed using PCR and gel-based assays, which require a substantial investment in labor, time, and cost.
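The relationship between repeat number and allele length can be sketched as follows. The sequences, the flanking regions, and the `count_tandem_repeats` helper are hypothetical and chosen only to show how two alleles at one microsatellite locus differ in length.

```python
import re


def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the number of repeats in the longest uninterrupted run of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(sequence)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


# Hypothetical flanking sequences shared by both alleles.
LEFT_FLANK, RIGHT_FLANK = "GGTT", "TTGG"

allele_12 = LEFT_FLANK + "CA" * 12 + RIGHT_FLANK
allele_15 = LEFT_FLANK + "CA" * 15 + RIGHT_FLANK

repeats_12 = count_tandem_repeats(allele_12, "CA")
repeats_15 = count_tandem_repeats(allele_15, "CA")

# Three extra CA repeats make the second allele six bases longer.
length_difference = len(allele_15) - len(allele_12)
```

In a gel-based assay, it is this difference in fragment length that distinguishes the alleles.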
SNPs are single base pair positions in the genome at which different sequence alternatives (alleles) exist in the population at frequencies of greater than 1%. SNPs are extremely stable and dense within the genome, but are not optimally informative because each SNP identifies only a single locus, and thus has low statistical power.
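The 1% frequency criterion can be made concrete with a short sketch. The allele counts and function names below are hypothetical. The sketch also assumes that the criterion is applied to the second most common allele at the position, which is one reasonable reading of the definition above, not the only one.

```python
from collections import Counter


def allele_frequencies(observed_alleles: list[str]) -> dict[str, float]:
    """Return the frequency of each allele among the sampled chromosomes."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_snp(observed_alleles: list[str], threshold: float = 0.01) -> bool:
    """True if a second allele occurs at a frequency above `threshold`."""
    freqs = sorted(allele_frequencies(observed_alleles).values(), reverse=True)
    return len(freqs) >= 2 and freqs[1] > threshold


# Hypothetical samples of 1000 chromosomes at two positions.
common_variant = ["A"] * 970 + ["G"] * 30  # G allele at 3%
rare_variant = ["A"] * 995 + ["G"] * 5     # G allele at 0.5%

common_is_snp = is_snp(common_variant)
rare_is_snp = is_snp(rare_variant)
```

Under this assumption, the first position qualifies as a SNP and the second does not, because its minor allele falls below the 1% threshold.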