1. Field of the Invention
The invention generally relates to single nucleotide polymorphism (SNP) genotyping. In particular, the invention provides a method for SNP genotyping based on a nested PCR design that creates structures directly suitable for both DNA sequencing and ligation reactions.
2. Background of the Invention
With the completion of the rough draft of the human genome, over 1.5 million non-redundant SNPs have been identified and mapped by the publicly funded project and the SNP consortium (Sachidanandam et al, 2001). The availability of such a large number of SNPs has prompted great interest in SNP-related applications and research, such as large-scale linkage, genomic association and pharmacogenetic studies.
SNPs entered the central arena of genetics due to their abundance (Collins et al, 1997; Sachidanandam et al, 2001), stability (Sachidanandam et al, 2001) and the relative ease (Kwok 2000; Shi 2001; Syvanen 2002) with which they are genotyped. The targets of SNP genotyping are normally small pieces of DNA, ranging from 150 to 300 base pairs. Based on the SNP consortium report, the average SNP density in the human genome appears to be greater than 1 SNP/kb. This high density provides a tool to analyze genomic structure at very high resolution, and to establish a direct correlation between genetic coding variation and biological function. The relative stability of SNPs (compared to mini- and microsatellite markers) in evolution makes it simpler to carry out this task.
Although SNP genotyping is generally easier than microsatellite genotyping, and despite the rapid and significant advances in SNP genotyping technology in the last few years, major issues remain unsolved. Whereas only a few years ago the main concern was the need to have a large number of SNPs available, currently the most urgent issues are increasing throughput and decreasing cost. For example, for large-scale applications as projected by Kruglyak (1999) and Long (1999), a reasonable cost would be <$0.10/genotype; otherwise, such projects will become prohibitively expensive. To fulfill the goals of understanding the genetics of complex traits and common diseases, cost-effective and higher throughput methodology in SNP genotyping is essential.
Typically, genotyping protocols have three components: target amplification, allelic discrimination, and product detection and identification. They are normally executed sequentially, but can be processed in a single step reaction, depending on the means used for allele discrimination and signal detection. Such single step procedures are well-suited to automation, but are not necessarily effective in terms of cost and throughput.
PCR is the dominant procedure utilized for target amplification. As is widely known, a PCR can readily amplify targets present in low copy number by a factor of 10^8 or more. With respect to allele discrimination, all SNP genotyping technologies currently available are based on mechanisms involving one or more of the following: DNA polymerase, hybridization and DNA ligase. DNA polymerase methods include single base extension (SBE) (Armstrong 2000; Barta 2001; Bray 2001; Cai 2001; Chen 1997; Chen 1999; Chen 1998; Chen 1997; Chen 1997; Fan 2000; Lindblad-Toh 2000; Nikiforov 1994; Pastinen 1996; Pastinen 2000; Ross 1998; Sauer 2000; Syvanen 1999; Ye 2001), de novo sequencing including pyrosequencing (Nordstrom 2000; Ronaghi 2001), allele-specific PCR and extension (Germer 1999; Myakishev 2001; Pastinen 2000), and structure-specific cleavage (Fors 2000). Of these different forms, SBE is the most widely used and has been adapted for many different detection platforms. Hybridization is also widely used in several different forms, including dynamic hybridization (Prince 2001), and is the primary method currently used in all microarray detection formats. DNA ligase methods are based on the ability of DNA ligase to join the ends of two oligonucleotides annealed next to each other on a template (Tong 2000; Tong 1999). Two oligonucleotides can be designed to anneal to both sides of a SNP site, and by detecting the formation of ligation product, the genotype of a target can be inferred (Chen 1998; Delahunty 1996; Gerry 1999; Iannone 2000).
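The ligation-based inference described above can be sketched as follows. This is a deliberately idealized model rather than any of the cited assays: it assumes one allele-specific probe per allele, a ligase that joins the two probes only when the allele-specific probe matches the template at the SNP position, and bases written on the probe strand so that no complementing is needed. All function names are hypothetical.

```python
def ligation_product_forms(template_base: str, probe_base: str) -> bool:
    """Idealized ligase: a product forms only on a perfect match at the SNP.

    Assumption: both bases are written on the probe strand, so a match is
    simple equality rather than Watson-Crick complementarity.
    """
    return template_base.upper() == probe_base.upper()


def infer_genotype(sample_bases, allele_1: str, allele_2: str) -> str:
    """Infer a diploid genotype from which ligation products are detected.

    sample_bases: the bases carried at the SNP by the sample's two
    chromosome copies (hypothetical input, e.g. "AG").
    """
    # One ligation reaction per allele-specific probe; a product is detected
    # if at least one template copy in the sample supports ligation.
    product_1 = any(ligation_product_forms(b, allele_1) for b in sample_bases)
    product_2 = any(ligation_product_forms(b, allele_2) for b in sample_bases)

    if product_1 and product_2:
        return f"{allele_1}/{allele_2}"   # both products: heterozygote
    if product_1:
        return f"{allele_1}/{allele_1}"   # only allele 1 product: homozygote
    if product_2:
        return f"{allele_2}/{allele_2}"   # only allele 2 product: homozygote
    return "no call"                      # neither product detected


for sample in ("AA", "AG", "GG"):
    print(sample, "->", infer_genotype(sample, "A", "G"))
```

In a real assay the two product signals are measured quantities rather than booleans, so a practical genotype caller would need thresholds or clustering in place of the simple presence test used here.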
Once a target DNA sequence has been amplified and the allelic variants discriminated, the next step in a generic genotyping protocol is to detect and identify the allele-specific products. Detection mechanisms vary greatly, from simple fluorescence intensity (Armstrong 2000; Cai 2000; Chen 1998; Chen 1997; Delahunty 1996; Dubertret 2001; Fan 2000; Ferguson 2000; Fors 2000; Germer 1999; Germer 2000; Gerry 1999; Iannone 2000; Lindblad-Toh 2000; Lindroos 2001; Marras 1999; Medintz 2001; Myakishev 2001; Nikiforov 1994; Pastinen 1997; Pastinen 2000; Prince 2001; Syvanen 1999; Ye 2001) to very precise mass (Bray 2001; Ross 1998; Sauer 2000) or electric charge (Gilles 1999; Woolley 2000) measurement. The detection mechanisms roughly fall into two categories: homogeneous and solid-phase mediated detection. All homogeneous detection platforms depend on measuring fluorescence intensities and their change during and/or after reactions. One common feature among homogeneous approaches is that they do not require any separation/purification prior to signal acquisition, making them amenable to automation. Solid support techniques include flow cytometry genotyping (Armstrong 2000; Cai 2000; Ye 2001), zip-code microarrays (Fan 2000; Gerry 1999), and mass spectrometry genotyping (Bray 2001; Ross 1998; Sauer 2000). Using a solid support in the detection step may potentially increase throughput and reduce cost, but unfortunately also entails the risk of complicating protocols and compromising data quality. For example, when reaction mixtures are applied to a solid support, unintended binding of fluorescent dye to the solid surface can occur, necessitating extensive washing to minimize spurious signals.
With respect to cost, prices for genotyping technologies currently available on the market vary considerably, from $0.50 to $2.00/genotype. For example, the template-directed dye-terminator incorporation assay with fluorescence polarization detection (FP-TDI) (Chen 1999) costs roughly $0.50/genotype for reagents by list price. For other methods, due to their requirement for special reagents and/or clean-up procedures, the cost is higher, e.g. MALDI-TOF mass spectrometry is about $0.75 to $1.00/genotype, whereas pyrosequencing approaches $2.00/genotype. No currently available technology approaches the low cost which is necessary in order to prevent large-scale projects from becoming prohibitively expensive, i.e. <$0.10/genotype.
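To make the scale of these per-genotype prices concrete, the sketch below multiplies the approximate reagent costs quoted above by a hypothetical study of 1,000 SNPs typed on 1,000 samples. The study size, the function name and the dictionary layout are assumptions made purely for illustration; only the prices come from the text, and they are approximate list figures.

```python
# Approximate reagent cost per genotype in USD, as quoted in the text.
# Each entry is a (low, high) range; single figures use the same value twice.
COST_PER_GENOTYPE = {
    "FP-TDI": (0.50, 0.50),
    "MALDI-TOF mass spectrometry": (0.75, 1.00),
    "pyrosequencing": (2.00, 2.00),
    "target ceiling (<$0.10)": (0.10, 0.10),
}

# Hypothetical study size, chosen only for illustration.
N_SNPS = 1000
N_SAMPLES = 1000


def project_cost(n_snps: int, n_samples: int, cost_per_genotype: float) -> float:
    """Total reagent cost: one genotype per SNP per sample."""
    return n_snps * n_samples * cost_per_genotype


for method, (low, high) in COST_PER_GENOTYPE.items():
    low_total = project_cost(N_SNPS, N_SAMPLES, low)
    high_total = project_cost(N_SNPS, N_SAMPLES, high)
    if low == high:
        print(f"{method}: about ${low_total:,.0f}")
    else:
        print(f"{method}: about ${low_total:,.0f} to ${high_total:,.0f}")
```

Because the total scales linearly with both the number of SNPs and the number of samples, the gap between the quoted prices and the <$0.10 ceiling widens in absolute terms as either dimension of a study grows.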
There are currently many genotyping applications in use or under development that require a relatively large number of both SNPs and samples (e.g., both in the thousands or more), demanding both cost-effective and high throughput technologies. Examples include gene mapping studies by linkage or linkage disequilibrium (Kruglyak 1999; Long 1999) for complex traits and pharmacogenetics (Riley 2000). Clearly, these applications are critical to improving our understanding of the genetics of complex traits, common diseases and drug response, and to helping attain the goal of individualized medicine. However, none of the current approaches is suitable for such applications in terms of cost effectiveness coupled with high throughput potential, and SNP genotyping demands both.