As the genes involved in various aspects of human physiology are elucidated, an increasing number of candidate genes are being associated with disease. Applying this knowledge in the clinic and in clinical research can be very powerful as the field moves toward personalized medicine. Examples of success include the sequencing of candidate disease loci in targeted populations, such as Ashkenazi Jews (Weinstein 2007), the sequencing of variants in drug metabolism genes to adjust dosage (Marsh and McLeod 2006), and the identification of genetic defects in cancer that make tumors more responsive to certain treatments (Marsh and McLeod 2006). However, sequencing many candidate genes across many individual samples requires new technology that lowers the cost and increases the throughput of medical re-sequencing enough to make clinical application feasible.
The cost of sequencing is declining rapidly due to second generation sequencing technologies that perform a large number of sequencing reactions in parallel while using a small amount of reagent per reaction (Metzker 2005). These technologies integrate cloning and amplification into the sequencing protocol, which is essential for achieving the greater than 100-fold cost savings over traditional methods. However, this integration results in a loss of flexibility: it is not yet feasible to sequence a subset of the human genome in a large number of samples for the same cost as sequencing the complete genome of a single individual. This is a limitation, because sequencing the complete genomes of large numbers of individuals is still cost-prohibitive, and the whole genome sequences of only a few individuals do not provide enough statistical power to correlate genotype with phenotype. The promise of personalized medicine based on genome analysis remains on the horizon, but the significance of observed variability will stay unclear without an affordable technology that enables the necessary depth of patient sampling.
Current methods for analyzing sequence variation in a subset of the human genome rely on PCR to amplify the targeted sequences (Greenman et al. 2007; Sjoblom et al. 2006; Wood et al. 2007). Efforts to multiplex PCR have been hampered by the dramatic increase in the amplification of mispriming events as more primer pairs are used (Fan et al. 2006). In addition, large numbers of primer pairs often result in inter-primer interactions that prevent amplification (Han et al. 2006). Therefore, a separate PCR is performed for each region of interest, a costly approach when hundreds of individual PCRs are needed for each sample (Greenman et al. 2007; Sjoblom et al. 2006; Wood et al. 2007). Furthermore, this strategy requires a large amount of starting DNA to supply enough template for all of the individual PCRs. This can be a problem because DNA is often a limiting factor when working with clinical samples.
It is important to choose the appropriate strategy for sample tracking to fully harness the throughput of second generation sequencing technologies. The sequencing capacities of these platforms are large enough that multiple samples can be sequenced in a single instrument run. One option is to use a separate compartment for each sample, but this accommodates only a small number of samples and reduces the total amount of sequence generated per run. Recently, Parameswaran et al. (2007) demonstrated the power of using DNA barcodes to label samples so that they can be pooled and sequenced together on the 454/Roche GS20 Sequencer. They were able to utilize the full capacity of the instrument and still determine from which sample each read originated. To realize the full power of second generation sequencing technologies, a multiplexing strategy should be compatible with DNA barcoding to track samples.
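The idea of recovering sample identity from pooled reads can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with a fixed-length barcode and that barcodes must match exactly; the barcode sequences, sample names, and the function name `demultiplex` are hypothetical and are not taken from Parameswaran et al. (2007) or any particular instrument's software.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map (illustrative sequences only).
BARCODES = {
    "ACGT": "sample_A",
    "TGCA": "sample_B",
    "GATC": "sample_C",
}
BARCODE_LEN = 4  # assumed fixed barcode length at the start of each read


def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Group pooled reads by the barcode found at the start of each read.

    Returns a dict mapping sample name to a list of reads with the barcode
    removed. Reads whose prefix matches no known barcode are kept unchanged
    under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        prefix = read[:barcode_len]
        sample = barcodes.get(prefix)  # exact match only, no error tolerance
        if sample is None:
            by_sample["unassigned"].append(read)
        else:
            by_sample[sample].append(read[barcode_len:])
    return dict(by_sample)


# Toy pooled run: two reads from sample_A, one from sample_B, one unknown.
pooled_reads = [
    "ACGTTTGACCA",
    "TGCAGGATTAC",
    "ACGTCCATGGA",
    "NNNNAAAAAAA",
]
assigned = demultiplex(pooled_reads)
for sample, sample_reads in assigned.items():
    print(sample, sample_reads)
```

A practical barcode set would be designed so that sequencing errors are unlikely to convert one barcode into another, and a real demultiplexer would usually tolerate or correct such errors rather than require the exact match assumed here.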
Therefore, there remains a need in the art for a multiplexed PCR method that simultaneously amplifies many targeted regions from a small amount of nucleic acid. The PCR method should also be compatible with next-generation, high-throughput sequencing technologies in which numerous samples can be processed in a single run. The PCR method should be specific and sensitive enough to identify SNPs and mutations in individual samples.