The task of cataloguing human genetic variation and correlating this variation with susceptibility to disease is daunting and expensive. A single genome sequence has a price tag of approximately $50,000. A drastic reduction in this cost is imperative for advancing the understanding of health and disease. The near-term goal in genomics analysis is to resequence the human genome at a cost of approximately $1,000. A reduction in sequencing costs will require a number of technical advances in the field. Fortunately, the same basic principles of readout parallelization and sample multiplexing that proved so powerful for gene expression and SNP genotyping analysis are also being successfully applied to large-scale sequencing. Technical advances that could reduce the cost of genome analysis include: (1) improved library generation; (2) highly parallel clonal amplification and analysis; (3) development of robust cycle sequencing biochemistry; (4) development of ultrafast imaging technology; and (5) development of algorithms for sequence assembly from short reads.
The ability to specify the content of the DNA library in a targeted manner is extremely useful for a number of applications. In particular, the ability to resequence all exons in the cancer genome would greatly facilitate the discovery of new cancer genes. The comprehensive resequencing of cancer genomes is a major objective of the Cancer Genome Atlas Project (cancergenome.nih.gov/index.asp) and would greatly benefit from a reduction in sequencing cost. Unfortunately, creating a targeted library of the 250,000 exons from the genome is cumbersome using current methods. The approach of single-plex PCR for each exon is clearly cost-prohibitive. As such, parallelization of the sample preparation is of paramount importance in reducing sequencing costs.
In addition to library generation, the creation of clonal amplifications in a highly parallel manner is also important for cost-effective sequencing. Sequencing is generally performed on clonal populations of DNA molecules, traditionally prepared from plasmids grown from individually picked bacterial colonies. In the human genome project, each clone was individually picked and grown up, and the DNA was extracted or amplified out of the clone. In recent years, there have been a number of innovations to enable highly parallelized analysis of DNA clones, particularly using array-based approaches. In the simplest approach, the library can be analyzed at the single-molecule level, which by its very nature is clonal. The major advantage of single-molecule sequencing is that cyclic sequencing can occur asynchronously, since each molecule is read out individually. In contrast, analysis of clonal amplifications requires near-quantitative completion of each sequencing cycle; otherwise, background noise progressively grows with each ensuing cycle, severely limiting read length. As such, clonal analysis places a bigger burden on the robustness of the sequencing biochemistry and may potentially limit read lengths.
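The dependence of read length on per-cycle completion can be illustrated with a minimal sketch. It assumes a deliberately simplified model, not taken from the text above: every strand in a clonal population completes each cycle independently with the same probability, and a strand that misses a cycle never returns to phase. Under that assumption the in-phase fraction after n cycles is the per-cycle efficiency raised to the power n. The function names, the efficiency values, and the 50% in-phase threshold are all hypothetical and chosen only for illustration.

```python
import math


def in_phase_fraction(step_efficiency: float, cycles: int) -> float:
    """Fraction of strands still in phase after `cycles` sequencing cycles.

    Assumed model: each strand completes every cycle independently with
    probability `step_efficiency`, and a strand that falls behind stays
    out of phase for the rest of the read.
    """
    if not 0.0 < step_efficiency <= 1.0:
        raise ValueError("step_efficiency must be in (0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return step_efficiency ** cycles


def max_cycles(step_efficiency: float, min_in_phase: float) -> int:
    """Largest cycle count whose in-phase fraction is still >= `min_in_phase`."""
    if not 0.0 < step_efficiency < 1.0:
        raise ValueError("step_efficiency must be in (0, 1)")
    if not 0.0 < min_in_phase <= 1.0:
        raise ValueError("min_in_phase must be in (0, 1]")
    # Solve step_efficiency ** n >= min_in_phase for n; both logs are <= 0.
    return math.floor(math.log(min_in_phase) / math.log(step_efficiency))


if __name__ == "__main__":
    # Hypothetical efficiencies; the 50% threshold is an arbitrary cutoff.
    for efficiency in (0.95, 0.99, 0.999):
        limit = max_cycles(efficiency, 0.5)
        print(f"efficiency={efficiency}: about {limit} usable cycles")
```

In this simplified model, raising the per-cycle efficiency from 99% to 99.9% extends the number of cycles before half the population is out of phase roughly tenfold, which is why clonal analysis places such a burden on the robustness of the sequencing biochemistry. A single molecule has no population to fall out of phase with, so this limit does not apply to it in the same way.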
Thus, there exists a need to develop methods to improve genomics analysis and provide more cost-effective methods for sequence analysis. The present invention satisfies this need and provides related advantages as well.