The present invention relates generally to genomics analysis, and more specifically to methods for producing arrays for high throughput genomics analysis.
The task of cataloguing human genetic variation and correlating this variation with susceptibility to disease is daunting and expensive. A single genome sequence costs approximately $10-20 million using traditional methods. A drastic reduction in this cost is imperative for advancing the understanding of health and disease. The near-term goal in genomics analysis is to resequence the human genome at a cost roughly two orders of magnitude lower, or about $100,000. The ultimate goal is to reduce this cost to $1000 per genome. Reducing sequencing costs to less than $100,000 per genome will require a number of technical advances in the field. Fortunately, the same basic principles of readout parallelization and sample multiplexing that proved so powerful for gene expression and SNP genotyping analysis are also being successfully applied to large-scale sequencing. Technical advances that stand to facilitate genome analysis at $100,000 or less include: (1) library generation; (2) highly parallel clonal amplification and analysis; (3) development of robust cycle sequencing biochemistry; (4) development of ultrafast imaging technology; and (5) development of algorithms for sequence assembly from short reads.
The ability to specify the content of the DNA library in a targeted manner is extremely useful for a number of applications. In particular, the ability to resequence all exons in the cancer genome would greatly facilitate the discovery of new cancer genes. The comprehensive resequencing of cancer genomes is a major objective of the Cancer Genome Atlas Project (cancergenome.nih.gov/index.asp) and would greatly benefit from a reduction in sequencing cost. Given the near-term objective of the $100,000 genome, it should be feasible to resequence all of the approximately 250,000 exons in the genome for about $1000 per sample. Doing so requires an efficient method for creating a targeted library of the 250,000 exons from the genome. Single-plex PCR for each exon is clearly cost prohibitive. As such, parallelization of sample preparation is of paramount importance in reducing sequencing costs.
In addition to library generation, the creation of clonal amplifications in a highly parallel manner is also essential to cost-effective sequencing. Sequencing is generally performed on clonal populations of DNA molecules, traditionally prepared from plasmids grown from individually picked bacterial colonies. In the human genome project, each clone was individually picked and grown up, and the DNA was extracted or amplified out of the clone. In recent years, there have been a number of innovations to enable highly parallelized analysis of DNA clones, particularly using array-based approaches. In the simplest approach, the library can be analyzed at the single molecule level, which by its very nature is clonal. Generally, DNA molecules are captured on a solid phase surface such that individual species are spatially separated from each other and distinguishable in subsequent cycles of sequencing. Current capture methods are random in nature and rely, at least in part, on precise control of conditions to allow an optimal density of DNA molecules to attach to the surface. Improper conditions can lead to overcrowding, such that individual species are not distinguishable, or alternatively to high vacancy rates that reduce the information gained per run to a level that wastes expensive sequencing reagents.
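By way of illustration only, the trade-off between overcrowding and vacancy in random capture can be sketched numerically. The sketch below assumes, as a simplification not stated above, that the surface is divided into equally sized resolvable sites and that molecules land on them independently, so that the number of molecules per site follows a Poisson distribution. The function name `occupancy_fractions` and the parameter `mean_per_site` are hypothetical and are introduced here only for the example.

```python
import math


def occupancy_fractions(mean_per_site: float) -> dict:
    """Return the expected fractions of empty, singly occupied, and multiply
    occupied sites, ASSUMING Poisson-distributed random capture (an
    illustrative model, not a description of any particular platform)."""
    if mean_per_site < 0:
        raise ValueError("mean_per_site must be non-negative")
    empty = math.exp(-mean_per_site)                   # P(0 molecules)
    single = mean_per_site * math.exp(-mean_per_site)  # P(exactly 1 molecule)
    multiple = 1.0 - empty - single                    # P(2 or more molecules)
    return {"empty": empty, "single": single, "multiple": multiple}


if __name__ == "__main__":
    # Sparse, balanced, and crowded loading conditions.
    for mean in (0.1, 1.0, 3.0):
        f = occupancy_fractions(mean)
        print(
            f"mean={mean:.1f}  empty={f['empty']:.3f}  "
            f"single={f['single']:.3f}  multiple={f['multiple']:.3f}"
        )
```

Under this assumed model, the fraction of usable (singly occupied) sites peaks at 1/e, or about 37%, when the mean loading is one molecule per site. Lower loading leaves most sites vacant, and higher loading leaves most sites overcrowded, which is consistent with the sensitivity to capture conditions described above.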
Thus, there exists a need to develop methods to improve nucleic acid capture for genomics analysis and provide more cost effective methods for sequence analysis. The present invention satisfies this need and provides related advantages as well.