The advent of the first reference sequence of the human genome by Lander et al. (Nature (15 Feb. 2001) 409: 860-921) and Venter et al. (Science (16 Feb. 2001) 291:1304) has generated increased interest in the ability to sequence entire genomes, which range in size from ~1 megabase to as much as 600 gigabases in some organisms. To make use of the human genome reference sequence tractable, several important innovations, including highly parallel capillary electrophoresis, were developed to bring per-base sequencing costs down.
However, to go beyond a small number of reference sequences to the point where it is feasible to sequence each individual genome in a population, or to sequence ab initio a large class of new organisms, vastly faster and less expensive means are required. Toward that end, several new approaches have been proposed and demonstrated to various degrees, including Edman degradation and fluorescent dye labeling of a single DNA strand in a flow cytometer; “sequencing by hybridization” (Perlegen Corp.; Callida Genomics); and “sequencing by synthesis” (Quake et al., Cal. Tech.; Solexa Corp.). The latter two approaches afford a high degree of parallel information retrieval, using chip-based imaging systems to record data simultaneously from a very large number of gene chip pixels. Each approach, however, suffers from several shortcomings.
“Sequencing by hybridization” requires an inordinately large array of explicitly patterned spots to approximate the information content of the genome to be sequenced. In addition, oligonucleotides from the sample must search a very large array of gene chip oligonucleotide complements, which makes hybridization times very long.
“Sequencing by synthesis” avoids a number of the issues above but introduces challenging, error-prone chemistries that may be hard to scale effectively.