The essence of biology is an understanding of all species and their biological mechanisms. Speciation and biological function are primarily determined by the organism's DNA sequence. Vastly improved DNA sequence determination for personalized medicine and ecological studies could complete the revolution initiated by the Human Genome Project.
The Human Genome Project accelerated markedly during its last several years and finished with blistering speed. The complete nucleotide sequence of the human genome was determined in approximately 13 years (Collins et al., 2003) and published in 2003 (www.genome.gov/11006929). Public databases currently hold over 3×10^10 nucleotides (www.ncbi.nlm.nih.gov/genbank/genbankstats), the genomes of over 185 organisms have been fully sequenced, and partial genome sequences are available for over 100,000 taxonomic species (www.integratedgenomics.com/gold).
The Human Genome Project was made possible largely by a reduction in the cost of DNA sequencing by three orders of magnitude. To reduce the cost by another two or three orders of magnitude, a highly integrated platform will be needed. Although the project lowered the cost and raised the throughput of gel-based Sanger sequencing by over three orders of magnitude, it did not produce any competitive alternative technology for genome sequencing.
Large bacterial genomes and genomes from complex organisms are often fragmented in two steps in order to simplify the assembly process. After constructing large-insert libraries, each clone is further fragmented and smaller libraries are prepared for bi-directional shotgun sequencing. Although the material cost of creating libraries is minimal, the process has proven to be laborious for large-scale genome sequencing. Therefore, alternative strategies need to be developed.
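The two-step fragmentation described above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not in the original text: the genome is a plain string of A, C, G and T, fragments have a fixed length and are sampled at uniformly random positions, and no cloning bias or sequencing error is modeled. All function and parameter names are hypothetical and chosen only for illustration.

```python
import random


def sample_fragments(sequence, size, count, rng):
    """Return `count` fragments of length `size` taken at random positions."""
    size = min(size, len(sequence))
    fragments = []
    for _ in range(count):
        start = rng.randint(0, len(sequence) - size)  # inclusive bounds
        fragments.append(sequence[start:start + size])
    return fragments


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bidirectional_reads(insert, read_length):
    """Read an insert from both ends: one read per strand."""
    forward = insert[:read_length]
    reverse = reverse_complement(insert)[:read_length]
    return forward, reverse


def two_step_shotgun(genome, clone_size, clones, insert_size,
                     inserts_per_clone, read_length, seed=0):
    """Step 1: sample large-insert clones from the genome.
    Step 2: fragment each clone into small inserts and read both ends.
    Returns one list of (forward, reverse) read pairs per clone."""
    rng = random.Random(seed)
    reads_by_clone = []
    for clone in sample_fragments(genome, clone_size, clones, rng):
        inserts = sample_fragments(clone, insert_size, inserts_per_clone, rng)
        reads_by_clone.append(
            [bidirectional_reads(insert, read_length) for insert in inserts]
        )
    return reads_by_clone
```

Because the reads stay grouped by the clone they came from, each clone can be assembled on its own before the clones are ordered against each other, which is the simplification the two-step approach is meant to provide. The sketch also makes the labor visible: every clone in the first library requires a second library of its own.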