Genetic information is stored in the form of very long molecules of deoxyribonucleic acid (DNA), organized into chromosomes. The human genome contains approximately three billion bases of DNA sequence. This DNA sequence information determines multiple characteristics of each individual. Many common diseases are based at least in part on variations in DNA sequence.
Determination of the entire sequence of the human genome has provided a foundation for identifying the genetic basis of such diseases. However, a great deal of work remains to be done to identify the genetic variations associated with each disease. Such work requires DNA sequencing of portions of chromosomes in individuals or families exhibiting each disease, in order to identify the specific changes in DNA sequence that promote the disease. Ribonucleic acid (RNA), an intermediary molecule in the processing of genetic information, may also be sequenced to identify the genetic bases of various diseases.
Existing methods for nucleic acid sequencing, based on detection of fluorescently labeled nucleic acids that have been separated by size, are limited by the length of the nucleic acid that can be sequenced. Typically, only 500 to 1,000 bases of nucleic acid sequence can be determined at one time. This is much shorter than the length of the functional unit of DNA, referred to as a gene, which can be tens or even hundreds of thousands of bases in length. Using current methods, determination of a complete gene sequence requires that many copies of the gene be produced, cut into overlapping fragments and sequenced, after which the overlapping DNA sequences may be assembled into the complete gene. This process is laborious, expensive, inefficient and time-consuming.
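The assembly step described above, in which overlapping fragment sequences are joined into one longer sequence, can be illustrated with a minimal sketch. It assumes error-free fragments from a single strand and uses a simple greedy merge; the function names, the `min_overlap` parameter, and the short example sequence are hypothetical and chosen only for illustration. Practical assembly software must also handle sequencing errors, repeated regions, and fragments from both strands.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap.

    Illustrative assumption: fragments are error-free and none is wholly
    contained inside another. Returns the list of remaining sequences.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps enough to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical 21-base sequence, read as three overlapping fragments.
fragments = ["TAGGCATCGA", "ATGGCGTACC", "GTACCTTAGGC"]
assembled = greedy_assemble(fragments)
print(assembled)
```

Because each fragment is limited to roughly 500 to 1,000 bases, a gene of tens of thousands of bases requires many such fragments and many pairwise comparisons, which is one reason the process is laborious.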