The advent of the Human Genome Project required that improved methods for sequencing nucleic acids, such as DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), be developed. Many common diseases, such as cancer, cystic fibrosis and sickle cell anemia, are based at least in part on variations in DNA sequence. Determination of the entire sequence of the human genome, approximately 3,000,000,000 bases in length, has provided a foundation for identifying the genetic basis of such diseases. However, a great deal of work remains to be done to identify the genetic variations associated with each disease.
Existing methods for nucleic acid sequencing, based on detection of labeled nucleic acids that have been separated by size, are limited by the length of the nucleic acid that can be sequenced. Typically, only 500 to 1,000 bases of nucleic acid sequence can be determined at one time. This is much shorter than the length of the functional unit of DNA, referred to as a gene, which can be tens or even hundreds of thousands of bases in length. Using current methods, determination of a complete gene sequence requires that many copies of the gene be produced, cut into overlapping fragments and sequenced, after which the overlapping DNA sequences may be assembled into the complete gene. This process is laborious, expensive, inefficient and time-consuming.
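The fragment-and-assemble process described above can be illustrated with a minimal sketch. It assumes error-free reads taken from a single strand and uses a simple greedy merge of the largest suffix-to-prefix overlap. The function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example sequence are hypothetical and chosen only for illustration. Practical assembly software must also handle sequencing errors, repeated regions and reverse-complement reads.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are treated as chance matches and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining contigs; a single contig means every fragment was joined.
    """
    # Drop exact duplicates, then fragments wholly contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical 25-base "gene" cut into three overlapping fragments.
gene = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [gene[12:25], gene[0:10], gene[6:16]]
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGGCGTACGTTAGCCTAGGATCCA']
```

Even in this toy form, each merge step compares every pair of fragments. A gene of tens of thousands of bases read in pieces of 500 to 1,000 bases requires many overlapping fragments, which is one reason the process is laborious.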
More recent methods of nucleic acid sequencing, involving hybridization to oligonucleotide arrays of known sequences at specific locations on a chip, may be used to infer short nucleic acid sequences or to detect the presence of a specific nucleic acid in a sample. However, such methods are not suited to identifying long nucleic acid sequences.