DNA is a long biopolymer made from repeating units called nucleotides. DNA polymers can be enormous molecules containing millions of nucleotides; for example, the human genome contains a total of about 3 billion nucleotides. In living organisms, DNA does not usually exist as a single molecule, but instead as a tightly associated pair of molecules. These two long strands intertwine like vines, in the shape of a double helix. Each nucleotide contains both a segment of the sugar-phosphate backbone, which holds the chain together, and a base, which interacts with the other DNA strand in the helix. The bases of the two DNA strands interact through hydrogen bonds, which hold the double helix together. There are four different types of bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Each type of base in one strand forms hydrogen bonds with just one type of base in the complementary strand, with A bonding only to T, and C bonding only to G.
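The pairing rule described above (A only with T, C only with G) can be illustrated with a minimal sketch. The names `PAIRING`, `complement` and `is_complementary` are hypothetical and chosen only for illustration. The sketch compares strands position by position and does not model strand orientation or any base outside A, C, G and T.

```python
# Pairing rule from the text: A bonds only to T, and C bonds only to G.
# PAIRING, complement and is_complementary are illustrative names.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand.

    Raises ValueError if the strand contains a character other than
    A, C, G or T (case-insensitive).
    """
    result = []
    for base in strand.upper():
        if base not in PAIRING:
            raise ValueError(f"unexpected base: {base!r}")
        result.append(PAIRING[base])
    return "".join(result)


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """Check whether every base in strand_a pairs with the base at the
    same position in strand_b."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b.upper()


if __name__ == "__main__":
    print(complement("ATCG"))                # TAGC
    print(is_complementary("ATCG", "TAGC"))  # True
    print(is_complementary("ATCG", "TAGG"))  # False
```

Because each base has exactly one partner, the sequence of one strand fully determines the sequence of the other, which is why reading a single strand is enough to recover the information in the double helix.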
The sequence of the four bases determines the genetic information contained in DNA. Determining the order of the four building blocks of a polynucleic acid is called sequencing. A polynucleic acid comprises nucleotides chemically bound in a linear fashion. “DNA” (deoxyribonucleic acid) and “RNA” (ribonucleic acid) are examples of such polynucleic acid molecules. The particular order or “sequence” of these bases in a given gene determines the structure of the protein encoded by the gene. Furthermore, the sequence of bases surrounding the gene typically contains information about how often the particular protein should be made, in which cell types it should be made, and so on.
The complete nucleotide sequence of all DNA polymers in a particular individual is known as that individual's “genome”. In 2003 the Human Genome Project was finished and a draft version of the human DNA sequence was presented. It took 13 years, 3 billion US dollars and the joint effort of multiple sequencing centers to achieve this scientific milestone, which was compared in significance to the arrival of men on the moon. The method used for this giant project is called Sanger sequencing (Sanger, F. et al., Proc. Natl. Acad. Sci. USA (1977) 74, 5463-5467 and Smith et al., U.S. Pat. No. 5,821,058). Although major technical improvements were made during this time, the classical sequencing method has some key disadvantages:
Laborious sample preparation, including subcloning of DNA fragments in bacteria
Expensive automation
Cost-prohibitive molecular biology reagents
Limited throughput, so that sequencing whole genomes takes years
Many diseases have a strong genetic component (Strittmatter, W. J. et al., Annual Review of Neuroscience 19 (1996): 53-77; Ogura, Y. et al., Nature 411, (2001): 603-606; Begovich, A. B. et al., American Journal of Human Genetics 75, (2004): 330-337). With the completion of the Human Genome Project and an ever-deepening comprehension of the molecular basis of disease, medicine in the 21st century is poised for a revolution called “molecular diagnostics”. Most commercial and academic approaches in molecular diagnostics assess single nucleotide polymorphisms (SNPs) or mutations to identify DNA aberrations. These technologies, although powerful, analyze only a small portion of the entire genome. The inability to accurately and rapidly sequence large quantities of DNA remains an important bottleneck for research and drug development (Shaffer, C., Nat Biotech 25 (2007): 149). Clearly, there is a need for improved sequencing technologies that are faster, easier to use, and less expensive.