Sequencing of nucleic acids, such as deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), involves determining the order of the nucleotide bases, namely adenine, guanine, cytosine, uracil, and thymine contained within a genetic sample (e.g., DNA from a blood sample). Traditional Sanger sequencing generates a set of fragments with a common 5′ origin and base-specific 3′ termini. The 3′ termini are created by base-specific interruption of in vitro enzymatic synthesis by the incorporation of chain-terminating nucleotide analogs. The fragments to be sequenced are typically cloned into a vector (e.g., bacteriophage M13) that allows the fragment to be isolated as single-stranded DNA, although similar methods can be applied for double-stranded DNA. However procured, isolated single-stranded DNA serves as a template for DNA polymerase-catalyzed reactions. The template is primed by an oligonucleotide primer complementary to a known or engineered sequence 3′ to the sequence of interest. DNA polymerase extends the primer to copy the sequence of interest. The polymerase reactions take place in the presence of deoxyribonucleoside triphosphate analogs, 2′,3′-dideoxyribonucleoside triphosphates (ddNTPs), which terminate chain extension because they lack 3′ hydroxyl termini.
A series of fragments terminated in a particular base is generated by running the DNA polymerase reaction in the presence of equivalent concentrations of the four deoxyribonucleotide triphosphates (dATP, dCTP, dGTP, and dTTP), plus a one-tenth concentration of one of the nucleotides in dideoxy form. Thus, the DNA polymerase will occasionally insert the dideoxy nucleotide adjacent to its complementary base in the target. This stops chain elongation, which results in the fragment being released from the polymerase. A series of double-stranded fragments of varying lengths is generated, with the newly synthesized strand of each fragment terminating in the selected dideoxynucleotide (e.g., ddATP), which identifies the complementary base (e.g., T) in the sequence of interest. Sites terminating in the other bases are identified by running comparable polymerase reactions with the other three dideoxy analogs. Traditionally, a radioactive label is included in the polymerization mixture. Thus, gel electrophoresis followed by radioautography can be used to generate four sequencing ladders, with each ladder specific to a particular base.
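The logic of the four base-specific ladders can be illustrated with a short sketch. This is an idealized model, not a description of any particular instrument or protocol: it assumes every possible termination site is represented in its lane, it ignores the primer length, and the function names (`reverse_complement`, `termination_ladders`, `read_ladders`) are hypothetical names chosen for this example.

```python
# Idealized model of chain-termination (Sanger) ladders.
# Assumptions: every termination site is observed, and fragment
# lengths are counted from the first base added after the primer.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def termination_ladders(template):
    """Map each dideoxy lane (A, C, G, T) to its terminated fragment lengths.

    The newly synthesized strand is the reverse complement of the
    template, so a fragment of length n ends in the n-th base of
    that strand and appears in that base's lane.
    """
    new_strand = reverse_complement(template)
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        ladders[base].append(length)
    return ladders


def read_ladders(ladders):
    """Read the synthesized strand 5'->3' by ordering fragments by length."""
    base_at_length = {
        length: base
        for base, lengths in ladders.items()
        for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    template = "GATTACA"  # hypothetical sequence of interest, 5'->3'
    ladders = termination_ladders(template)
    synthesized = read_ladders(ladders)
    print(ladders)
    print(synthesized, reverse_complement(synthesized))
```

Reading the lanes from the shortest fragment to the longest yields the synthesized strand, and taking its reverse complement recovers the sequence of interest. In a real reaction the terminations occur stochastically, so each site is represented only by a fraction of the fragments.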
Variations of Sanger sequencing have been developed that allow for automated sequence determination. A red, blue, green or yellow fluorescent dye is attached to the 5′ end of the sequencing primers. Each of the four sequencing reactions is run with a different color primer, thereby assigning characteristic fluorescence to all the fragments terminating in a particular base. Eliminating the use of radioisotopes favors high-throughput applications as the use of fluorescent dyes allows for automated determination of the sequence reads and processing of the data.
In modern automated Sanger sequencing systems, the sequence is determined by high-resolution electrophoretic separation of the end-labeled extension products in a capillary-based polymer gel. Laser excitation of the fluorescent labels as fragments of discrete lengths exit the capillaries, in combination with four-color detection of emission spectra, provides the sequencing trace. Software translates these traces into DNA sequence and generates error probabilities for each base-call. Sanger systems can now achieve read lengths of approximately 1000 base pairs and accuracies above 99.9%.
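Per-base error probabilities of this kind are commonly reported on a logarithmic quality scale. The sketch below assumes the widely used Phred convention, Q = -10 log10(P), which the text above does not name; the function names are hypothetical. Under that convention, an accuracy of 99.9% (an error probability of 0.001) corresponds to a quality score of about 30.

```python
import math


def phred_quality(error_probability):
    """Convert a base-call error probability to a Phred-style quality score.

    Assumes the convention Q = -10 * log10(P).
    """
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Invert the Phred-style scale: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


if __name__ == "__main__":
    # 99.9% accuracy means 1 expected error per 1000 base-calls.
    print(phred_quality(0.001))
    print(error_probability(20))
```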
Automated Sanger sequencing is referred to as a “first generation” technology. For all its accomplishments, Sanger sequencing is inherently limited by the polymerization and chemistry involved, which prompted development of systems more amenable to post-genomic (e.g., short-read), high-throughput sequencing. Newer, “next-generation sequencing” (“NGS”) technologies can cheaply provide enormous volumes of sequence data (e.g., in excess of one billion short reads per sequencing run). Thus, NGS technologies may be applied to a broad range of biological phenomena, including genetic variation, RNA expression, protein-DNA interactions, evolutionary comparisons, and chromosome conformation analyses. Current commercially available NGS technologies include Roche/454, Illumina/Solexa, Life/APG, and Helicos Biosciences.