Single-molecule sequencing enables molecules such as DNA, RNA, and peptides to be sequenced directly from biological samples without steps such as purification, separation, or amplification of the molecules themselves. Single-molecule sequencing is thus well-suited for diagnostic and clinical applications.
The classical DNA sequencing technology (sometimes referred to as first generation sequencing technology) was developed in the late 1970s and evolved from a low-throughput approach, in which the same radiolabeled DNA sample was run on a gel with one lane for each nucleotide, to an automated method in which all four fluorescently labeled dye terminators for a single sample were loaded onto individual capillaries. These capillary-based instruments could handle hundreds of individual samples per week and were used in obtaining the first draft sequence of a human genome. Various improvements in components used in this technology pushed read lengths up to 1,000 base pairs (bp) without much improvement on the underlying principle.
The second generation sequencing technology emerged in 2005 and increased throughput by at least two orders of magnitude over the first generation sequencing technology. Representative platforms include pyrosequencing (454 Life Sciences), Solexa (Illumina) and SOLiD (Applied Biosystems). The second generation sequencing technology is superior to its predecessor because the sequencing target changed from single clones or samples to many independent DNA fragments, enabling large sets of DNAs to be sequenced in parallel. Many platforms in this generation achieve massively parallel sequencing by imaging light emission from the sequenced DNA, or by detecting hydrogen ions (Ion Torrent by Life Technologies). The second generation sequencing technology thereby avoids the bottleneck that resulted from the individual preparation of DNA templates required in the first generation technology. Read lengths of the second generation sequencing technology have exceeded 400 bp at an error rate below 1%.
The second generation sequencing technology still requires amplification of the template. Amplification may cause quantitative and qualitative artifacts that can have detrimental impacts on quantitative applications, such as chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA/cDNA sequencing. Amplification also places limitations on the size of the template being sequenced because molecules that are too short or too long tend not to be amplified well.
The third generation sequencing technology allows sequencing of one or a few copies of a molecule and thus is often referred to as the single-molecule sequencing technology. It simplifies sample preparation, reduces sample mass requirements, and most importantly eliminates amplification of templates. It tends to have long read lengths, low error rates and high throughput. It also allows the same molecule to be resequenced multiple times for improved accuracy, and allows sequencing of molecules that cannot be readily amplified, for example because of extremes of guanine-cytosine content, secondary structure, or other reasons. These characteristics make the third generation sequencing technology well suited for diagnostic and clinical applications.
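The accuracy gain from resequencing the same molecule can be illustrated with a minimal sketch. It assumes the repeated passes have already been aligned to equal length, and the function name consensus and the example reads are hypothetical, not taken from any particular platform.

```python
from collections import Counter


def consensus(reads):
    """Majority-vote consensus over equal-length reads of one molecule."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be pre-aligned to equal length")
    # zip(*reads) yields one column per position; keep the most common base.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Three hypothetical passes over the same molecule, each with at most one error.
passes = ["ACGTTAGC", "ACGTAAGC", "ACCTTAGC"]
print(consensus(passes))
```

Because the errors in the individual passes fall at different positions, the vote at each position recovers the shared base. Real consensus callers also handle insertions and deletions, which this sketch does not.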
The third generation sequencing technology encompasses a wide variety of platforms that differ in their fundamental principles. Representative platforms include sequencing by synthesis, optical sequencing and mapping, and nanopores.
Sequencing by Synthesis
One representative sequencing-by-synthesis platform involves hybridizing individual molecules to a flow cell surface containing covalently attached oligonucleotides, sequentially adding fluorescently labeled nucleotides and a DNA polymerase, detecting incorporation events by laser excitation, and recording with a charge coupled device (CCD) camera. The fluorescent nucleotide prevents the incorporation of any subsequent nucleotide until the nucleotide dye moiety is cleaved. The images from each cycle are assembled to generate an overall set of sequence reads.
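The assembly of per-cycle images into reads can be sketched as follows. This is an illustration under stated assumptions: each cycle is assumed to add a single nucleotide type, each image is assumed to be already reduced to the set of surface positions (spots) that fluoresced, and the name assemble_reads and the example data are hypothetical.

```python
def assemble_reads(cycles):
    """Build one read per spot from a list of (base_added, lit_spots) cycles."""
    reads = {}
    for base, lit_spots in cycles:
        for spot in lit_spots:
            # A spot that fluoresced in this cycle incorporated this base.
            reads[spot] = reads.get(spot, "") + base
    return reads


# Four hypothetical cycles observed at two spots on the flow cell.
cycles = [
    ("C", {1}),
    ("T", {2}),
    ("A", {1, 2}),
    ("G", {1}),
]
print(assemble_reads(cycles))
```

Each resulting string is the synthesized strand at that spot, so it is complementary to the template molecule hybridized there.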
Another representative sequencing-by-synthesis platform involves constraining DNA to a zero-mode waveguide so small that light can penetrate only the region very close to the edge of the waveguide, where the polymerase used for sequencing is constrained. Only nucleotides in that small volume near the polymerase are illuminated, so only their fluorescence is detected. All four potential nucleotides are included in the reaction, each labeled with a different color fluorescent dye so that they can be distinguished from each other.
Yet another representative sequencing-by-synthesis platform is based on fluorescence resonance energy transfer (FRET). This platform uses a quantum-dot-labeled polymerase that synthesizes DNA and four distinctly labeled nucleotides in a real-time system. Quantum dots, which are fluorescent semiconducting nanoparticles, have an advantage over fluorescent dyes in that they are much brighter and less susceptible to bleaching, although they are also much larger and more susceptible to blinking. The sample to be sequenced is ligated to a surface-attached oligonucleotide of defined sequence and then read by extension of a primer complementary to the surface oligonucleotide. When a fluorescently labeled nucleotide binds to the polymerase, it interacts with the quantum dot, causing an alteration in the fluorescence of both the nucleotide and the quantum dot. The quantum dot signal drops, whereas a signal from the dye-labeled phosphate on each nucleotide rises at a characteristic wavelength.
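The paired signal change described above (quantum dot signal down, one dye signal up) can be turned into base calls with a simple rule. The sketch below assumes normalized intensity traces sampled at common time points; the thresholds, the function name call_fret_events and the example traces are all hypothetical.

```python
def call_fret_events(donor, acceptors, donor_drop=0.7, acceptor_rise=0.3):
    """Call one base per event in which the donor dips and a dye channel rises.

    donor: quantum dot intensities over time (normalized, hypothetical units).
    acceptors: dict mapping base -> dye intensities over the same time points.
    """
    calls = []
    in_event = False
    for t, donor_level in enumerate(donor):
        # Pick the dye channel with the strongest signal at this time point.
        base, dye_level = max(
            ((b, trace[t]) for b, trace in acceptors.items()),
            key=lambda pair: pair[1],
        )
        active = donor_level < donor_drop and dye_level > acceptor_rise
        if active and not in_event:
            calls.append(base)  # record the base once, at the start of the event
        in_event = active
    return "".join(calls)


donor = [1.0, 0.5, 0.5, 1.0, 0.4, 1.0]
acceptors = {
    "A": [0.0, 0.6, 0.6, 0.0, 0.0, 0.0],
    "C": [0.0, 0.0, 0.0, 0.0, 0.7, 0.0],
    "G": [0.0] * 6,
    "T": [0.0] * 6,
}
print(call_fret_events(donor, acceptors))
```

Requiring both changes at once is one way to reject quantum dot blinking, which lowers the donor signal without raising any dye channel. The sketch merges back-to-back events that have no gap between them, so it is not suitable for real traces as written.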
Optical Sequencing and Mapping
Optical sequencing and mapping generally involves immobilizing a DNA molecule to be sequenced on a surface and then either cutting it with various restriction enzymes or labeling it after treatment with sequence-specific nicking enzymes.
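The ordered fragment sizes that a restriction digest of an immobilized molecule would produce can be predicted from a known sequence. The sketch below is a simplified in silico digest of a linear molecule, searching one strand only; the function name restriction_map and the example sequence are hypothetical, and the EcoRI site (GAATTC, cut after the first G) is used purely as an example.

```python
def restriction_map(sequence, site, cut_offset):
    """Return ordered fragment lengths after cutting at every site occurrence.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    # Fragment lengths are the gaps between consecutive cut positions.
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 20-base molecule containing two EcoRI sites.
molecule = "AAGAATTCTTTTGAATTCAA"
print(restriction_map(molecule, "GAATTC", 1))
```

Because the fragments stay in place on the surface, their order is preserved, and the measured size pattern can be compared with a predicted list such as this one.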
Nanopores
Sequencing-by-synthesis platforms and optical sequencing and mapping platforms use some kind of label to detect the individual bases during sequencing. In contrast, nanopore platforms generally do not require an exogenous label but rely instead on the electronic or chemical structure of the different nucleotides for discrimination. Representative nanopores include those based on solid-state materials such as carbon nanotubes or thin films and those based on biological materials such as α-hemolysin or MspA.
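Label-free discrimination can be illustrated with a minimal sketch that assigns each measured event to the nucleotide with the nearest reference signal. It assumes each event reflects a single nucleotide and has been reduced to one mean current value; the reference levels in LEVELS_PA are invented for illustration and do not describe any real pore.

```python
# Hypothetical mean current level (in picoamperes) for each nucleotide.
LEVELS_PA = {"A": 50.0, "C": 40.0, "G": 60.0, "T": 30.0}


def call_bases(mean_currents):
    """Assign each measured current to the nucleotide with the nearest level."""
    return "".join(
        min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - current))
        for current in mean_currents
    )


# Four hypothetical events, each a noisy measurement near one reference level.
print(call_bases([49.1, 31.2, 58.7, 41.0]))
```

The nearest-level rule works only while the noise is small compared with the spacing between levels, which is why the separation of the nucleotide signals matters for discrimination.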