It is increasingly recognized that single-molecule analysis techniques provide a depth of analysis not possible with traditional ensemble methods. Gel analysis, microarray analysis and other methods commonly used in biochemistry and molecular biology measure the averaged signal of thousands of molecules. The target molecules typically have to be purified in substantial quantities (e.g., proteins) or amplified by PCR (e.g., DNA).
Massively parallel clonal DNA sequencing (e.g., Illumina sequencing) has revolutionized the way many tasks in molecular biology are conducted. However, a number of drawbacks remain. The sample material has to be processed and amplified before it can be sequenced. Because of PCR bias, coverage across the genome is uneven, which makes de novo assembly challenging for large, complex genomes. Typically, the molecules analyzed are not long enough to detect structural variation (SV), and haplotypes are not resolved. Moreover, it is not currently possible to sequence molecules from a single cell without amplification. In the case of proteins, it is challenging to identify molecules within a complex mixture unless they are highly abundant.
There are a number of methods for analyzing single molecules, including methods that require labeling and methods that do not. Fluorescence labeling and optical detection have been used to sequence DNA at the single-molecule level (Helicos; PacBio). An advantage of optical methods is that a large number of single molecules arrayed on a surface can be analyzed in parallel. Because of the diffraction limit of light, the single molecules must be arrayed at a density that allows individual molecules to be resolved; for example, for a fluorophore emitting at 600 nm, the minimum resolvable separation is typically about 300 nm. In the currently dominant sequencing technology (Bentley, Illumina), such well-spaced single molecules are amplified in situ to produce clonal clusters, which are then sequenced by monitoring the template-directed incorporation of fluorescent nucleotides (sequencing-by-synthesis; SbS). One disadvantage of this approach is that the molecules within a cluster cannot all be kept in synchrony (in phase), so errors accumulate as the number of cycles increases.
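The 300 nm spacing and the dephasing problem described above can be illustrated with two simple models. The sketch below assumes the Abbe criterion, d = λ / (2·NA), with a numerical aperture of about 1 (which reproduces the 300 nm figure for 600 nm emission), and a simplified dephasing model in which each strand in a cluster independently falls out of phase with a fixed probability per cycle and never recovers. The function names and the 0.5% per-cycle error rate are hypothetical and chosen only for illustration.

```python
# Minimal sketch; both models are illustrative assumptions, not the method of any
# particular instrument or vendor.


def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float = 1.0) -> float:
    """Approximate minimum resolvable separation (nm) under the Abbe criterion
    d = wavelength / (2 * NA). NA = 1.0 is an assumed default."""
    if numerical_aperture <= 0:
        raise ValueError("numerical_aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


def fraction_in_phase(per_cycle_error: float, cycles: int) -> float:
    """Fraction of strands in a cluster still in phase after `cycles` cycles,
    assuming each strand independently drops out of phase with probability
    `per_cycle_error` per cycle and never returns to phase (hypothetical model)."""
    if not 0.0 <= per_cycle_error <= 1.0:
        raise ValueError("per_cycle_error must be in [0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return (1.0 - per_cycle_error) ** cycles


print(abbe_limit_nm(600))  # 300.0

# Hypothetical 0.5% per-cycle error rate: the in-phase fraction shrinks each cycle.
for n in (50, 100, 150):
    print(n, round(fraction_in_phase(0.005, n), 3))
```

Under this simplified model the in-phase fraction decays geometrically with cycle number, which is one way to see why signal quality degrades with read length in ensemble SbS. Real instruments also exhibit prephasing and apply computational phasing correction, which this sketch ignores.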
However, the arrayed single molecules need not be amplified; sequencing can be conducted directly on the single molecules, as is done in the Helicos technology [Harris et al], Oxford Nanopore technology and PacBio technology [Eid et al]. Although direct sequencing of single molecules does not suffer from the phasing problem, it can be compromised by the photophysics of individual fluorophores (e.g., blinking and photobleaching).
While it is possible to sequence single molecules of nucleic acids, there are apparently no single-molecule methods for sequencing proteins. Moreover, there are no amplification methods for proteins, so proteins must be relatively pure, rather than part of a highly complex mixture, in order to be analyzed.
From the foregoing, it is clear that, although progress has been made, the technologies that represent the state of the art have a number of deficiencies.