Polymer sequencing, such as protein sequencing or DNA sequencing, is susceptible to errors, misalignment and other issues arising from inconsistent synthesis and/or cleavage of the polymers undergoing analysis. These issues can lead to inaccurate sequence determination.
One approach to synthesis-based sequencing is DNA sequencing-by-synthesis, which generally involves biochemical processes by which a strand complementary to a target DNA strand is iteratively built up to determine the target's sequence. An example of such methods is pyrosequencing, which offers the promise of high-throughput, low-cost sequencing through efficient mass parallelization and relative simplicity. However, achieving reliable DNA read lengths using pyrosequencing incurs high reagent costs to maximize signal fidelity, and the method has therefore often been prohibitively expensive for applications such as whole-genome shotgun assembly.
Recently, there has been great interest in developing cheaper, higher-throughput de novo DNA sequencing technologies and protocols to jump-start the next phase of genetic inquiry beyond the Human Genome Project. Although much progress has been made in the last two decades, with sequencing cost reduced from tens of dollars per base to a few cents per base today, the cost of sequencing a single mammalian-sized genome is still in the tens of millions of dollars. This exorbitant cost greatly hinders such vital studies as comparative genomic analysis across species, detailed studies of human genetic variation, and analyses of difficult-to-culture microbial communities.
Microarray-based technologies can be used for single-nucleotide polymorphism (SNP) analysis; however, these genotyping methods are likely to miss rare differences that may be critical to diagnosing certain conditions, and they fail to extract long-range information such as genomic rearrangements.
One sequencing approach is Sanger sequencing, which uses fluorescence detection of dideoxy-terminated fragments resolved by capillary array electrophoresis (CAE). See, e.g., B. Ewing, P. Green, “Basecalling of automated sequencer traces using phred. II. Error probabilities,” Genome Research 8, pp. 186-194, 1998. This method, first developed in 1977, now allows the sequencing of read segments approximately 1000 nucleotides long with reasonable accuracy. However, despite past and ongoing enhancements to existing CAE technology, the lower limit on cost per mammalian genome under this regime remains prohibitively high for many applications.
These and other characteristics have posed challenges for polymer sequencing and related applications, in both sequencing-by-synthesis and cleavage-based approaches, and for a variety of polymers such as proteins and DNA.