Over the past decade, DNA sequencing throughput has increased over 50-fold. Advances in DNA sequencing have revolutionized the fields of cellular and molecular biology. High-throughput sequencing platforms include the 454 FLX™ or 454 TITANIUM™ (Roche), the SOLEXA™ Genome Analyzer (Illumina), the HELISCOPE™ Single Molecule Sequencer (Helicos Biosciences), and the SOLID™ DNA Sequencer (Life Technologies/Applied Biosystems) instruments, SMRT™ technology developed by Pacific Biosciences, as well as other platforms still under development by companies such as Intelligent Biosystems.
Although such sequencing platforms generate vast amounts of sequencing data including multiple reads of the same target sequence, difficulties remain in deducing correct sequences present in a sample due to errors introduced by the high-throughput sequencing methods. With the high error rate, it is difficult to identify the majority species consistently and reliably. It is even more difficult to identify the minority species that differ little from the majority species and to determine their prevalence. Most sequence alignment-based methods alone cannot overcome high frequencies of error.