Nucleic acid sequencing determines the order of nucleotides present in a given DNA or RNA molecule. The demand for cheaper and faster sequencing methods has driven the development of next-generation sequencing (NGS) methods. NGS platforms perform massively parallel sequencing, in which millions of DNA fragments from multiple samples can be sequenced in unison, thus providing a much cheaper and higher-throughput alternative to traditional Sanger sequencing. NGS can be used in whole-genome sequencing or targeted sequencing. With targeted sequencing, a subset of genes or defined regions in a genome is sequenced or predominantly sequenced, e.g., by amplifying target regions.
Ultra-deep sequencing is the sequencing of amplicons at a high depth of coverage with the goal of identifying both common and rare sequence variations. With sufficient depth of coverage, ultra-deep sequencing can characterize rare sequence variants present at frequencies of less than 1%. Ultra-deep sequencing has been used to detect low-frequency HIV drug-resistant mutations and to identify rare somatic mutations in complex cancer samples. For tests such as non-invasive blood tests, the frequency of a biomarker mutation could be lower than 1%. However, NGS is an error-prone process and can have an error rate of close to 1% or higher, depending on the sequencing depth, sample types, and sequencing protocols. Because sequencing errors at that rate can produce apparent variants at frequencies of less than 1%, many current NGS software packages report only variants with a frequency of 1% or higher in order to avoid false positives. Yet true positives may exist even among variants with low frequencies of, for example, less than 1%. Methods and systems are therefore needed to detect true positives for variants with frequencies of less than about 1%.
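The difficulty described above can be illustrated with a short calculation. The following Python sketch is not a method described in this document; it is a simplified illustration that assumes sequencing errors at a site are independent and occur at a single fixed rate, so that the number of erroneous reads supporting a given variant follows a binomial distribution. The function names, depth, error rates, and read counts are all hypothetical values chosen for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of P(X == k) for X ~ Binomial(n, p), computed in log space."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by summing the upper tail."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


if __name__ == "__main__":
    depth = 10_000        # hypothetical depth of coverage at one site
    variant_reads = 50    # reads supporting the variant (0.5% frequency)

    # Assumed error rates; real rates vary by platform, sample, and protocol.
    for error_rate in (0.01, 0.001):
        expected_errors = depth * error_rate
        p_noise = prob_at_least(variant_reads, depth, error_rate)
        print(
            f"error rate {error_rate}: about {expected_errors:.0f} erroneous "
            f"reads expected, P(>= {variant_reads} from error alone) = {p_noise:.3g}"
        )
```

Under these assumptions, an error rate near 1% is expected to yield more erroneous reads than a true 0.5% variant contributes, so a simple frequency threshold cannot separate the two, whereas at a lower assumed error rate the same read count would be very unlikely to arise from error alone.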