In genomic sequencing, nucleic acid molecules of an organism are sequenced to provide sequence reads. A sequence read is typically aligned (mapped) to a reference genome as part of determining the genome of the organism. In this manner, differences between the genome of the organism and the reference genome can be identified.
However, such mapping to the reference genome can bias the results, thereby leading to errors. For example, insertions and deletions in a genome are very hard to map to the reference genome, and thus the mapping of sequence reads that contain them may be inaccurate and/or time consuming.
De novo assembly uses information from the sequence reads to align the sequence reads to each other, rather than to a reference genome. However, de novo assembly is typically reserved for small (local) regions of the genome that have been identified as problematic after mapping to the reference genome. The techniques used for local de novo assembly would suffer drawbacks if applied to de novo assembly of the entire genome, or at least a substantial part of the genome.
Therefore, it is desirable to provide new techniques for de novo assembly.
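For illustration only, the general idea of aligning sequence reads to each other, rather than to a reference genome, can be sketched as a greedy suffix-prefix overlap merge. This is a textbook-style toy and not a technique described in this document. The function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are all hypothetical, and the sketch assumes error-free reads from a single strand.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if no such overlap has at least `min_len` bases."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Hypothetical toy assembler: it ignores sequencing errors, reverse
    complements, and repeats, all of which matter for real genomes.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        # Compare every ordered pair of reads (quadratic in the read count).
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps enough to merge
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    example_reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]  # made-up reads
    print(greedy_assemble(example_reads))
```

Each merge step compares every ordered pair of reads. That cost is tolerable for the few reads covering a small local region, but it grows quickly as the number of reads approaches what is needed for an entire genome.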