The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Current state-of-the-art genome sequencing machines do not, as one might expect, produce one continuous output sequence of the entire genome. Rather, they generate large numbers of relatively short fragments of sequence called reads, which range from dozens to thousands of base pairs in length. Because the machine outputs these reads in no particular order, the first step in analyzing the data in prior approaches is typically to map each read to the position on a reference genome with which the read is associated. This step is called alignment. The second step in prior approaches is typically to sort the reads by their mapped positions. Genome sequencing produces large quantities of data that can take hours or days to align and sort, so prior approaches can be improved by eliminating these analysis steps or by making them more efficient.
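For illustration only, the two steps described above (alignment followed by sorting) can be sketched as follows. This is a minimal toy example, not any particular prior-art tool: it assumes exact-match alignment against a short reference string, whereas practical aligners use indexed, mismatch-tolerant methods. The names `align_read` and `align_and_sort` and the sample sequences are hypothetical.

```python
from typing import List, Optional, Tuple


def align_read(reference: str, read: str) -> Optional[int]:
    """Return the 0-based position of the first exact match of `read`
    in `reference`, or None if the read does not occur.

    Assumption: exact matching only; real aligners tolerate mismatches.
    """
    pos = reference.find(read)
    return pos if pos >= 0 else None


def align_and_sort(reference: str, reads: List[str]) -> List[Tuple[int, str]]:
    """Step 1: map each read to a reference position (alignment).
    Step 2: sort the mapped reads by that position."""
    mapped: List[Tuple[int, str]] = []
    for read in reads:
        pos = align_read(reference, read)
        if pos is not None:  # unmapped reads are dropped in this sketch
            mapped.append((pos, read))
    mapped.sort(key=lambda pair: pair[0])
    return mapped


# Hypothetical toy data: reads arrive in no particular order.
reference = "ACGTTGCAAGGCTTAC"
reads = ["GGCTT", "ACGTT", "GCAAG"]
result = align_and_sort(reference, reads)
print(result)
```

Even in this simplified form, every read is scanned against the reference and the full set of mapped reads is then sorted, which suggests why the two steps become costly at the scale of real sequencing data.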