1. Field of the Invention
The invention relates to methods and apparatuses and, in particular, to methods and apparatuses for compressing nucleotide sequence data.
2. Description of the Related Art
With the continuing development of single-molecule DNA sequencing techniques, an enormous and growing amount of DNA read sequence data is being generated. To achieve higher sequencing accuracy, a sequencer generates dozens, or even hundreds, of raw sequences for each DNA segment.
In a sequencing-by-synthesis DNA sequencer, multiple copies of each DNA segment are synthesized using a designed primer and a polymerase. The resulting fluorescent signals are received by sensors and identified by an algorithm, which outputs the bases A, C, G, and T.
There are more than 3 billion DNA base pairs in a human genome. If the raw read sequences generated by a sequencer in each genome sequencing run amount to ten or even a hundred times the size of the genome, as described above, the resulting volume of sequencing data is enormous. Such a volume of data places a heavy burden on storage, computation, and communication. Reducing the size of this data is therefore the first task in a high-throughput processing system.
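As a rough illustration of the scale described above, the following sketch estimates the raw read volume for a 3-billion-base genome at 10x and 100x coverage. It assumes one byte per base (plain ASCII text, ignoring headers, quality scores, and line breaks) and, for comparison, a hypothetical packing of A, C, G, and T into 2 bits per base. The function name `raw_read_bytes` and both encodings are illustrative assumptions, not figures or methods taken from this disclosure.

```python
GENOME_BASES = 3_000_000_000  # approximate human genome size in base pairs


def raw_read_bytes(coverage: int, bits_per_base: int = 8) -> int:
    """Estimate bytes needed for reads totalling `coverage` times the genome.

    Hypothetical helper: assumes every base costs `bits_per_base` bits and
    ignores read headers, quality scores, and line breaks.
    """
    total_bases = GENOME_BASES * coverage
    return total_bases * bits_per_base // 8


for coverage in (10, 100):
    ascii_gb = raw_read_bytes(coverage) / 1e9                   # 8 bits per base
    packed_gb = raw_read_bytes(coverage, bits_per_base=2) / 1e9  # assumed 2-bit packing
    print(f"{coverage}x coverage: {ascii_gb:.1f} GB as ASCII, "
          f"{packed_gb:.1f} GB at 2 bits/base")
```

Under these assumptions the script reports 30.0 GB (7.5 GB packed) at 10x coverage and 300.0 GB (75.0 GB packed) at 100x coverage, before any further compression.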
While a DNA sequencing process is running, DNA synthesis signals flow continuously from the detecting devices and are translated into DNA data in real time. None of these signals or data can be lost along the way. There is therefore a need for an efficient and convenient compression algorithm.
The path from sequencing signals to raw sequences in storage includes three stages: (1) sequencing signals to raw data, (2) raw data to raw sequences in data storage, and (3) raw sequences to sequences in computer storage, and vice versa. Today, a personal computer exchanges data with its peripherals at speeds ranging from about 100 megabytes to several gigabytes per second. The critical point along this path is therefore the interface between the PC and its peripherals or devices.
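To put the interface bottleneck in numbers, the sketch below estimates how long it would take to move a given amount of raw sequence data across a link sustaining 100 MB/s (the low end cited above) and an assumed 1 GB/s. The 30 GB payload is an illustrative assumption corresponding to a 3-billion-base genome at 10x coverage stored at one byte per base; the helper name `transfer_seconds` is hypothetical.

```python
def transfer_seconds(payload_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move `payload_bytes` over a link sustaining `rate_bytes_per_s`.

    Hypothetical helper: ignores protocol overhead and assumes a constant rate.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("transfer rate must be positive")
    return payload_bytes / rate_bytes_per_s


PAYLOAD = 30e9  # assumed: 3e9 bases x 10x coverage x 1 byte per base

for label, rate in (("100 MB/s", 100e6), ("1 GB/s", 1e9)):
    seconds = transfer_seconds(PAYLOAD, rate)
    print(f"{label}: {seconds:.0f} s")
```

Under these assumptions the transfer takes 300 s at 100 MB/s and 30 s at 1 GB/s. Because the time scales linearly with payload size, any compression ratio applied before the interface shortens the transfer by the same factor.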