The present invention relates to transmission of genetic data, and more specifically to transmission and compression of genetic data.
DNA gene sequencing of a human, for example, generates about 3 billion (3×10⁹) nucleotide bases. Currently all 3 billion nucleotide base pairs are transmitted, stored and analyzed, with each base pair typically represented as two bits. The storage required for the sequencing data is large: the nucleotide sequence alone, with no other data or information such as annotations, occupies roughly 750 megabytes at two bits per base and about 3 gigabytes when each base is stored as a one-byte character. If the genome includes other information, such as annotations, it may require terabytes of storage. The movement of the data between institutions, laboratories and research facilities is hindered by the sheer volume of data, the amount of storage needed to hold it, and the resources needed to transmit it directly. For instance, some research facilities can spend upwards of $2 million transmitting large genetic datasets, such as terabytes of data that include annotations and other details regarding the genetic sequence or genome. Transferring a very large genetic sequence over a network data processing system can also take a significant amount of time.
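The storage figures for a bare nucleotide sequence follow from simple arithmetic, and the two-bit representation amounts to packing four bases into each byte. The Python sketch below assumes one byte per base for plain text and a hypothetical A/C/G/T → 00/01/10/11 code; the names `storage_bytes`, `pack` and `unpack` are illustrative and are not taken from the invention.

```python
GENOME_BASES = 3_000_000_000  # approximate number of bases in a human genome

def storage_bytes(num_bases: int, bits_per_base: int) -> int:
    """Bytes needed to hold num_bases at bits_per_base, rounded up to whole bytes."""
    return (num_bases * bits_per_base + 7) // 8

# Hypothetical 2-bit code for the four nucleotides (an assumption for illustration).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack a nucleotide string four bases per byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases; padding bits in the last byte are discarded."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (6 - 2 * j)) & 0b11])
    return "".join(bases[:length])

if __name__ == "__main__":
    print(storage_bytes(GENOME_BASES, 8))  # 3000000000 bytes (~3 GB) as one-byte text
    print(storage_bytes(GENOME_BASES, 2))  # 750000000 bytes (~750 MB) packed at 2 bits
    seq = "GATTACA"
    packed = pack(seq)
    print(len(packed), unpack(packed, len(seq)))
```

Packing alone reduces size by a fixed factor of four relative to one-byte text; any further reduction requires actual compression of the sequence.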
FIG. 2 shows an overview of conventional transmission of a genome between a source and a destination. An uncompressed genome at a source 600 is read from a repository 608 by a disk controller 606 and moved to memory 604. A processor 602 runs an algorithm to compress the genome, and the resulting compressed genome is sent to a network interface controller (NIC) 610. The NIC 610 of the source 600 sends the compressed genome through a network to a NIC 622 at a destination 612. The compressed genome received by the NIC 622 is sent to memory 616. A processor 614 at the destination 612 then runs an algorithm to decompress the compressed genome and stores the decompressed genome in memory 616. From memory 616, the decompressed genome is moved to a repository 620 by a disk controller 618 at the destination 612.
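The conventional flow of FIG. 2 can be sketched as a compress, transmit, decompress round trip. This Python sketch rests on several assumptions: zlib (DEFLATE) stands in for the unspecified compression algorithm, a local `socket.socketpair()` stands in for the two NICs and the network, in-memory bytes stand in for the repositories and disk controllers, and the 8-byte length prefix is a hypothetical framing choice.

```python
import socket
import struct
import threading
import zlib

def send_compressed(sock: socket.socket, genome: bytes) -> None:
    """Source side: compress the genome (processor) and transmit it (NIC)."""
    payload = zlib.compress(genome, level=9)  # zlib stands in for the unspecified algorithm
    # Hypothetical framing: an 8-byte big-endian length prefix marks where the payload ends.
    sock.sendall(struct.pack("!Q", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before payload was complete")
        buf.extend(chunk)
    return bytes(buf)

def receive_decompressed(sock: socket.socket) -> bytes:
    """Destination side: receive the compressed genome (NIC) and decompress it (processor)."""
    (length,) = struct.unpack("!Q", recv_exact(sock, 8))
    return zlib.decompress(recv_exact(sock, length))

def transfer(genome: bytes) -> bytes:
    """Run source and destination over a local socket pair standing in for the network."""
    src, dst = socket.socketpair()
    with src, dst:
        # Send on a separate thread so a large payload cannot fill the socket buffer and block.
        sender = threading.Thread(target=send_compressed, args=(src, genome))
        sender.start()
        received = receive_decompressed(dst)
        sender.join()
    return received

if __name__ == "__main__":
    genome = b"ACGT" * 250_000  # stand-in for a genome read from the source repository
    assert transfer(genome) == genome
    print(len(genome), len(zlib.compress(genome, level=9)))
```

In this flow the whole genome is compressed before sending and decompressed only after it has fully arrived, so both the source and the destination hold the complete data in memory at some point.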