Genome sequencing is a field of active research today. An understanding of the genome variation may enable researchers to fully understand the issues of genetic susceptibility and pharmacogenomics of drug response for all individuals as well as personalized molecular diagnostic tests. For such research or medical purposes, genetic material obtained directly from either a biological or an environmental sample is generally sequenced into a plurality of sequences, called genomic sequences. A facility, such as a research laboratory or a clinic involved in genomic study typically uses high capacity platforms, such as next generation sequencing (NGS) platforms capable of generating large number of genome sequences per year. The genomic sequence thus generated may be further processed and assembled into sets called contigs. Generally, the genomic sequences, or contigs, may be stored for future studies for further analysis. Thus, each year, genomic data, such as the genomic sequence and/or the contigs are generated in huge volumes, in the range of hundreds of terabytes (TB), and stored in the repositories.
Typically the genomic data is either archived in repositories, for example, individual repositories associated with laboratories generating the genomic data or public sequence repositories using various data formats, such as fastq and gff. Storage of such huge volumes of data requires the repositories to have large storage disks having huge volumes of storage capacity. Further, with the advances in the research, the genomic data may also increase, thereby increasing maintenance costs and requirements for additional storage space. Furthermore, since the genomic data may be utilized for future references, the genomic data, for instance, in the fastq format may be archived in compressed form so as to decompress or retrieve the same without any loss of information.