Recent advances in sequencing have led to a wealth of genomic and sub-genomic data for individual organisms and their tissues, as well as for distinct populations and even species. This has spurred the development of genome-based personalized treatment or diagnosis of various diseases, prognosis and risk assessment, and even treatment-response prediction using genomic, transcriptional, and/or epigenetic information.
As the amount of genomic data has reached significant levels, computational requirements and the generation of meaningful output have become challenging. For example, multiple tumor and matched normal whole genome sequences are now available from projects such as ‘The Cancer Genome Atlas’ (TCGA), and extraction of relevant information from them is difficult. This is further compounded by the need for high genome sequencing coverage (for example, greater than 30-fold) to obtain statistically relevant data. Even in compressed form, genomic information can often reach hundreds of gigabytes, and an analysis comparing multiple such large datasets is in most cases slow and difficult to manage, yet absolutely necessary in order to discover the many genomic changes that occur in any given sample relative to a second sample.
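The pairwise comparison described above can be illustrated with a minimal sketch. This is not the method of the present disclosure, but a hypothetical simplification in which each sample's variant calls are reduced to a mapping from genomic locus to alternate allele, and the differences of a tumor sample relative to its matched normal are found by comparing the two mappings:

```python
# Hypothetical sketch: comparing variant calls from a tumor sample against a
# matched normal sample to find tumor-specific (somatic-like) differences.
# Variant calls are modeled simply as {(chromosome, position): alternate_allele};
# real pipelines operate on far richer data (genotypes, quality scores, etc.).

def sample_differences(tumor_calls, normal_calls):
    """Return variants present in the tumor but absent or different in the normal."""
    return {
        locus: allele
        for locus, allele in tumor_calls.items()
        if normal_calls.get(locus) != allele
    }

# Illustrative toy data (loci and alleles are made up for this example).
tumor = {("chr1", 12345): "A", ("chr7", 55249071): "T", ("chr17", 7577120): "C"}
normal = {("chr1", 12345): "A", ("chr17", 7577120): "G"}

print(sample_differences(tumor, normal))
# → {('chr7', 55249071): 'T', ('chr17', 7577120): 'C'}
```

Even this trivial comparison hints at the scaling problem: a whole genome at greater than 30-fold coverage yields millions of candidate loci per sample, so naive pairwise comparison across many hundred-gigabyte datasets quickly becomes the bottleneck noted above.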
Therefore, even though numerous systems and methods of comparative genomic analysis are known in the art, all or almost all of them suffer from one or more disadvantages. Consequently, there is still a need for improved systems and methods of comparative genomic analysis.