A major challenge faced by genomics researchers worldwide is determining how to analyze the enormous amount of data produced by next-generation sequencing machines. It is difficult to process the data and translate it into clinically relevant information that can be used in the diagnosis and therapy of diseases. The genome in its entirety is still not fully understood. One challenge is determining the effects of variants of unknown significance.
A gene may have several, hundreds, or even thousands of sequence variants. A significant portion of these variants may have no effect on the function or expression of the gene. However, some of the variants may have an impact on a disease state such as cancer. These variants may be useful for determining which therapies may be effective for treating a particular disease (e.g., a tumor exhibiting a particular variant may be more susceptible to a drug than tumors without the variant) and/or for finding targets for new therapies. Distinguishing the variants that may be useful for research and/or treatment from those that have no significant effect remains a challenge.
Various algorithms have been developed to predict the functional effect of variants. Each algorithm uses its own scores and metrics to classify variants into categories such as Benign, Deleterious, Potentially deleterious, Tolerant, etc. However, the results of these algorithms are not always consistent with one another: the same variant may be placed in different categories by different algorithms.
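As an illustration only, the discrepancy described above can be sketched in a few lines of Python. The two predictors below (`classify_tool_a`, `classify_tool_b`), their score directions, their cutoffs, and the example scores are all hypothetical and are not taken from any particular algorithm; the sketch merely shows how two tools with different scoring schemes can place the same variant in conflicting categories.

```python
from typing import Dict, List

# Hypothetical predictor A: lower score means more damaging (illustrative cutoff).
def classify_tool_a(score: float) -> str:
    return "Deleterious" if score < 0.05 else "Tolerant"

# Hypothetical predictor B: higher score means more damaging (illustrative cutoffs).
def classify_tool_b(score: float) -> str:
    if score >= 0.85:
        return "Deleterious"
    if score >= 0.15:
        return "Potentially deleterious"
    return "Benign"

# Map each tool-specific label to a coarse class so the labels can be compared.
# This grouping is an assumption made for the sketch.
COARSE_CLASS: Dict[str, str] = {
    "Deleterious": "damaging",
    "Potentially deleterious": "damaging",
    "Tolerant": "neutral",
    "Benign": "neutral",
}

def find_discordant(variants: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the IDs of variants whose two predictions fall in different coarse classes."""
    discordant = []
    for variant_id, scores in variants.items():
        label_a = classify_tool_a(scores["tool_a"])
        label_b = classify_tool_b(scores["tool_b"])
        if COARSE_CLASS[label_a] != COARSE_CLASS[label_b]:
            discordant.append(variant_id)
    return discordant

# Made-up scores for three placeholder variants.
EXAMPLE_VARIANTS = {
    "variant_1": {"tool_a": 0.01, "tool_b": 0.95},  # both tools: damaging
    "variant_2": {"tool_a": 0.40, "tool_b": 0.90},  # A: neutral, B: damaging
    "variant_3": {"tool_a": 0.60, "tool_b": 0.05},  # both tools: neutral
}

if __name__ == "__main__":
    print(find_discordant(EXAMPLE_VARIANTS))
```

With these made-up inputs, only `variant_2` is reported, because the first hypothetical tool labels it Tolerant while the second labels it Deleterious.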