Efficient and accurate interpretation of DNA variants from sequence-based tests is a challenge for clinical laboratories. This challenge is compounded by increasing test complexity due to a greater number of genes assayed per test, emerging evidence for pathogenicity, and imprecise clinical phenotypes.
Generally, a sequence-based test workflow starts when a physician orders a sequence-based test for, as an example, a patient's cancerous tumor. The sequence-based test is used to better understand that tumor and which drugs might be most effective in treating the patient. After the test is ordered, samples are collected and DNA sequence data are generated for that cancer sample. Then, informatics and analytics are applied to determine one or more variants. A variant is a DNA change that is present in that patient's sample relative to a reference, such as a reference genome. A clinical geneticist reviews the one or more variants. In reviewing the variants, the geneticist assesses, for example, which variants are more likely than others to be the cause of one or more diseases or phenotypes of interest, which variants are pathogenic or likely pathogenic, and/or which variants are associated with modified drug response or drug toxicity. A report is then prepared based on the physician's order. For example, a lab director who is an expert in the field may sign out the test report, and the results will be sent back to the physician to help them better treat the patient.
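As an illustration of what "a DNA change relative to a reference" means, the following is a minimal sketch, not a description of any particular laboratory's pipeline. It assumes the sample sequence has already been aligned to the reference so that both strings have the same length, and it reports only single-base substitutions. The names `Variant` and `call_substitutions` are hypothetical and introduced only for this example.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    """A single-base change in the sample relative to the reference."""
    position: int  # 1-based position in the reference sequence
    ref: str       # base observed in the reference
    alt: str       # base observed in the patient's sample


def call_substitutions(reference: str, sample: str) -> List[Variant]:
    """Return every position where the sample base differs from the reference.

    Assumes the two sequences are already aligned and of equal length;
    insertions and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [
        Variant(pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(pairs, start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    # Toy sequences chosen only for illustration.
    for variant in call_substitutions("ACGTACGT", "ACGTTCGT"):
        print(variant)
```

In practice, variant determination also handles insertions, deletions, sequencing errors, and read depth, none of which this sketch attempts.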
This typical workflow suffers from several deficiencies. First, literature used to interpret the sequence results often needs to be procured and reviewed. To procure and review biomedical papers and other literature, for example, a geneticist or fellow will obtain and read the papers and interpret the different variants that are observed. However, the process between the time the test was ordered and the time the results get back to the physician can take a long time, time that could otherwise be spent treating the patient. In some instances, that time delay actually reduces the odds of successfully treating the patient's disease.
Second, there is a scalability challenge with the increasing number of sequence-based tests being ordered. It becomes more and more difficult to keep pace with test interpretation as test volumes increase. Further, as the number of tests increases, so does the number of variants and articles that are reviewed, thereby compounding the problem.
Third, the tests themselves are growing larger and more complex. Tests are changing from simpler tests that consider a handful of mutations in a gene, such as the BRCA1 or BRCA2 genes that predispose women to breast cancer, to tests that consider panels of dozens, hundreds, or even thousands of genes. In some cases, labs are actually sequencing entire exomes (all of the known exons of genes in a patient's genome) or even the entire genome of a patient. Such sequences contain so much information that they create a big data problem, in which it becomes extremely challenging to interpret the sequences and pull out the relevant insights.
Generally, entities interested in conducting clinical trials for studying variants spend a great deal of resources finding and enrolling patients for clinical trials. For example, a pharmaceutical company may be interested in studying patients having (or lacking) a particular genetic change or constellation of genetic changes, with the expectation that patients having (or lacking) those changes or variants will respond more favorably, or less favorably, to a particular therapy. The company enrolls several trial sites that test potential candidates for the genetic changes. Depending on the rarity of patients with the phenotype of interest who have (or lack) the desired variant or constellation of variants, many candidate patients may need to be tested to find a relatively small number of candidates that actually have (or lack) the desired variant or constellation of variants. It is even possible that too few candidates are identified to adequately power the trial.
In some cases, an article related to a variant has been published, but the publication is too recent to have been curated by the time a bibliography for a variant of interest is requested. The amount of time needed to curate an article can vary depending on the resources available for curation. For example, the time needed may be at least as long as necessary for a person to read through the article, and in many cases may be much longer. Nonetheless, the literature may contain relevant information on the particular variant of interest. If these papers are uncurated or partially curated prior to interpretation of a test, then patients may not benefit from valuable information that may be in them. In some instances, relevant information in non-curated content can be identified using textual searching techniques, such as natural language processing, or by construction of a “just-in-time” bibliography for one or more variants of interest. However, textual searching techniques on non-curated content often fail to provide results as relevant or as useful as those provided by curated content.
As for the information itself, the presence or absence of a single genomic variant is often not completely determinative of phenotypic effects. Yet only individual variants or individual DNA changes are generally being assessed, and often outside the context of the rest of the genome. For example, the ClinVar Database, run by the National Center for Biotechnology Information in the United States, provides information about the clinical significance of particular DNA changes. Yet, this mode of interpreting variants on a one-off basis, without appreciating the context of other genetic changes and modifier variants, is overly simplistic.
Another current issue in genetic testing interpretation occurs when a clinician interprets a genome for an individual's sequence-based test and discovers a DNA change that looks extremely rare. The rarity of the change and the fact that it occurs in a gene that has been linked to a particular disease make it compelling to conclude that the variant is causal for the rare disease phenotype affecting the patient. However, many sequencing studies that have been submitted to the public domain can be extremely biased toward people of European descent. As a result, variants can be misclassified as being causal because of their scarcity in one population or ethnic group, even though they are less scarce in populations that have not had the same amount of sequencing investigation.
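The sampling bias described above can be made concrete with a small sketch. It assumes allele counts are available per population as a pair of (variant allele count, total alleles sequenced). It labels a variant "rare" in a population only when that population has been sequenced deeply enough to support the call. The function name, the frequency threshold, the minimum sample size, and the population labels are all hypothetical values chosen for illustration, not recommended cutoffs or real data.

```python
from typing import Dict, Tuple


def rarity_by_population(
    counts: Dict[str, Tuple[int, int]],
    threshold: float = 0.001,   # hypothetical allele-frequency cutoff for "rare"
    min_alleles: int = 2000,    # hypothetical minimum alleles sequenced to judge rarity
) -> Dict[str, str]:
    """Classify a variant per population as 'rare', 'not rare', or 'undersampled'.

    counts maps a population label to (variant allele count, total alleles sequenced).
    """
    result = {}
    for population, (alt_count, total_alleles) in counts.items():
        if total_alleles < min_alleles:
            # Too little sequencing to say whether the variant is scarce here.
            result[population] = "undersampled"
        elif alt_count / total_alleles < threshold:
            result[population] = "rare"
        else:
            result[population] = "not rare"
    return result


if __name__ == "__main__":
    # Illustrative counts only; not drawn from any real database.
    example = {
        "population_a": (2, 100000),  # heavily sequenced, variant scarce
        "population_b": (30, 4000),   # less sequenced, variant more common
        "population_c": (0, 200),     # barely sequenced
    }
    print(rarity_by_population(example))
```

With these illustrative counts, a variant that appears rare in the heavily sequenced population is not rare in a second population and cannot be judged in a third. A variant that is rare in only one well-studied group is therefore weak evidence of causality on its own.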
Generally, knowledge about particular genomic variants is continually being updated. The updates can come from clinical trials, research, regulatory approvals, experience treating patients, or other sources. However, the effect, impact, or occurrence of these updates is not always clear, even when they suggest a change to therapy or monitoring of a condition. Often, a patient may receive a diagnosis based on having a particular genomic variant, but is not made aware of subsequent developments in the understanding of the genomic variant.
In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, generally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.