Genes are the functional units of human biology and are encoded in DNA sequence. Collectively, the sequence of all of an individual's DNA, including all genes, is called a genome. Recent technological advances allow researchers to determine the sequence of entire genomes rapidly and inexpensively, which is revolutionizing discovery in biomedical research and paving the way for personalized medicine in clinical practice.
The sequencing of genomes in individual patients can yield important information regarding disease states, diagnoses, prognoses, and treatment options. The information contained in genome sequencing data is usually vast and complex. However, many medical professionals (e.g., physicians) are primarily concerned with specific clinical questions and thus want targeted information about identified symptoms or suspected diseases. Accordingly, the ability to quickly determine the most clinically or biologically relevant information in the genome sequencing data will allow medical professionals to provide patients with individualized diagnosis and treatment of diseases more quickly.
Interpreting information in the genome sequencing data generally entails relating the information to established genomic data found in medical literature sources. However, this discovery process can be tedious and time-consuming, and often requires highly trained experts. Various attempts have been made to automate this process, but there is still no widely accepted technique or tool that can effectively and efficiently harness relevant genomic data from existing medical literature sources.