Science is a cumulative endeavor in which new knowledge is always built upon prior knowledge. Scientific output, in the form of conference proceedings, presentations and scientific articles, is constantly expanding. The resulting information overload requires updated computer-based tools to make knowledge accessible and, most importantly, searchable and interpretable. Making information accessible and reusable for future research requires understanding not only text but also graphical information.
Specific fields, including personalized medicine, drug discovery, pharmacovigilance (e.g., drug safety) and systems biology, make intensive use of graphical information to add value to the written text of scientific publications. Molecular pathway diagrams are a major means by which scientists summarize, describe and represent complex relationships between various biological entities. The term molecular pathway is used in this document as a common denomination for metabolic pathways, signal transduction pathways, regulatory networks and genetic pathways, among others. In general, a molecular pathway diagram is a graphical representation of any actions, changes, relations and interactions between the phenotype of a living organism, genes, RNA, proteins, drugs or other molecules.
Molecular pathway diagrams, which contain extremely valuable information for researchers, may be integrated into searchable databases. These databases may be built and enhanced with the assistance of experts who manually curate each of the relations included in the database, often combining text mining of published sources with additional tools for discovery, conflict resolution and integration. However, these tools, and thus the content of the database, typically neglect the information contained in the images that accompany the publications.
Several documents have been published in this field, e.g.:
Document WO 2016118513 A1 discloses a method and associated system for analyzing biological networks. The method includes obtaining data representing biological networks from one or more data stores and obtaining data representing biological pathways, such as pathways defined for the biological networks. The biological networks are defined by respective nodes representing molecules and connections representing relationships between or among the molecules.
Document US 20150186427 A1 discloses a method and a system for analyzing dynamic graphs. It is described that computations are performed at a plurality of graph vertices every time a change in the graph occurs. In order to minimize the computational load of each computational iteration, previous computation results are reused when the inputs for a computation at a given vertex are unchanged from previous computations.
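For illustration only, the general idea of reusing earlier results when a vertex's inputs are unchanged can be sketched as follows. This is a minimal sketch and not the method of either cited document: the class name IncrementalGraph, its methods, the molecule-like vertex names and the use of a simple sum as the per-vertex computation are all hypothetical, and the graph is assumed to be acyclic.

```python
from typing import Dict, List, Tuple


class IncrementalGraph:
    """Hypothetical acyclic graph whose vertex values are cached by their inputs."""

    def __init__(self) -> None:
        self.preds: Dict[str, List[str]] = {}   # vertex -> predecessor vertices
        self.base: Dict[str, float] = {}        # vertex -> own base quantity
        self.cache: Dict[str, Tuple[tuple, float]] = {}  # vertex -> (inputs, result)
        self.evaluations = 0                    # counts actual recomputations

    def add_vertex(self, name: str, base: float = 0.0) -> None:
        self.preds.setdefault(name, [])
        self.base[name] = base

    def add_edge(self, src: str, dst: str) -> None:
        self.preds[dst].append(src)

    def value(self, name: str) -> float:
        # The inputs of a vertex are its base quantity and its predecessors' values.
        inputs = (self.base[name], tuple(self.value(p) for p in self.preds[name]))
        cached = self.cache.get(name)
        if cached is not None and cached[0] == inputs:
            return cached[1]                    # inputs unchanged: reuse old result
        self.evaluations += 1
        result = inputs[0] + sum(inputs[1])     # stand-in for a real computation
        self.cache[name] = (inputs, result)
        return result

    def refresh(self) -> Dict[str, float]:
        """Re-evaluate every vertex, e.g. after a change to the graph."""
        return {name: self.value(name) for name in self.preds}


# Hypothetical example: two genes feed a protein, which feeds a phenotype.
g = IncrementalGraph()
for vertex, base in [("geneA", 1.0), ("geneB", 2.0), ("protein", 0.0), ("phenotype", 0.0)]:
    g.add_vertex(vertex, base)
g.add_edge("geneA", "protein")
g.add_edge("geneB", "protein")
g.add_edge("protein", "phenotype")
g.refresh()           # first pass computes all four vertices
g.base["geneB"] = 5.0
g.refresh()           # only geneB, protein and phenotype are recomputed
```

In this sketch the cache key is the full tuple of inputs, so a vertex is recomputed only when its own quantity or the value of one of its predecessors actually changes; vertices unaffected by a change keep their earlier result.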
However, typical cognitive computing systems remain largely blind to graphics, including those in documents comprising molecular or pathway diagrams. Thus, there is a need for better interpretation, categorization and/or classification of the content contained in graphical representations of complex relationships between entities.