Networks provide a powerful framework for describing complex systems in many different areas, ranging from natural and social sciences to computer and electrical engineering. Their quantitative analysis is based on the concepts and properties studied in the mathematical field of graph theory. Leveraging this knowledge can help address challenging problems that arise in concrete situations.
Signed graphs are used in a variety of disciplines, including systems biology, where a signed edge between two nodes may represent a positive or negative regulatory relationship between two biological entities within a network. Recent advances in experimental and computational techniques have enabled systems-wide measurement of biological quantities such as gene expression or protein activity, and have facilitated their integration into increasingly large networks. In this context, deriving systems-level properties that integrate the levels of the individual biological entities with the corresponding graph structure has attracted considerable interest, because it allows molecular mechanisms to be related to overall system behavior.
The exposure of organisms to biologically active substances leads to complex responses, with the interplay between DNA, RNA, proteins, and other biological molecules coalescing to define the cellular phenotypes. Investigation of the resulting biological impact to yield coherent mechanistic insights requires methodologies that can leverage molecular profiling technologies that measure systems-wide changes in thousands of molecular species from a single experiment (e.g., transcriptomics, proteomics, or metabolomics).
A variety of approaches that partially address these investigational requirements have been developed. For example, to derive insight into individual mechanisms, transcriptomic data describing the differential gene expression produced in response to an exposure can be interpreted in light of pre-defined sets of genes with similar functions or expression patterns (as defined by external databases such as MSigDB). Methods like Gene Set Enrichment Analysis (GSEA) or Reverse Causal Reasoning (RCR), which are based on the enrichment of these sets within the differentially expressed genes, enable qualitative investigation of experimental data in light of the statistical enrichment of mechanisms represented by each gene set, while other methods like Network Perturbation Amplitude (NPA) scoring provide a quantitative assessment of the degree of perturbation of the mechanisms. It should be noted that the RCR and NPA approaches rely on gene sets that are causally downstream of each mechanism, and thus they allow identification (RCR) and quantitation (NPA) of mechanisms that are likely causes of the measured differential gene expression rather than consequences of it (the latter being assumed, e.g., when activated pathways are identified from the differential expression of the transcripts corresponding to their constituent proteins). To gain systems-level mechanistic insights, findings for active molecular mechanisms can be linked to potential systems-level and phenotypic effects using biological networks composed of relationships between molecules and processes. Such biological networks are available in a variety of public and commercial databases (e.g., Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathways). However, formal methods to integrate individual mechanistic findings and network-level relationships are required to assess the global biological impact of an active substance in a mechanistically coherent manner.
Such methods can be guided by the NPA approach, which takes a first step in this direction: it combines the individual mechanisms interconnected within a biological network into a single aggregated entity whose degree of perturbation can be evaluated.
The ability to gain quantitative systems-level mechanistic insight into the effects of exposure to biologically active substances or other environmental insults (together referred to as “exposures”) on biological networks using molecular profiling data has a variety of practical applications, from drug development to consumer safety. For example, candidate compounds can be screened for their ability to affect signaling in therapeutically relevant pathways (e.g., inhibition of cell cycle), or the molecular mechanisms modulated by chemical exposure can be quantitatively evaluated for their possible association with health risk (e.g., induction of DNA damage). Both of these examples highlight the pressing need to assess the biological impact of exposure, whether the ultimate goal is therapeutic intervention or harm reduction. Systems toxicology has increasingly focused on systems-oriented methodologies that emphasize understanding the biological impact of chemical exposures with greater mechanistic granularity. In particular, a recent report by the US National Research Council Committee on Toxicity Testing and Assessment of Environmental Agents advocates a shift away from toxicological assessment at the level of apical endpoints and towards a systems-level understanding of the effects of an exposure on the affected toxicity pathways. In this context, approaches that combine network-level information with quantitative assessments of mechanistic effects provide a powerful opportunity to offer true systems-level insights into the biological impact of an exposure.
Although the biological processes mentioned above are highly interconnected, the underlying biological mechanisms can be organized into distinct network models with defined boundaries in order to explicitly capture the cellular signaling pathways in each process. This segmentation enables the independent evaluation of each process that contributes to a distinct function within the cell. The signaling events within a network can be captured as signed and directed cause-effect relationships (edges) between biological entities, processes, or even other networks (nodes). Because proteins and interactions are often involved in regulating multiple responses, nodes and edges can be shared among multiple networks, providing an explicit representation of the interactions between subnetworks.
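The network representation described above can be illustrated with a minimal Python sketch. It is only one possible encoding, not a prescribed data model: the class names `Edge` and `CausalNetwork`, the helper `shared_nodes`, and the node labels (`entity_A`, `process_P`, and so on) are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A signed, directed cause-effect relationship (hypothetical encoding)."""
    source: str
    target: str
    sign: int  # +1: source increases target; -1: source decreases target


@dataclass
class CausalNetwork:
    """A network model with a defined boundary, stored as a set of edges."""
    name: str
    edges: set = field(default_factory=set)

    def add_edge(self, source: str, target: str, sign: int) -> None:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.edges.add(Edge(source, target, sign))

    def nodes(self) -> set:
        # Every node appears as the source or target of at least one edge.
        return {e.source for e in self.edges} | {e.target for e in self.edges}


def shared_nodes(a: CausalNetwork, b: CausalNetwork) -> set:
    """Nodes present in both networks, i.e. where the two models interact."""
    return a.nodes() & b.nodes()


# Two illustrative networks with placeholder node labels.
net_1 = CausalNetwork("network_1")
net_1.add_edge("entity_A", "entity_B", +1)
net_1.add_edge("entity_B", "process_P", -1)

net_2 = CausalNetwork("network_2")
net_2.add_edge("entity_C", "entity_B", +1)

overlap = shared_nodes(net_1, net_2)
print(sorted(overlap))
```

Storing each network as its own edge set keeps the boundaries explicit, while the set intersection recovers the nodes through which separate network models are connected.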
Individual nodes within a network may represent entities or activities that can be experimentally measured, and together these measurements can provide insight into the overall function of the network. In addition to examining the measurements for each node separately, it can be advantageous to summarize them into an overall “score” representing the net activation of the network. Furthermore, while some nodes in a network may not be easy to measure directly, it may be possible to obtain a score for those nodes that have an associated signature of measurements. The score for such a node can similarly be combined with scores or measurements for other nodes in the network to provide an overall score for the activation of the network.
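As a rough illustration of this two-level scoring idea, the sketch below first derives a score for an unmeasured node from its signature and then combines node values into one network score. The signed-average aggregation, the function names `signature_score` and `network_score`, the `orientation` mapping, and all numbers are assumptions made for this example; they are not the NPA algorithm or any other published method.

```python
def signature_score(signature, measurements):
    """Score a node from its signature of measurements.

    signature:    dict mapping a measurable (e.g., a gene) to the expected
                  sign (+1 or -1) of its change when the node is activated.
    measurements: dict mapping a measurable to its observed value
                  (e.g., a log2 fold change).
    Returns the mean signed measurement, or None if nothing was measured.
    """
    terms = [sign * measurements[m]
             for m, sign in signature.items() if m in measurements]
    if not terms:
        return None
    return sum(terms) / len(terms)


def network_score(node_values, orientation):
    """Combine node values into a single net-activation score.

    node_values: dict mapping a node to its measurement or signature score.
    orientation: dict mapping a node to +1 if its activation counts toward
                 network activation and -1 if it counts against it
                 (an assumed, user-supplied annotation).
    """
    terms = [orientation[n] * v
             for n, v in node_values.items()
             if v is not None and n in orientation]
    if not terms:
        raise ValueError("no scored nodes in the network")
    return sum(terms) / len(terms)


# Made-up values for illustration only.
measurements = {"gene_1": 2.0, "gene_2": -1.0, "gene_3": 0.5}
signature_x = {"gene_1": +1, "gene_2": -1}  # signature of unmeasured node_X

node_values = {
    "node_X": signature_score(signature_x, measurements),
    "node_Y": -0.5,  # directly measured node
}
orientation = {"node_X": +1, "node_Y": -1}

score = network_score(node_values, orientation)
print(node_values["node_X"], score)
```

A plain signed mean treats every node equally; a weighting scheme that reflects the network topology would be a natural refinement, but it is outside the scope of this sketch.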