Selection of drug candidates for clinical development is a particularly difficult problem because there is generally a poor understanding of the biochemical pathways that determine a drug's mechanisms of efficacy and toxicity. These biochemical pathways include, among other things, a series of biomolecules that may be suitable targets for drug development. For example, biomolecules such as kinases play a role in normal homeostasis and disease progression, often becoming deregulated through genetic alterations that result in aberrant activity and/or changes in their overall expression. Even though kinases are easy targets for drug development, very few kinase-inhibiting drugs are being developed. This is because the known mechanisms of action of these few drugs were established through decades of research and accumulated knowledge that are difficult to replicate in a short period of time.
High-throughput measurements of mRNA, protein, and metabolite levels, in conjunction with traditional dose-dependent efficacy and toxicity assays, have emerged as a means for elucidating the mechanism of action of a drug or compound. Scientists have attempted to combine information from these measurements with knowledge about pathways from the literature to assemble relevant biochemical pathways. Researchers then use numerical and statistical techniques such as clustering and statistical mining to sift through large quantities of data to understand and describe mechanisms of action.
Most of these approaches calculate covariances between the measurements (e.g., gene expression levels) and thereby reveal underlying correlations. However, such correlations are not helpful in making formal predictions that can be tested experimentally. For example, one gene may have a high expression level whenever another gene also has a high expression level. However, the genes may not be part of the same biochemical pathway and may be merely correlated with one another without being causally connected. It would then be impossible to predict a change in the expression of one gene based on the level of expression of the other. Furthermore, the published literature maps out only a small percentage of the molecular circuitry and can therefore provide only limited assistance to the researcher. Moreover, current techniques are not equipped to simultaneously handle different types of data, including gene expression, proteomic, metabolomic, and other phenotypic data.
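The distinction between correlation and causal connection described above can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not a technique taken from this document: the names `regulator`, `gene_a`, `gene_b`, and `simulate` are hypothetical, and the assumption that both genes are driven by one shared upstream factor is made purely for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_samples, knock_down_gene_a=False, seed=0):
    """Simulate two genes driven by a shared (hypothetical) regulator.

    Neither gene influences the other. Setting knock_down_gene_a forces
    gene A's expression to zero, mimicking an experimental perturbation.
    """
    rng = random.Random(seed)
    gene_a, gene_b = [], []
    for _ in range(n_samples):
        regulator = rng.gauss(0.0, 1.0)   # unobserved common cause
        noise_a = rng.gauss(0.0, 0.3)     # always drawn, so runs stay aligned
        noise_b = rng.gauss(0.0, 0.3)
        gene_a.append(0.0 if knock_down_gene_a else regulator + noise_a)
        gene_b.append(regulator + noise_b)
    return gene_a, gene_b


# Observational data: the two genes appear strongly correlated.
observed_a, observed_b = simulate(2000)
r_observed = pearson(observed_a, observed_b)
print("correlation in observational data:", round(r_observed, 3))

# Perturbation: knocking down gene A leaves gene B exactly as it was,
# because there is no causal link from A to B in this simulation.
knocked_a, knocked_b = simulate(2000, knock_down_gene_a=True)
print("gene B unchanged by knockdown of gene A:", knocked_b == observed_b)
```

In this sketch the covariance-based view would suggest a relationship between the two genes, yet an experiment that alters one gene has no effect on the other, which is why a correlation alone does not yield an experimentally testable prediction.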
Researchers have begun applying a number of computational approaches to overcome some of the drawbacks noted above. These computational approaches attempt to reverse engineer gene and protein networks from molecular profiling data. However, because of the mathematical complexity of managing and resolving networks from such large data sets, these techniques have focused on networks with very few components.
Accordingly, there is a need for systems and methods for identifying and constructing models of compound mechanisms of action and extracting information from such models for selecting drugs for development. Generally, there is a need for systems and methods for inferring network models from large quantities of differing types of data and extracting information from such models.