Whilst there are numerous stand-alone software and database packages suitable for biomedical research, none integrates phenotype and genotype analysis into a single system with two-way directionality and functionality. There is no way by which medical informatics can feed necessary information into a bioinformatics (molecular-based) system that utilizes clinical data together with molecular information to establish meaningful clinical/preclinical applications of molecular-based medicine. Clinicians, animal care technicians, molecular-based scientific researchers, pharmaceutical researchers, and others each have specific needs, yet no system exists that integrates these overlapping needs into a single system with multiple applications. The vision is that the respective features required by different disciplines will have to be integrated at some point, particularly as molecular-based research moves ever closer to clinical application. Moreover, combining medical informatics with molecular informatics can greatly accelerate the discovery of new diagnostics and treatments.
Currently, researchers are forced to study a small subset of “clinical phenotypes” chosen according to their own bias, and to ask whether molecular data can be fitted to those phenotypes of interest. Conversely, clinicians lack access to simple, clinically applicable, yet novel molecular diagnostics and treatment strategies. Within the pharmaceutical industry, vast amounts of data are accumulated throughout the drug discovery process by different sectors (such as target discovery, drug discovery, preclinical and clinical groups) using a variety of experimental platforms. This calls for an integrated solution comprising a database and analysis tools that allow multiple users in different disciplines to share a common informatics platform, so that data can be exchanged along the drug discovery pipeline in a meaningful fashion.
There have been no public attempts to develop an equivalent fully integrated solution. As mentioned above, there are many individual “modules” for various sub-problems, but little in the way of conjoining these modules, and nothing on utilizing the combined functionality. Typically, databases have been designed to track subjects, samples, and associated data, but do not allow the data to be analyzed within the same structure. Instead, data files must be exported to secondary software (such as GeneSpring or Spotfire) that performs various statistical analyses and generates “molecular results.” There is then no means by which the molecular results can be imported back into the database to extract the subject, sample, or experimental parameters that may explain the molecular results (i.e., a so-called “hypothesis”).
Moreover, while systems biology tools and approaches are gaining wide acceptance among molecular biologists and clinical researchers, two fundamental issues have emerged. The first is how to use sets of available high-throughput molecular data to reconstruct biological networks that are truly relevant to the condition of interest. The second, even more important issue is how to utilize the results of such reconstruction in the framework of standard laboratory practices and in clinical applications. In a typical pathway analysis set-up, the first step is to associate experimentally identified genes and/or proteins with available pathway and protein interaction data. When reconstructing condition-specific networks, it is often assumed that groups of proteins responsible for performing certain biological functions should be closely located in terms of “network distance.” Thus, different variations of the “shortest path” algorithm often serve to extract such modules. The algorithms are usually accessible either as built-in network reconstruction tools within commercial software packages or as open-source plug-in modules for Cytoscape. However, one fundamental issue facing this approach is that biological networks are highly interconnected due to the presence of a small number of hubs, that is, network nodes with hundreds or even thousands of connections. Thus, under almost any circumstances, the shortest path between two nodes is the one that passes through such hub(s). Even though this may, in some cases, represent biologically meaningful pathways, many network modules constructed in this way would actually be artifacts. Thus, further analysis of network topology and graph statistics is needed to find pathways that are truly significant for a given molecular profile.
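The hub effect described above can be illustrated with a minimal sketch. The toy network, the node names ("A" through "D", "HUB", "X0" and so on), and the helper functions are hypothetical and chosen only for illustration; they are not taken from any particular software package. A plain breadth-first search stands in for the "shortest path" step.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Return one shortest path in an unweighted graph (breadth-first search)."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for a repeatable visit order
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy network: A-B-C-D is a linear cascade, while "HUB" is
# connected to every cascade member and to 20 unrelated nodes X0..X19.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
edges += [("HUB", n) for n in "ABCD"]
edges += [("HUB", f"X{i}") for i in range(20)]
graph = build_graph(edges)

path = shortest_path(graph, "A", "D")
print(path)  # ['A', 'HUB', 'D']
```

In this toy case the shortest route from A to D bypasses the cascade A-B-C-D and runs through the hub, which is the kind of result that may be an artifact of network topology rather than a condition-specific pathway.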
A few attempts have already been made to address this issue. Some have proposed weighting nodes in metabolic networks based on their connectivity, where a penalty is assessed to highly connected metabolites. These types of results show significant improvement in the accuracy of predicting known metabolic pathways. Another approach uses well-established canonical pathways as “shortcuts” while generating shortest paths in protein signaling networks. These types of algorithms give preference to known signaling routes while reconstructing condition-specific networks. In other recent research, the emphasis has shifted from high-degree hubs to nodes that are “bottlenecks” in the network, namely those that have a disproportionate number of shortest paths going through them. While these improvements do lead to the selection of many biologically meaningful pathways, they do not consider network topology in the context of a particular molecular profile. For example, penalizing hubs might exclude them in situations where they play a truly important role in a condition-specific network. By the same token, always giving preference to known pathways limits the ability to generate new hypotheses about important signaling cascades.
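As a rough illustration of the connectivity-penalty idea, the sketch below charges a cost equal to a node's degree each time a path enters that node, and then runs Dijkstra's algorithm. The cost function, the toy network, and all names are assumptions made for this example; the published approaches referred to above may use different penalty schemes.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency map from a list of (a, b) edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_penalized_path(graph, source, target):
    """Dijkstra's algorithm where entering a node costs that node's degree.

    The degree-as-cost rule is an illustrative assumption, not a
    reconstruction of any specific published weighting scheme.
    """
    best = {source: 0}
    previous = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1], cost
        if cost > best[node]:
            continue  # stale heap entry
        for neighbor in graph[node]:
            new_cost = cost + len(graph[neighbor])
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return None, float("inf")


# Hypothetical toy network: a linear cascade A-B-C-D plus a hub linked to
# every cascade member and to 20 unrelated nodes X0..X19.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
edges += [("HUB", n) for n in "ABCD"]
edges += [("HUB", f"X{i}") for i in range(20)]
graph = build_graph(edges)

path, cost = degree_penalized_path(graph, "A", "D")
print(path, cost)  # ['A', 'B', 'C', 'D'] 8
```

With the penalty in place the path follows the cascade rather than the hub. The same rule would also steer paths away from the hub in a condition where the hub genuinely matters, which is the limitation noted above.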
A second and even more important issue is how to utilize the results of systems-level analysis in guiding further laboratory research and clinical applications. The results of pathway analysis are usually sets of fairly complex networks or sets of functional processes that are deemed relevant to the condition represented by the molecular profile. While this information is certainly useful, the nature and limitations of work in the research or clinical laboratory mean that one still needs to make the transition back to the level of verifiable hypotheses about the roles of individual genes and proteins. Thus, guiding further research often requires identifying a relatively small number of molecules that can be further interrogated in the laboratory with clear-cut outcomes that allow a hypothesis to be either confirmed or refuted.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of embodiments of the present invention.