1. Field of the Invention
The invention is related to a method for analyzing, querying, and mining graph databases using subgraph and similarity querying of graph databases.
2. Description of the Related Art
(Note: This application references a number of different publications as indicated throughout the specification by one or more reference numbers within brackets, e.g., [x]. A list of these different publications ordered according to these reference numbers can be found below in Section G entitled “References.” Each of these publications is incorporated by reference herein.)
Recent technological and scientific advances have resulted in an abundance of data that describe and model phenomena in terms of graphs: primitive components and relationships between them. Querying and mining of these graphs has the potential to advance our understanding in numerous ways: understanding of new connectivity patterns and evolutionary changes, discovery of topological features, and synthesis of new models.
Graph data models have been studied in the database community for semantic data modeling, hypertext, geographic information systems, XML, multimedia, and schema matching and integration. For example, the schemas of heterogeneous web-based data sources and e-commerce sites can be modeled as graphs, and the problem of schema matching and integration can be solved through graph matching. In a recent study, video data scenes were modeled as graphs of primitive objects and similarity queries were answered through graph isomorphism.
More broadly, graphs have been used for modeling biological pathways, chemical compounds, protein structures, social networks, and taxonomies. For example, a metabolic pathway is modeled as a set of reactions, enzymes, and metabolites, and an edge is placed between a reaction and a metabolite (or enzyme) if that metabolite (or enzyme) participates in the reaction. Similarly, the 3D structure of proteins can be modeled as contact maps: atoms whose distance is less than a threshold have an edge between them.
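By way of illustration only, the contact-map construction described above can be sketched as follows. This is a minimal sketch, not a description of any particular implementation: the function name contact_map, the coordinate values, and the threshold value are hypothetical and chosen solely for the example, and atoms are assumed to be given as 3D coordinate tuples.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def contact_map(coords, threshold):
    """Build the edge set of a contact map.

    coords    -- list of (x, y, z) atom positions (hypothetical input format)
    threshold -- distance cutoff; a pair of atoms strictly closer than
                 this value is joined by an edge
    Returns a set of index pairs (i, j) with i < j.
    """
    edges = set()
    # Examine every unordered pair of atoms exactly once.
    for (i, p), (j, q) in combinations(enumerate(coords), 2):
        if dist(p, q) < threshold:
            edges.add((i, j))
    return edges


# Illustrative coordinates only; not taken from any real protein structure.
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
edges = contact_map(atoms, threshold=2.0)
```

The pairwise comparison is quadratic in the number of atoms, which is adequate for a sketch; a spatial index could be substituted for larger structures.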
One important class of graph data is the genome-level protein interaction networks constructed from integration of high-throughput sources (microarrays, RNAi, bioimages) with genomics and literature data. These networks are inherently probabilistic.
Understanding such protein interactions is an essential aspect of biology. A systems-level understanding of the signaling pathways and networks in a disease model allows one to evaluate the consequences of modulating activity, expression levels, or post-translational modification of a potential drug discovery target.
Understanding protein interactions within a pathway and interactions between pathways permits selection of a target that, when modulated, addresses the disease condition with minimal impact on other physiological processes. Systems-level information about protein-protein interactions provides novel opportunities for drug discovery by expanding knowledge of protein function, while generating a large new class of potential targets.
Since protein-protein interactions play a fundamental role in all biological and pathological processes, they represent attractive targets for therapeutic intervention. One can select small molecules that specifically disrupt particular protein-protein interactions. Alternatively, critical evaluation of protein interactions within a network can lead to selection of a specific protein target (e.g., an enzyme or receptor) readily amenable to intervention by a drug-like small molecule.
What is needed then is a technique for analyzing, querying, and mining graph data, especially in genome-level protein interaction networks. The present invention satisfies this need.