1. Field of the Invention
The present invention generally relates to an information processing technology, and particularly to a method, program, and system for calculating similarity between nodes in a graph.
2. Description of the Related Art
A graph is a mathematical object made up of vertices (sometimes called nodes) and edges (sometimes called branches or links) joining some or all of the vertices. The nodes and edges in the graph can be labeled so that they can be identified.
An actual object such as a road map or a chemical formula can be represented by a graph. For example, in the road map, an intersection and a road can be considered to be a node and an edge, respectively. In the chemical formula, an atom and a bond between atoms can be considered to be a node and an edge, respectively. It is understood that graphs are applicable to a wide range of fields, including genes, protein structures, electric circuits, geography, and architecture.
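The road-map example can be sketched in code. The following is a minimal illustration only, assuming an undirected graph stored as an adjacency mapping; the class name `LabeledGraph`, the intersection names, and the road names are hypothetical and do not come from any cited document.

```python
from collections import defaultdict


class LabeledGraph:
    """Undirected graph whose nodes and edges carry labels."""

    def __init__(self):
        self.node_labels = {}            # node -> node label
        self.edges = defaultdict(dict)   # node -> {neighbor: edge label}

    def add_node(self, node, label):
        self.node_labels[node] = label
        self.edges[node]  # register the node even if it has no edges yet

    def add_edge(self, u, v, label):
        # Store both directions because the graph is undirected.
        self.edges[u][v] = label
        self.edges[v][u] = label

    def neighbors(self, node):
        return sorted(self.edges[node])


# Hypothetical road map: three intersections joined by two roads.
road_map = LabeledGraph()
for name in ("X1", "X2", "X3"):
    road_map.add_node(name, label="intersection")
road_map.add_edge("X1", "X2", label="Main St")
road_map.add_edge("X2", "X3", label="Oak Ave")

print(road_map.neighbors("X2"))
```

The same structure would serve for a chemical formula by labeling nodes with atom types and edges with bond types.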
In recent years, graph structures have begun to be applied to the representation of social networking services (SNSs). Specifically, a particular state of an SNS can be represented by a graph by treating individual SNS users as nodes and friendships between the users as edges. Similarly, it is to be understood by those skilled in the art that the link structure of the World Wide Web (WWW) can be represented by a graph.
When an actual object is represented by a graph, it is often necessary to evaluate similarity between nodes in the graph. One example is evaluating whether a medicine under development is similar to a known medicine that reacts with proteins in a living organism, in order to predict whether the new medicine will also react with those proteins. In this case, the evaluation is achieved by calculating similarity between a node associated with the known medicine and a node associated with the new medicine in a graph representing the proteins and the medicines.
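To make the idea of node similarity concrete, the sketch below uses Jaccard similarity of neighbor sets. This measure is swapped in purely for illustration and is not the method of the present invention or of any cited document; the node names and the adjacency data are hypothetical.

```python
def jaccard_similarity(adj, a, b):
    """Share of neighbors that nodes a and b have in common (0.0 to 1.0)."""
    neighbors_a, neighbors_b = adj[a], adj[b]
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0  # two isolated nodes: treat as dissimilar by convention
    return len(neighbors_a & neighbors_b) / len(union)


# Hypothetical medicine-protein graph: an edge means "reacts with".
adj = {
    "known_medicine": {"protein_1", "protein_2", "protein_3"},
    "new_medicine": {"protein_2", "protein_3", "protein_4"},
}

# The two medicines share 2 of the 4 proteins either one reacts with.
print(jaccard_similarity(adj, "known_medicine", "new_medicine"))
```

A measure of this kind looks only at direct neighbors, so it ignores node labels and any wider graph structure.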
As background art, a method is disclosed for displaying and extracting highly similar regions on the basis of a sequence alignment result selected as local similar sequences in biopolymers made of constituent sequences and a device thereof. The method includes calculating an alignment result of sequences having the local similar sequences by using a dynamic programming approach or the like. The alignment result is obtained as a graph, with a first axis as the element number denoting the order of a base or amino acid that is an element of one of aligned sequences and a second axis as the cumulative value of scores up to the element number, and the graph is displayed. In highly similar regions, the slope of the graph is steep. (See Japanese Patent Application Publication No. H07-155169, “JP7155169”.)
As another background art, a communication software development support device is disclosed for efficiently developing highly reliable communication software. The device includes: a unit for interpreting the specification of a function described and added in processes of communication software development and generating a first graph representing a state transition relation composed of a state and a signal generated at the time of performing communication processing; a unit for interpreting the specification of an automatically retrieved function having already been developed and generating a second graph representing a state transition relation composed of a state and a signal generated at the time of performing communication processing; a unit for calculating similarity between the first graph and the second graph; a unit for selecting the graphs in descending order of the calculated similarity; and a unit for displaying the specifications of the developed functions corresponding to the selected graphs, where reusable functions having already been developed are presented to a person defining specification in descending order of reusability. (See Japanese Patent Application Publication No. H07-219759, “JP7219759”.)
As still another background art, a similarity calculation device is disclosed for enabling similarity between texts to be easily calculated with the structures of the texts reflected. The similarity calculation device includes: a morphological analysis section for performing morphological analysis on the text; a clause analysis section for composing a clause; a dependency analysis section for deciding dependency associated with the clause; a non-circular directed graph generation section for generating a non-circular directed graph with the hierarchy corresponding to the text to be processed being permitted on the basis of a result of the morphological analysis, a result of the composition of the clause, and a result of the analysis of the dependency; and a similarity calculation section for calculating similarity between the non-circular directed graphs and outputting the calculated similarity as similarity between texts. The similarity between the non-circular directed graphs is determined as the total sum of the number of matched partial paths in all the partial paths in the non-circular directed graph. Preferably, the similarity is calculated by a recursive formula. (See Japanese Patent Application Publication No. 2004-272352, “JP2004272352A”.)
As yet another background art, a word class creation program is disclosed that is capable of very reliably creating groups of same-class words according to a target document without preparing a word thesaurus for the set of terms appearing in the target document. In the program, similarity is calculated for each combination of extracted terms; pairs of terms are sorted on the basis of the similarity; a graph is created for the selected pairs of terms by treating each term as a node and expressing each combination relation as an edge; candidates for a dichotomy pattern of the graph obtained by severing a predetermined edge are extracted; the graph is divided on the basis of an average edge density calculated from the respective candidates; and the terms at the constituent nodes of each of the plurality of graphs in the division result are extracted as a term group of the same class. (See Japanese Patent Application Publication No. 2007-128389, “JP2007128389”.)
As a further example of background art, a graph integration device is disclosed that has a simple structure and is capable of handling a plurality of integration graphs with less computational complexity. The device is configured to receive a plurality of input graphs G, each including nodes representing input elements and edges representing branching and joining between the nodes, and to integrate the input graphs G. The device includes: a graph input unit; an input graph storage unit; a similarity calculation unit for calculating similarity between input graphs G by DP matching; a similarity determination unit for determining whether the input graphs G are similar to each other on the basis of the similarity; a graph integration unit for integrating the input graphs G if the input graphs G are similar to each other; a graph addition unit for adding each of the input graphs G as a new integration graph if the input graphs G are not similar to each other; and an integration graph storage unit. (See Japanese Patent Application Publication No. 2010-032919, “JP2010032919”.)
As still another background art, literature entitled “Fast subtree kernels on graphs” discloses as follows: “We propose fast subtree kernels on graphs. On graphs with n nodes and m edges and maximum degree d, these kernels comparing subtrees of height h can be computed in O(mh), whereas the classic subtree kernel by Ramon & Gärtner scales as O(n^2 4^d h). Key to this efficiency is the observation that the Weisfeiler-Lehman test of isomorphism from graph theory elegantly computes a subtree kernel as a byproduct. Our fast subtree kernels can deal with labeled graphs, scale up easily to large graphs and outperform state-of-the-art graph kernels on several classification benchmark datasets in terms of accuracy and runtime.” (See Nino Shervashidze and Karsten M. Borgwardt, “Fast subtree kernels on graphs”, NIPS 2009.)
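The Weisfeiler-Lehman relabeling that the quoted abstract refers to can be sketched as follows. This is a simplified illustration, not the authors' implementation: in each of h rounds every node's label is replaced by a compressed label derived from its own label and the sorted labels of its neighbors, and the kernel value is the dot product of the resulting label counts. The function names and the `#k` naming of compressed labels are assumptions made for this example, and no claim is made that the sketch attains the O(mh) bound.

```python
from collections import Counter


def wl_label_counts(adj, labels, h, table):
    """Count node labels seen over h rounds of Weisfeiler-Lehman relabeling.

    `table` maps each (label, sorted neighbor labels) signature to a short
    compressed label. It must be shared by the graphs being compared.
    Assumes the original labels are strings that do not start with '#'.
    """
    current = dict(labels)
    counts = Counter(current.values())
    for _ in range(h):
        updated = {}
        for node, neighbors in adj.items():
            signature = (current[node],
                         tuple(sorted(current[n] for n in neighbors)))
            # Reuse the compressed label if this signature was seen before.
            updated[node] = table.setdefault(signature, f"#{len(table)}")
        current = updated
        counts.update(current.values())
    return counts


def wl_subtree_kernel(adj1, labels1, adj2, labels2, h):
    """Dot product of the two graphs' label-count vectors."""
    table = {}
    c1 = wl_label_counts(adj1, labels1, h, table)
    c2 = wl_label_counts(adj2, labels2, h, table)
    return sum(c1[label] * c2[label] for label in c1.keys() & c2.keys())


# Hypothetical inputs: a path A-B-A and a single edge A-B.
path_adj = {0: [1], 1: [0, 2], 2: [1]}
path_labels = {0: "A", 1: "B", 2: "A"}
edge_adj = {0: [1], 1: [0]}
edge_labels = {0: "A", 1: "B"}

print(wl_subtree_kernel(path_adj, path_labels, edge_adj, edge_labels, h=1))
```

Sharing one `table` between the two graphs is what makes their compressed labels comparable; with separate tables, identical neighborhoods could receive different labels.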
As a further background art, literature entitled “Graph Matching: Theoretical Foundations, Algorithms, and Applications” discloses as follows: “Graphs are a powerful and versatile tool useful in various subfields of science and engineering. In many applications, for example, in pattern recognition and computer vision, it is required to measure the similarity of objects. When graphs are used for the representation of structured objects, then the problem of measuring object similarity turns into the problem of computing the similarity of graphs, which is also known as graph matching. In this paper, similarity measures on graphs and related algorithms will be reviewed. Applications of graph matching will be demonstrated giving examples from the fields of pattern recognition and computer vision. Also recent theoretical work showing various relations between different similarity measures will be discussed.” (See Horst Bunke, “Graph Matching: Theoretical Foundations, Algorithms, and Applications”, Montreal, Quebec, Canada, May 2000, pp. 82-88.)