A graph is a mathematical object composed of vertices (also called nodes) and edges (also called branches or links) that connect the vertices. The vertices carry labels that distinguish them from one another. Many real-world subjects, such as a road map or a chemical formula, can be expressed as graphs.
For example, in a road map, intersections can be treated as nodes and roads as edges. In a chemical formula, atoms can be treated as nodes and the bonds between atoms as edges. Graphs are therefore applicable to a very wide range of fields, such as genes, protein structures, electric circuits, geology and architecture.
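As a minimal sketch of the labeled-graph idea described above, a chemical formula can be stored as a label table plus an adjacency table. The function name build_adjacency and the integer node identifiers are assumptions made for this illustration only.

```python
def build_adjacency(labels, edges):
    """Build an undirected adjacency table from node labels and an edge list.

    labels: dict mapping a node id to its label (for example an atom symbol)
    edges:  iterable of (node_id, node_id) pairs
    """
    adjacency = {node: set() for node in labels}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Water (H-O-H): atoms are nodes, bonds are edges.
water_labels = {0: "O", 1: "H", 2: "H"}
water_edges = [(0, 1), (0, 2)]
water_adjacency = build_adjacency(water_labels, water_edges)
```

The same two tables could describe a road map by labeling nodes with intersection names and listing roads as edges.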
Recently, graph structures have also begun to be applied to social networking services (SNSs). That is, a specific state of an SNS can be expressed as a graph in which individual users of the SNS are nodes and the relationships and the like between those users are edges. In the same way, the link structure of the World Wide Web (WWW) can also be expressed as a graph.
When real-world subjects are expressed as graphs in this way, it is naturally desirable to evaluate whether two graphs coincide with or resemble each other. For example, if the graph of the chemical formula of one medicine is evaluated as resembling the graph of the chemical formula of another medicine, it can be estimated that the two medicines have similar medicinal effects.
According to past studies, however, no polynomial-time algorithm is known for the problem of determining whether two graphs are the same (graph isomorphism), and the problem of determining whether one graph is contained in another graph (subgraph isomorphism) is NP-complete.
Known algorithms for these problems can therefore give solutions in reasonable computation times only for graphs having relatively small numbers of nodes. However, the number of nodes is as large as several thousand to several tens of thousands in bioinformatics dealing with gene sequences, and as large as several million in an SNS, which far exceeds what a naive similarity calculation technique can handle with a realistic amount of computation.
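The growth in computation described above can be illustrated with a naive sameness check that tries every one-to-one assignment of nodes. This is only a sketch under stated assumptions: the function name are_isomorphic is hypothetical, the graphs are assumed to be undirected, and the search visits up to n! assignments for n nodes, so it is usable only for very small graphs.

```python
from itertools import permutations


def are_isomorphic(labels_a, edges_a, labels_b, edges_b):
    """Brute-force check that two undirected labeled graphs are the same.

    Tries every bijection between the node sets, so the worst case
    grows factorially with the number of nodes.
    """
    nodes_a = list(labels_a)
    nodes_b = list(labels_b)
    edge_set_a = {frozenset(e) for e in edges_a}
    edge_set_b = {frozenset(e) for e in edges_b}
    if len(nodes_a) != len(nodes_b) or len(edge_set_a) != len(edge_set_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Node labels must be preserved by the assignment.
        if any(labels_a[n] != labels_b[mapping[n]] for n in nodes_a):
            continue
        # Every edge of A must land on an edge of B; equal edge counts
        # then guarantee that the edge sets match exactly.
        if all(frozenset(mapping[n] for n in edge) in edge_set_b
               for edge in edge_set_a):
            return True
    return False


carbon = {0: "C", 1: "C", 2: "C", 3: "C"}
path = [(0, 1), (1, 2), (2, 3)]
relabeled_path = [(2, 0), (0, 3), (3, 1)]
star = [(0, 1), (0, 2), (0, 3)]

print(are_isomorphic(carbon, path, carbon, relabeled_path))  # True
print(are_isomorphic(carbon, path, carbon, star))            # False
```

With only 15 nodes the loop may already need to examine more than 10^12 assignments, which is why graphs with thousands or millions of nodes are out of reach for this kind of approach.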
To address this, techniques have heretofore been proposed for calculating the sameness or similarity of two graphs at high speed.
“General Graph Identification With Hashing” by Thomas E. Portegys, School of Information Technology, Illinois State University (http://www.itk.ilstu.edu/faculty/portegys/research/graph/graph-hash.pdf) discloses a technique of determining the sameness of two graphs at high speed by using MD5 hashing. This technique, however, allows only a determination of whether graphs are the same, and cannot be applied to calculating a degree of similarity between them.
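The general idea of a hash-based sameness test can be sketched as follows. This is not the procedure of the cited paper; it is a simplified neighbor-label refinement written for illustration, and the names graph_digest and rounds are assumptions. Each node code is repeatedly rehashed together with the sorted codes of its neighbors, and the sorted node codes are then hashed into a single digest. Different digests show that two graphs differ, whereas equal digests only suggest that they are the same.

```python
import hashlib


def _md5(text):
    """Return the MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def graph_digest(labels, edges, rounds=3):
    """Digest of an undirected labeled graph that ignores node numbering."""
    adjacency = {node: [] for node in labels}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Start from the node labels alone.
    codes = {node: _md5(str(labels[node])) for node in labels}
    for _ in range(rounds):
        # Mix each node code with the sorted codes of its neighbors.
        codes = {
            node: _md5(codes[node] + "|" +
                       ",".join(sorted(codes[m] for m in adjacency[node])))
            for node in labels
        }
    return _md5(",".join(sorted(codes.values())))


carbon = {0: "C", 1: "C", 2: "C", 3: "C"}
path = [(0, 1), (1, 2), (2, 3)]
relabeled_path = [(2, 0), (0, 3), (3, 1)]

same = graph_digest(carbon, path) == graph_digest(carbon, relabeled_path)
```

Comparing two digests yields only a yes-or-no answer, which matches the limitation noted above: a changed edge produces an unrelated digest, so the digests carry no information about how similar two graphs are.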
With respect to producing hash values associated with a graph, Japanese Unexamined Patent Application Publication No. Hei 7-334366 discloses retaining a hash table that stores hash values of all subgraphs of a graph S, and storing combinations of subgraphs that have appeared in the past and the subgraphs reached by reducing them. However, this technique assigns hash values recursively, and therefore it can be applied to a directed acyclic graph but not to a more general graph that includes a cycle.
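Why a recursive hash assignment is limited to directed acyclic graphs can be seen in a small sketch. This is not the method of the cited publication; the function name node_hash and the children table are assumptions for illustration. The hash of a node is computed from its label and the hashes of its children, so a cycle would make a node depend on its own hash. The sketch detects that situation and raises an error instead of recursing forever.

```python
import hashlib


def node_hash(node, labels, children, cache=None, active=None):
    """Recursively hash a node from its label and its children's hashes."""
    cache = {} if cache is None else cache
    active = set() if active is None else active
    if node in cache:
        return cache[node]
    if node in active:
        # The node is its own descendant, so the recursion cannot bottom out.
        raise ValueError("cycle reached at node %r" % (node,))
    active.add(node)
    child_hashes = sorted(
        node_hash(child, labels, children, cache, active)
        for child in children.get(node, ())
    )
    active.discard(node)
    text = str(labels[node]) + "|" + ",".join(child_hashes)
    cache[node] = hashlib.md5(text.encode("utf-8")).hexdigest()
    return cache[node]


# A small directed acyclic graph: node 0 points to nodes 1 and 2.
dag_labels = {0: "a", 1: "b", 2: "b"}
dag_children = {0: [1, 2]}
root_hash = node_hash(0, dag_labels, dag_children)
```

Calling node_hash on a graph whose children table contains a cycle, such as {0: [1], 1: [0]}, raises ValueError in this sketch.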
U.S. Pat. No. 6,473,881 discloses a technique of causing a transistor-level design automation tool to carry out pattern matching for a circuit design through timing analysis, checking of electrical rules, noise analysis and the like. However, this technique relies on characteristics, such as a keynode, that are particular to circuits, and it is therefore difficult to extend its use to general graph comparison.