Many modern applications handle objects that can be represented as graphs. Transportation applications manipulate road networks, CAD/CAM applications organize electrical or electronic components, pattern recognition and computer vision applications classify unknown objects, and chemistry and molecular biology applications manipulate molecules. In these applications, as well as in many others, the objects are structural in nature and can therefore be treated as graphs. A graph G(V, E) is composed of a set of nodes V and a set of edges E, with each edge connecting two nodes. FIG. 1(b) illustrates a simple example of a network graph found in the prior art. In many areas, multiple objects are involved and the relations between them may be quite complex, so that the objects are represented by large and complex network graphs. In order to better understand several aspects of the invention, some frequently used technical terms in this field are introduced as follows.
In the prior art, a number of graph classes have been identified, including simple graphs, pseudo-graphs (with loops), multi-graphs (two or more edges connecting a pair of nodes), directed graphs (the edges have an orientation), and weighted graphs (a weight is associated with each edge). The similarity between graphs is measured in terms of the distance between them. The closer two graphs are, the more similar they are. If the distance between two graphs is 0, they can be considered identical. There are usually two ways of measuring the distance between graphs:
Feature-based Distance: a set of features is extracted from the structural representation, and these features are used as an n-dimensional vector to which the Euclidean distance can be applied.
Cost-based Distance: the distance between two objects is measured by the number of modifications (edit operations) required to transform the first object into the second.
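As an illustration of the feature-based distance described above, the following minimal sketch extracts a degree histogram from each graph and applies the Euclidean distance to the two histograms. The choice of the degree histogram as the feature set, the function names `degree_histogram` and `feature_distance`, and the `max_degree` parameter are assumptions made for this example only; they are not taken from any of the cited methods.

```python
import math
from collections import Counter


def degree_histogram(edges, max_degree):
    """Count how many nodes have each degree from 0 to max_degree.

    Only nodes that appear in at least one edge are counted, and
    degrees above max_degree are clamped into the last bin.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hist = [0] * (max_degree + 1)
    for d in degree.values():
        hist[min(d, max_degree)] += 1
    return hist


def feature_distance(edges_a, edges_b, max_degree=8):
    """Euclidean distance between the degree histograms of two graphs."""
    ha = degree_histogram(edges_a, max_degree)
    hb = degree_histogram(edges_b, max_degree)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ha, hb)))


# Hypothetical example graphs, given as undirected edge lists.
triangle = [(1, 2), (2, 3), (1, 3)]
path = [(1, 2), (2, 3)]

print(feature_distance(triangle, triangle))  # identical graphs give 0.0
print(feature_distance(triangle, path))
```

With a feature set this coarse, two structurally different graphs can share the same histogram and therefore have a distance of 0, so the quality of a feature-based distance depends on how discriminating the chosen features are.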
In the prior art, a number of methods have been proposed to calculate the similarity between graphs using one of the above distance measures. For example, "Structure-Based Similarity Search with Graph Histograms," 10th International Workshop on Database & Expert Systems Applications, pp. 174-178, Sep. 1-3, 1999, by Apostolos N. Papadopoulos and Yannis Manolopoulos, proposes to calculate similarity based on a cost function. See also "Rascal: Calculation of Graph Similarity Using Maximum Common Edge Subgraphs," The Computer Journal, vol. 45, no. 6, pp. 631-644, 2002, by J. Raymond, E. Gardiner, and P. Willett; and "A Distance Measure between Attributed Relational Graphs for Pattern Recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 13, pp. 353-362, 1983, by A. Sanfeliu and K.-S. Fu. The contents of the above papers are incorporated herein by reference in their entirety.
One frequently asked question in network graph applications is how to detect a community structure in a huge and complex network graph. The community structures are subsets of nodes within which node-node connections are dense, but between which connections are less dense. FIG. 2 shows an exemplary network graph, found in prior art, having a number of communities. For the sake of simplicity and clarity, the network of FIG. 2 is relatively simple, with only 3 communities. The connections within each community are relatively dense, while the connections between communities are relatively loose. The heterogeneity of connections suggests that the network has certain natural divisions within it. Community structures are quite common in real networks. Social networks often include community groups based on common location, interests, occupation, etc. Metabolic networks have communities based on functional groupings. Being able to identify these sub-structures within a network can provide insight into how network function and topology affect each other.
Finding communities within an arbitrary network can be a difficult task. The number of communities, if any, within the network is typically unknown and the communities are often of unequal size and/or density. Despite these difficulties, however, several methods for community finding have been developed and employed. One of the oldest algorithms for dividing networks into parts is the minimum-cut method (and variants such as ratio cut and normalized cut). This method sees use, for example, in load balancing for parallel computing in order to minimize communication between processor nodes. In the minimum-cut method, the network is divided into a predetermined number of parts, usually of approximately the same size, chosen such that the number of edges between groups is minimized. The method works well in many of the applications for which it was originally intended but is less than ideal for finding community structure in general networks since it will find communities regardless of whether they are implicit in the structure, and it will find only a fixed number of them. In addition, one of the most widely used methods for community detection is modularity maximization. Modularity is a benefit function that measures the quality of a particular division of a network into communities. The modularity maximization method detects communities by searching over possible divisions of a network for one or more that have particularly high modularity. Since exhaustive search over all possible divisions is usually intractable, practical algorithms are based on approximate optimization methods such as greedy algorithms, simulated annealing, or spectral optimization, with different approaches offering different balances between speed and accuracy.
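To make the notion of modularity concrete, the following minimal sketch scores a given division of a network. The text above does not state a formula, so the sketch assumes the commonly used Newman-Girvan definition, in which each community contributes the fraction of edges falling inside it minus the fraction expected if edges were placed at random with the same node degrees. The function name `modularity` and the example graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a division of an undirected, unweighted graph.

    edges: list of (u, v) pairs.
    community_of: dict mapping each node to a community label.
    Assumes the Newman-Girvan definition:
        Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(edges, two_groups))  # about 0.357 (5/14)
print(modularity(edges, one_group))   # 0.0
```

The division into the two triangles scores higher than placing all nodes in one community, which is the behavior a modularity maximization method exploits. Evaluating one division is inexpensive; the cost lies in searching over the possible divisions.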
However, applying the above methods to a huge and complex network graph requires a tremendous amount of computation, often of O(n^3) complexity. Therefore, it is difficult to find similar sub-graphs (e.g., community structures) in a huge network.