Many applications require partitioning a large graph/network into smaller communities. Qualitatively, a community is defined as a subset of vertices within the graph such that connections among those vertices are denser than their connections with the rest of the network. The detection of community structure in a network is generally understood as a procedure for mapping the network into a tree. In this tree (called a dendrogram in the social sciences, or a hierarchical tree in biology), the leaves are the vertices, whereas the edges of the tree join vertices or communities (groups of vertices), thus identifying a hierarchical structure of communities nested within each other.
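The qualitative definition above can be made concrete with a small check. The following is a minimal sketch, not a method taken from any cited reference: it assumes a simple undirected graph given as a list of vertex pairs, and the function name `edge_densities` and the density measure (edges present divided by edges possible) are illustrative choices.

```python
def edge_densities(edges, community, vertices):
    """Return (internal_density, external_density) for a vertex subset.

    Assumes a simple undirected graph; self-loops are ignored and
    duplicate edges are counted once.
    """
    community = set(community)
    outside = set(vertices) - community
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}

    # Edges with both endpoints inside the candidate community.
    internal = sum(1 for e in edge_set if e <= community)
    # Edges with exactly one endpoint inside the candidate community.
    crossing = sum(1 for e in edge_set if len(e & community) == 1)

    # Largest possible number of edges of each kind.
    max_internal = len(community) * (len(community) - 1) // 2
    max_crossing = len(community) * len(outside)

    internal_density = internal / max_internal if max_internal else 0.0
    external_density = crossing / max_crossing if max_crossing else 0.0
    return internal_density, external_density


# Two triangles, {0, 1, 2} and {3, 4, 5}, joined by the single edge (2, 3).
example_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
inside, outside = edge_densities(example_edges, {0, 1, 2}, range(6))
print(inside > outside)
```

In this example the subset {0, 1, 2} qualifies as a community in the sense above, because its internal density exceeds the density of its connections to the remaining vertices.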
Partitioning graphs into communities and searching for subgraphs with high internal density within graphs/networks are of practical use in various fields: parallel computing, the Internet, biology, social systems, traffic management, etc.
For example, the following is an application in biology: Complex cellular processes are modular; that is, they are accomplished by the concerted action of functional modules. These modules are made up of groups of genes or proteins involved in common elementary biological functions. One important and largely unsolved goal of functional genomics is the identification of functional modules from genome-wide information, such as transcription profiles or protein interactions. To cope with the ever-increasing volume and complexity of protein interaction data, new automated approaches for pattern discovery in these densely connected interaction networks are required. Cluster analysis is an obvious choice of methodology for the extraction of functional modules from protein interaction networks. (See Detection of Functional Modules From Protein Interaction Networks, by Pereira-Leal, et al., Proteins, 2004; 54:49-57.)
A second example comes from the study of social networks: It is widely assumed that most social networks show “community structure”, i.e., groups of vertices that have a high density of edges within them, with a lower density of edges between groups. It is a matter of common experience that people divide into groups along lines of interest, occupation, age, etc. (See The structure and function of complex networks, by Newman, SIAM Review 45, 2003; 167-256.)
Because of the importance of clustering in applications, many clustering methods/algorithms have been discovered and patented (such as U.S. Pat. No. 5,040,133 Feintuch, et al., U.S. Pat. No. 5,263,120 Bickel, U.S. Pat. No. 5,555,196 Asano, U.S. Pat. No. 5,703,959 Asano, et al., U.S. Pat. No. 5,745,749 Onodera, U.S. Pat. No. 5,832,182 Zhang, et al., U.S. Pat. No. 5,864,845 Voorhees, et al., U.S. Pat. No. 5,940,832 Hamada, et al., U.S. Pat. No. 6,003,029 Agrawal, et al., U.S. Pat. No. 6,038,557 Silverstein, U.S. Pat. No. 6,049,797 Guha, et al., U.S. Pat. No. 6,092,072 Guha, et al., U.S. Pat. No. 6,134,541 Castelli, et al., U.S. Pat. No. 6,195,659 Hyatt, U.S. Pat. No. 6,269,376 Dhillon, et al., U.S. Pat. No. 6,353,832 Acharya, et al., U.S. Pat. No. 6,381,605 Kothuri, et al., U.S. Pat. No. 6,397,166 Leung, et al., U.S. Pat. No. 6,466,946 Mishra, et al., U.S. Pat. No. 6,487,546 Witkowski, U.S. Pat. No. 6,505,205 Kothuri, et al., U.S. Pat. No. 6,584,456 Dom, et al., U.S. Pat. No. 6,640,227 Andreev, U.S. Pat. No. 6,643,629 Ramaswamy, et al., U.S. Pat. No. 6,684,177 Mishra, et al., U.S. Pat. No. 6,728,715 Astley, et al., U.S. Pat. No. 6,751,621 Calistri-Yeh, et al., U.S. Pat. No. 6,829,561 Keller, et al., etc.), and some algorithms have been embedded in various popular software packages (such as BMDP, SAS, SPSS-X, CLUSTAN, MICRO-CLUSTER, ALLOC, IMSL, NT-, NTSYS-pc, etc.).
In general, almost all existing clustering methods can be classified as one of two types: agglomerative or divisive, depending on how the hierarchical trees are constructed and how vertices are grouped together into communities. (Examples of agglomerative clustering algorithms are found in U.S. Pat. No. 5,040,133 Feintuch, et al., U.S. Pat. No. 5,832,182 Zhang, et al., U.S. Pat. No. 6,049,797 Guha, et al., U.S. Pat. No. 6,092,072 Guha, et al., U.S. Pat. No. 6,134,541 Castelli, et al., U.S. Pat. No. 6,195,659 Hyatt, U.S. Pat. No. 6,397,166 Leung, et al., etc. Examples of divisive clustering algorithms are found in U.S. Pat. No. 6,038,557 Silverstein, U.S. Pat. No. 6,353,832 Acharya, et al., U.S. Pat. No. 6,381,605 Kothuri, et al., U.S. Pat. No. 6,466,946 Mishra, et al., U.S. Pat. No. 6,505,205 Kothuri, et al., U.S. Pat. No. 6,640,227 Andreev, U.S. Pat. No. 6,684,177 Mishra, et al., etc.)
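To illustrate the distinction, an agglomerative method starts from single vertices and repeatedly merges groups, building the tree from the leaves upward, whereas a divisive method starts from the whole network and repeatedly splits it. The following is a minimal sketch of the agglomerative case only, and it does not reproduce the method of any patent cited above. The function name `agglomerate`, the similarity measure (the fraction of possible edges present between two groups), and the tie-breaking rule (the first best pair in list order) are assumptions made for illustration.

```python
def agglomerate(vertices, edges):
    """Greedy bottom-up merging on a simple undirected graph.

    Returns the merge history as a list of (group_a, group_b, score)
    tuples, which describes a dendrogram from the leaves to the root.
    """
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    groups = [frozenset([v]) for v in vertices]
    merges = []

    def link_density(a, b):
        # Fraction of the possible edges between groups a and b that exist.
        links = sum(1 for u in a for v in b if frozenset((u, v)) in edge_set)
        return links / (len(a) * len(b))

    while len(groups) > 1:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                score = link_density(groups[i], groups[j])
                # Strict comparison: ties keep the earliest pair found.
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((groups[i], groups[j], score))
        # The merged group takes the place of the first group (j > i).
        groups[i] = groups[i] | groups[j]
        del groups[j]
    return merges


# Two triangles, {0, 1, 2} and {3, 4, 5}, joined by the single edge (2, 3).
example_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
history = agglomerate(range(6), example_edges)
last_a, last_b, last_score = history[-1]
print(sorted(last_a), sorted(last_b))
```

With this vertex order, the two triangles are assembled first and are joined only at the last merge, which is the weakest one. Because ties are broken by list position, a different vertex order may produce a different tree; practical methods differ mainly in the similarity measure and in how such ties are handled.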