Finding communities is a fundamental problem in social network analysis. Various definitions have been proposed for communities in a social network. One type of community is a group in which every member knows every other member. If one thinks of a person in the social network as a vertex and a relation between two persons (knows) as an edge, then a social network can be viewed as a graph. A social network in which the relation is bidirectional (that is, if a knows b, then b also knows a) and in which there is at most one relation between any pair of people can be described by a simple undirected graph. The communities mentioned above then correspond to cliques in the graph that describes the social network. Usually, one would be interested only in maximal cliques.
One natural social network is formed by the calling patterns of people. Telecom service providers have a strong interest in understanding the communities in the social network formed by their users. This understanding can help improve the effectiveness of campaigns, identify potential churners by looking at the communities that past churners were a part of, identify potential conversion targets by finding people who are not in the provider's network but are part of communities in that network, and explain the spreading of value-added services based on the social network structure.
Telecom service providers have call detail records (CDRs) for all the calls originating from or terminating in their network. The telecom service providers can analyze these CDRs to identify the social network as a graph by treating people as vertices and calls between people as edges. When the social network is derived from a telecom network, the cliques can be useful for identifying targets for campaigns. However, in such situations, the information content of small maximal cliques is low and the number of small maximal cliques in a social network graph is very large. Therefore, the relative gain of finding such small maximal cliques is very low. As such, it is of interest to find only the large maximal cliques in the social networks.
Existing approaches to maximal clique enumeration include attempts to provide a run-time that is guaranteed to be polynomial in the output size, as well as approaches that trade this guarantee for fast running time on practical problems. The output-polynomial approaches are based on augmenting a clique in such a manner that one avoids going in the direction of a non-maximal clique. With such approaches, it is hard to know whether a clique under construction is likely to reach the required minimum size until very late in the process. As a result, it is difficult to exploit effectively the constraint that the maximal clique needs to be large. Some other existing approaches are based on starting with a large vertex set and then pruning it to a maximal clique in an iterative or recursive fashion. Here, one has an estimate of the largest maximal clique that can be found from such a set (the size of the set itself) and hence can exploit the size constraint readily.
As noted above, finding large maximal cliques in a very large graph is a problem of fundamental importance. Existing approaches can include, for example, finding a maximum clique in the graph, as well as attempting to enumerate all maximal cliques.
An approach that finds a maximum clique in a graph is not suitable for situations where one needs to find all large maximal cliques. For example, in telecom, closed user groups (CUGs) are cliques, and one would need to find all large CUGs, not just one of them.
Also, in very large scale integration (VLSI), cliques represent fully interconnected components. An approach that attempts to enumerate all maximal cliques is usually very slow and returns every maximal clique, many of which are too small to be of much use. Such an approach is also usually limited in the size of graph that it can handle.
In telecom, if people are considered vertices and calls and/or short message service (SMS) between them are considered edges, then the problem of finding CUGs can be considered a clique detection problem. Existing approaches can include, for example, attempting to find a maximum clique by calculating a connectivity count for each vertex and removing the vertices systematically to obtain, for example, a maximum clique. However, such approaches do not enumerate all maximal cliques.
Also, an existing approach can include clustering multi-dimensional related data in a computer database by combining the two vertices of a graph connected by the edge having the highest score. However, such an approach disadvantageously uses a weight assigned to an edge that is not based on the extent of overlap in the neighborhoods of the vertices in question.
Clustering coefficients (that is, the number of triangles in the immediate neighborhood of a node divided by the number of possible triangles given the degree of the node) are disadvantageous for the purpose of clique enumeration, as described below. They cannot be used to filter the graph meaningfully without affecting interesting maximal cliques. Consider, for example, that one is looking for maximal cliques of size 10 and above. Now, assume that a node has degree 100 and participates in a clique of size 10, but none of its remaining neighbors are connected to each other. In such a situation, the clustering coefficient of the node will be low, even though it participates in a maximal clique of the desired size. On the other hand, there may be nodes that have a high clustering coefficient but do not participate in any clique of the desired size. Hence, one cannot use clustering coefficients meaningfully to reduce the graph size.
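The scenario above can be checked numerically. The following is a minimal sketch, not part of any existing approach described here; the adjacency-set representation and the names `local_clustering` and `build_example` are assumptions made for illustration. The construction is one reading of the scenario: vertex 0 has nine neighbors that form a clique of size 10 with it and 91 further neighbors that are not connected to each other, giving degree 100. A separate triangle is added to show a node with a high coefficient that belongs to no large clique.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (d * (d - 1) / 2)


def build_example():
    """Hypothetical graph: a hub in a 10-clique, plus an isolated triangle."""
    adj = {i: set() for i in range(104)}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    # Vertices 0..9 form a clique of size 10.
    for a, b in combinations(range(10), 2):
        add_edge(a, b)
    # Vertex 0 has 91 further neighbors (10..100) with no edges among them.
    for u in range(10, 101):
        add_edge(0, u)
    # Vertices 101..103 form a triangle, a maximal clique of size 3 only.
    for a, b in combinations(range(101, 104), 2):
        add_edge(a, b)
    return adj


adj = build_example()
hub_cc = local_clustering(adj, 0)         # 36 linked pairs out of 4950
triangle_cc = local_clustering(adj, 101)  # both neighbors are linked
print(len(adj[0]), round(hub_cc, 4), triangle_cc)
```

Under these assumptions, any filter that drops low-coefficient nodes would discard vertex 0 and break the size-10 clique, while keeping the triangle vertices that cannot belong to a clique of size 10.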
Cliques can sometimes be too restrictive a definition for communities in social networks. Clique relaxations have been proposed in existing approaches such as, for example, quasi-cliques, k-cores, k-cliques, k-clubs and k-plexes. A quasi-clique can be defined as a group of vertices such that the ratio of the number of edges in the induced sub-graph to the number of edges in a clique of the same size is above a user-supplied parameter.
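The quasi-clique ratio can be computed directly from the definition. The sketch below is illustrative only; the function names, the parameter name `gamma`, and the choice of a non-strict comparison (`>=`) for the threshold are assumptions, not taken from any particular existing approach.

```python
from itertools import combinations


def edge_density(adj, group):
    """Edges present inside `group` divided by edges in a clique of that size."""
    members = list(group)
    n = len(members)
    if n < 2:
        return 1.0
    present = sum(1 for a, b in combinations(members, 2) if b in adj[a])
    return present / (n * (n - 1) / 2)


def is_quasi_clique(adj, group, gamma):
    """True if the density of `group` reaches the user-supplied threshold.

    Assumption: the threshold is treated as inclusive.
    """
    return edge_density(adj, group) >= gamma


# Hypothetical example: four vertices with 5 of the 6 possible edges
# (only the edge between 2 and 3 is missing).
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1},
    3: {0, 1},
}
density = edge_density(adj, {0, 1, 2, 3})
print(density, is_quasi_clique(adj, {0, 1, 2, 3}, 0.8))
```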
A group of vertices can be called a k-core if the degree of each node in the group is at least k in the sub-graph induced by this group of vertices. Although finding the densest sub-graph is a difficult problem, an approximation within a factor of two is more easily found. Finding k-cores is also straightforward. However, neither of these definitions provides any guarantee about the cohesiveness and tightness of the communities (in terms of diameter of the graph) and, hence, is not favored for community definition in social networks.
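One common way to find a k-core is to repeatedly remove vertices whose degree has fallen below k. The following is a minimal sketch of that peeling idea under an assumed adjacency-set representation; the function name `k_core` and the example graph are hypothetical. The example also illustrates the cohesiveness concern noted above: two groups with no path between them both survive in the same 3-core.

```python
from itertools import combinations


def k_core(adj, k):
    """Return the vertices left after peeling away all vertices of degree < k."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed through an earlier entry
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


def build_example():
    """Hypothetical graph: two disjoint 4-cliques, one with a tail 3-4-5."""
    adj = {v: set() for v in [0, 1, 2, 3, 4, 5, 10, 11, 12, 13]}

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for block in ([0, 1, 2, 3], [10, 11, 12, 13]):
        for a, b in combinations(block, 2):
            add_edge(a, b)
    add_edge(3, 4)
    add_edge(4, 5)
    return adj


adj = build_example()
core3 = k_core(adj, 3)
print(sorted(core3))
```

Here the tail vertices 4 and 5 are peeled away, and the result contains both 4-cliques even though they are disconnected from each other.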
In an existing approach, for example, a k-clique is defined as a group of vertices such that, between any pair of vertices in the group, there exists a shortest path of length no more than k in the full graph. This means that some of the nodes on the shortest path between a pair of vertices from the group may not themselves belong to the group. When the diameter of the induced sub-graph is at most k (that is, when there exists a shortest path of length ≤ k between any pair of nodes in the sub-graph induced by the vertex set), the group is called a k-club.
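The difference between the two definitions comes down to where the shortest paths are allowed to run. The sketch below is illustrative; the helper names and the `allowed` parameter are assumptions. It measures distances by breadth-first search either in the full graph (k-clique test) or only through group members (k-club test), and it checks only the distance condition, not maximality.

```python
from collections import deque


def distances(adj, source, allowed=None):
    """BFS distances from `source`; if `allowed` is given, stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist and (allowed is None or u in allowed):
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def _all_within(adj, group, k, allowed):
    group = set(group)
    for s in group:
        dist = distances(adj, s, allowed)
        if any(dist.get(t, float("inf")) > k for t in group):
            return False
    return True


def is_k_clique(adj, group, k):
    """Pairwise distance <= k, with paths allowed to leave the group."""
    return _all_within(adj, group, k, allowed=None)


def is_k_club(adj, group, k):
    """Pairwise distance <= k inside the sub-graph induced by the group."""
    return _all_within(adj, group, k, allowed=set(group))


# Hypothetical example: a and b are linked only through c.
adj = {"a": {"c"}, "b": {"c"}, "c": {"a", "b"}}
print(is_k_clique(adj, {"a", "b"}, 2), is_k_club(adj, {"a", "b"}, 2))
```

In this example the group {a, b} satisfies the 2-clique distance condition because of the path through c, but it is not a 2-club because c lies outside the group; adding c makes the group satisfy both.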
As such, all k-clubs are k-cliques, but not all k-cliques are k-clubs. While the k-club definition guarantees a small diameter of the community, a k-club is still not necessarily cohesive, as deletion of a few nodes may change the community characteristics dramatically. Another clique relaxation is the k-plex, where each member of the community is connected to all others in the community except k members. This definition guarantees a small diameter, etc., but finding such communities is difficult.
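Checking whether a given group is a k-plex is simple, even though finding such groups is difficult. The sketch below assumes the commonly used convention in which each member must have at least |S| − k neighbors inside the group S, so that a 1-plex is an ordinary clique; this convention and the function name `is_k_plex` are assumptions for illustration and may differ in detail from the wording above.

```python
def is_k_plex(adj, group, k):
    """True if every member has at least len(group) - k neighbors in the group.

    Assumption: the vertex itself counts among the k permitted non-neighbors,
    so k = 1 corresponds to a clique.
    """
    group = set(group)
    need = len(group) - k
    return all(len(adj[v] & group) >= need for v in group)


# Hypothetical example: a 4-cycle 0-1-2-3-0, where each vertex misses
# exactly one other vertex of the group.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(is_k_plex(cycle, {0, 1, 2, 3}, 1), is_k_plex(cycle, {0, 1, 2, 3}, 2))
```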
Finding social networks in Internet-based online groups has been a topic of research interest. One existing approach explores online repositories of real-life social networking data to find different characteristics of real-life social networks. Existing approaches also include investigating the formation, membership, growth and evolution of large social networks.