The specification relates to detecting overlapping communities in weighted graphs.
Community detection is a well-studied paradigm that is used in a variety of domains, such as physics, biology, computer science, and social network analysis, to find novel and useful collections of entities that go together. One way to form communities is by amassing entities such as tags or keywords in collaborative tagging systems, products in retail systems, people in social networking systems, or genes in biological systems.
There are two main problems in community detection. First, the data used to associate entities with other entities might be noisy: some associations might be merely random while others might be significant. The degree of association between a pair of entities might be quantified by a weight between them. Therefore, detecting communities in weighted graphs is a more important problem than detecting them in unweighted graphs. Second, the communities themselves might overlap because an entity might be associated with more than one community. For example, a product like a wrist watch might go with both electronic products and jewelry products. Similarly, an ambiguous keyword like “bank” might mean a financial institution or the bank of a river. Therefore, the community detection paradigm must deal with overlapping communities in weighted graphs.
Previous attempts to detect communities include generating an unweighted graph, either directly or by thresholding a weighted graph, detecting communities in the unweighted graph, and then removing the noise from the detected communities. Thresholding, however, leads to a significant loss of information and makes the final communities very sensitive to the threshold used to convert the weighted graph into an unweighted graph. In addition, removing the noise from the communities after they have been detected results in poorly defined communities because important tags may be improperly removed.
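The sensitivity to the threshold can be illustrated with a minimal sketch. The tag names, the edge weights, and the use of connected components as a stand-in for a community detector are all assumptions made for illustration only; they are not taken from any particular prior system.

```python
from collections import defaultdict

# Hypothetical association weights between tags (illustrative values only).
WEIGHTED_EDGES = {
    ("watch", "bracelet"): 0.9,
    ("watch", "phone"): 0.6,
    ("phone", "laptop"): 0.8,
    ("bracelet", "necklace"): 0.7,
    ("laptop", "necklace"): 0.1,  # assumed to be a noisy association
}
NODES = {node for edge in WEIGHTED_EDGES for node in edge}


def threshold_graph(weighted_edges, threshold):
    """Keep edges whose weight is at least `threshold`; the weights are discarded."""
    return [edge for edge, weight in weighted_edges.items() if weight >= threshold]


def connected_components(nodes, edges):
    """Stand-in community detector: connected components of an unweighted graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(frozenset(component))
    return components


def communities_at(threshold):
    """Threshold the weighted graph, then detect communities in the result."""
    edges = threshold_graph(WEIGHTED_EDGES, threshold)
    return set(connected_components(NODES, edges))


for t in (0.5, 0.65, 0.75):
    print(t, sorted(sorted(c) for c in communities_at(t)))
```

In this toy data, a threshold of 0.5 merges every tag into a single community, 0.65 places "watch" with the jewelry tags only and drops its association with "phone", and 0.75 additionally isolates "necklace". Small changes in the threshold therefore produce different communities, and once the weights are discarded there is no way to recover that "watch" belongs to both groups.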