The rapid growth of online social, communication, and academic networks has led to the creation of massive graphs containing a large number of edges, where an edge represents an interaction or connection (link) between two parties or events (nodes). For example, modern social networks and academic networks may contain millions to hundreds of millions of nodes and links. The links in these networks are often created through noisy processes. In such cases, not all links are equally informative for the knowledge discovery process. Uninformative links may even be harmful for making inferences in real network analysis scenarios.
Noisy links are common in a variety of real network analysis scenarios. For example, the vast majority of ties in social and information networks are weak ties, which do not add much information to the network representation. In a social network such as Facebook, the majority of a user's friends may be relatively inactive links corresponding to distant acquaintances. Such links may not add much to the knowledge discovery process. As another example, many links in academic networks are caused by occasional interactions between unrelated researchers. In many cases, these occasional interactions do not represent true affinities or linkages between the researchers. As a further example, in many biological networks, such as protein interaction networks, the links are inferred statistically, which is an inexact and noisy process.
Noisy links can be an impediment to many applications, such as data mining or graph mining. For example, most graph mining methods, such as community detection and classification, depend heavily on consistency in link structure in order to obtain accurate results. However, existing methods remain susceptible to a significant amount of noise caused by inconsistent links in the network.