According to the Blue research study, 82% of users indicate that they would consider trying a new product if someone in their social network recommended it. Exploiting such recommendations is complicated, however, by the volume of data that users generate: for example, users upload 100 hours of video to YouTube™ every 60 seconds and share more than 4.75 billion pieces of content on Facebook™ every 24 hours.
Determining how influence propagates through online networks plays a significant role in numerous applications, ranging from collaborative filtering and viral marketing to anomaly detection. Accurately detecting influential nodes in large real-world data streams requires efficiently computing the similarity between a large number of object pairs, which can quickly become a computational bottleneck.
Kempe et al. (D. Kempe, J. M. Kleinberg, E. Tardos. Maximizing the spread of influence through a social network. KDD 2003: 137-146) presented a mathematical formalization of the problem of viral marketing in social networks using the independent cascade model, which assumes that a user influences each of her neighbors in the network with a certain probability. The authors presented approximation algorithms for influence maximization in the network, i.e., finding a small number of users to target in an advertising campaign in order to maximize the spread of influence. A user u influences a neighbor v with a certain propagation probability p_uv, and the propagation probabilities p_uv are assumed to be known in advance. In more recent work, Goyal et al. (A. Goyal, F. Bonchi, L. V. S. Lakshmanan. Learning influence probabilities in social networks. WSDM 2010: 241-250) addressed the problem of learning the influence probabilities from data for different similarity measures.
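To make the independent cascade model concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical toy graph stored as an adjacency dictionary in which each directed edge (u, v) carries a known propagation probability p_uv. The helper names `simulate_cascade`, `expected_spread` and `greedy_seeds`, as well as the Monte Carlo trial count, are illustrative assumptions. Each newly activated node gets exactly one chance to activate each inactive neighbor. The expected spread is estimated by repeated simulation, and seeds are chosen by greedy hill-climbing on that estimate, which is the greedy strategy analyzed by Kempe et al.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical representation: node -> list of (neighbor, p_uv) edges,
# with p_uv assumed known in advance as in the independent cascade model.
Graph = Dict[str, List[Tuple[str, float]]]


def simulate_cascade(graph: Graph, seeds: Set[str], rng: random.Random) -> Set[str]:
    """Run one independent cascade from `seeds`; return all activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # A newly activated u gets a single chance to activate each inactive v.
            for v, p_uv in graph.get(u, []):
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def expected_spread(graph: Graph, seeds: Set[str], trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(trials))
    return total / trials


def greedy_seeds(graph: Graph, k: int, trials: int = 1000) -> List[str]:
    """Greedy hill-climbing: repeatedly add the node with the largest estimated spread."""
    nodes = set(graph) | {v for edges in graph.values() for v, _ in edges}
    chosen: List[str] = []
    for _ in range(k):
        candidates = [n for n in sorted(nodes) if n not in chosen]
        best = max(candidates, key=lambda n: expected_spread(graph, set(chosen) | {n}, trials))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical toy network with made-up propagation probabilities.
    toy: Graph = {
        "alice": [("bob", 0.5), ("carol", 0.3)],
        "bob": [("dave", 0.6)],
        "erin": [("frank", 0.4)],
    }
    print(greedy_seeds(toy, k=2, trials=500))
```

Reusing the same random seed for every candidate evaluation means that all candidates are compared on identical random draws. This reduces noise in the greedy comparison, at the cost of the number of simulations growing with both k and the number of nodes.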
Current stream summarization approaches, such as AMS (Alon-Matias-Szegedy) sketching, either ignore time constraints or consider only binary data. Therefore, analyzing large data and deriving meaningful relations from it in a timely and scalable manner, within reasonable computing limits, remains an open problem.
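For reference, the following is a minimal Python sketch of a basic AMS sketch. It estimates the second frequency moment F2, the sum of squared item frequencies, of a stream of integer items. The class name `AMSSketch`, its parameters `groups` and `per_group`, and the choice of a degree-3 polynomial hash modulo 2^61 - 1 as the 4-wise independent ±1 sign family are illustrative assumptions. Each counter accumulates sign(item) × count. The squared counters are unbiased estimates of F2 and are combined by taking the median of group means. Every update is weighted equally no matter when it arrived, which illustrates the lack of time sensitivity noted above.

```python
import random
import statistics
from typing import List

MERSENNE_P = (1 << 61) - 1  # prime modulus for the polynomial hash family


class AMSSketch:
    """Basic AMS sketch estimating F2 = sum of squared item frequencies.

    Keeps groups * per_group counters, each with its own random +/-1 hash;
    the estimate is the median over groups of the mean squared counter.
    """

    def __init__(self, groups: int = 5, per_group: int = 16, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        # One random degree-3 polynomial per counter (a 4-wise independent family).
        self.coeffs: List[List[int]] = [
            [rng.randrange(MERSENNE_P) for _ in range(4)]
            for _ in range(groups * per_group)
        ]
        self.counters: List[int] = [0] * (groups * per_group)

    def _sign(self, j: int, item: int) -> int:
        """Map `item` to +1 or -1 (approximately uniform) using counter j's hash."""
        a0, a1, a2, a3 = self.coeffs[j]
        x = item % MERSENNE_P
        h = (((a3 * x + a2) * x + a1) * x + a0) % MERSENNE_P  # Horner evaluation
        return 1 if h & 1 else -1

    def update(self, item: int, count: int = 1) -> None:
        """Apply a stream update (item, count); negative counts are allowed."""
        for j in range(len(self.counters)):
            self.counters[j] += self._sign(j, item) * count

    def estimate_f2(self) -> float:
        """Median-of-means estimate of F2 from the squared counters."""
        squares = [z * z for z in self.counters]
        group_means = [
            statistics.mean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return statistics.median(group_means)


if __name__ == "__main__":
    sketch = AMSSketch(seed=1)
    for item in [1, 2, 2, 3, 3, 3]:
        sketch.update(item)
    # The exact F2 here is 1 + 4 + 9 = 14; the sketch returns an estimate of it.
    print(sketch.estimate_f2())
```

The sketch is linear in the stream, so inserting and later deleting the same items leaves every counter unchanged. Its memory footprint is fixed at groups × per_group integers, independent of the number of distinct items.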