The present invention generally relates to social network analysis, and more specifically, to predicting influence in social networks.
Over the past decade, the Internet has created new channels and enormous opportunities for companies to reach customers, advertise products, and transact business. In this well-established business model, companies fully control their own web-based reputation via the content appearing on their websites. The advent of Web 2.0, with its emphasis on information sharing and user collaboration, is fundamentally altering this landscape. Increasingly, the focal point of discussion of all aspects of a company's product portfolio is moving from individual company websites to collaborative sites, blogs, and forums, collectively known as Social Media. In these new media, essentially anyone can post comments and opinions about companies and their products, which may influence the perceptions and purchase behavior of a large number of potential buyers. This is of obvious concern to marketing organizations: not only is the spread of negative information difficult to control, but such information can be very difficult even to detect in the large space of blogs, forums, and social networking sites.
The extent to which any reputation can be impacted by a negative story depends heavily on where the story first appears. Negative sentiment posted on an influential blog is clearly more damaging than the same sentiment posted on an inconsequential blog. Conversely, marketers may wish to inject a positive view into the blogosphere, and hence they need to know which bloggers are the most influential on a specific topic. For this reason, rigorous measures of influence and authority are essential to social media marketing.
Micro-blogs like Twitter have raised the stakes even further relative to conventional blogs. Literally within minutes, a story or opinion can spread to millions of individuals. Clearly, the speed with which such a story propagates depends on the degree of influence carried by the nodes that immediately adopt the story.
Identifying the most important or prominent actors in a network has been an area of much interest in Social Network Analysis dating back to Moreno's work in the 1930s [J. Moreno, Who Shall Survive? Foundations of Sociometry, Group Psychotherapy and Sociodrama. Washington D.C.: Nervous and Mental Disease Publishing Co., 1934.]. This interest has spurred the formulation of many graph-based sociometrics for ranking actors in complex physical, biological and social networks. These sociometrics are usually based on intuitive notions such as access and control over resources, or brokerage of information [D. Knoke and R. Burt, Applied Network Analysis. Newbury Park, Calif.: Sage, 1983, ch. Prominence.], and include measures such as Degree Centrality, Closeness Centrality and Betweenness Centrality [S. Wasserman and K. Faust, Social Network Analysis: Methods & Applications. Cambridge, UK: Cambridge University Press, 1994.].
In the exploratory analysis of networks, the question of whether these measures of centrality really capture what we mean by “importance” is often not directly addressed. However, when such sociometrics start being used to drive decisions in more quantitative fields, there emerges a need to answer this question empirically. Probably the most popular of these measures in the Data Mining community is PageRank, which is a variant of Eigenvector Centrality [L. Katz, “A new status index derived from sociometric analysis,” Psychometrika, vol. 18, pp. 39-43, 1953.]. Once the use of PageRank in Information Retrieval (IR), and in Web search in particular, became popular, it led to more rigorous evaluation of PageRank and its variants on measurable IR tasks [M. Richardson, A. Prakash, and E. Brill, “Beyond pagerank: machine learning for static ranking,” in WWW, 2006.], [T. H. Haveliwala, “Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search,” IEEE Trans. on Knowl. and Data Eng., vol. 15, no. 4, pp. 784-796, 2003].
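By way of illustration only, the relationship between PageRank and Eigenvector Centrality can be sketched as a power iteration over a link graph: each node repeatedly passes a share of its score to the nodes it links to, so a node scores highly when it is linked to by other high-scoring nodes. The sketch below is a minimal textbook-style formulation and is not drawn from the cited references; the function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the four-node example graph are all assumptions made for the example.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    out_links maps each node to the list of nodes it links to.
    All names and default values here are illustrative assumptions.
    """
    # Collect every node that appears as a source or as a target.
    nodes = sorted(set(out_links) | {v for ts in out_links.values() for v in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Score held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node splits its damped score equally among its out-links.
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Hypothetical link graph: blogs "a", "b" and "d" link to "c"; "c" links to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this hypothetical graph, "c" ranks first because it receives the most links, and "a" ranks above "b" and "d" because its single incoming link comes from the highly ranked "c". That second effect is what distinguishes this family of measures from a simple count of incoming links.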
With the rise of Web 2.0 and its focus on user-generated content and social networks, various sociometrics are increasingly being used to produce ranked lists of “top” bloggers, twitterers, etc. Do these rankings really identify “influential” authors, and if so, which ranking is better? As demand grows for Social Media Analytics, which focuses on deriving marketing insight from the analysis of blogs and other social media, there is a growing need to address this question.