(1) Field of Invention
The present invention relates to a system for gauging public interest in a topic and, more particularly, to a system for gauging public interest in a topic using network analysis of online discussions.
(2) Description of Related Art
The large scale of microblogging activity has given rise to free-form discussions in which participants may join and leave at any time. Furthermore, for a given topic many factors, such as news stories or region-specific interest, may drive new users to participate in a discussion. Measuring public interest in a topic through its online discussions in microblogs can be difficult due to the sheer scale of microblog data (e.g., an estimated 100 million new posts per day to Twitter™), as well as the variety of language used in the discussion. Moreover, the frequency with which topical messages are posted is not necessarily an accurate gauge of interest, because users post at non-uniform rates: some users post significantly more messages than others and are therefore over-represented in a frequency-based sample.
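The over-representation described above can be illustrated with a minimal sketch. The sample data, the user and topic labels, and the function names `message_counts` and `distinct_user_counts` are hypothetical and chosen only for illustration; they are not part of any referenced system. The sketch contrasts a raw message count per topic with a count of distinct participating users per topic.

```python
from collections import Counter

# Hypothetical sample: one (user, topic) pair per posted message.
posts = [
    ("user_a", "topic_x"), ("user_a", "topic_x"), ("user_a", "topic_x"),
    ("user_a", "topic_x"), ("user_a", "topic_x"),
    ("user_b", "topic_y"), ("user_c", "topic_y"), ("user_d", "topic_y"),
]


def message_counts(posts):
    """Count messages per topic (a purely frequency-based measure)."""
    return Counter(topic for _, topic in posts)


def distinct_user_counts(posts):
    """Count distinct users per topic, so a prolific user counts once."""
    users_by_topic = {}
    for user, topic in posts:
        users_by_topic.setdefault(topic, set()).add(user)
    return {topic: len(users) for topic, users in users_by_topic.items()}


if __name__ == "__main__":
    print(message_counts(posts))
    print(distinct_user_counts(posts))
```

In this hypothetical sample, topic_x receives more messages than topic_y, yet all of its messages come from a single user, whereas topic_y draws three distinct users. A frequency-based measure alone would therefore rank topic_x higher despite its narrower participation.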
Crane and Sornette propose a statistical model of online viewing that identifies three classes of user behavior for predicting viewing trends in YouTube™, as described in “Robust Dynamic Classes Revealed by Measuring the Response Function of a Social System” in Proceedings of the National Academy of Sciences 105(41):15649-15653 (hereby incorporated by reference as though fully set forth herein). Their model only uses viewing counts in measuring interest. However, when applied to online discussions instead of YouTube™ videos, this method would ignore the topical content in the discussion as well as the relationships between the participants and their locations, all of which reveal the degree of topical focus the discussion has around a core set of entities.
Romero, Meeder, and Kleinberg analyzed the growth and persistence of hashtags in different topic categories, demonstrating that the emergence of a popular hashtag is highly topic dependent, in “Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter” in Proceedings of the 20th International Conference on World Wide Web, 695-704, ACM, 2011 (hereby incorporated by reference as though fully set forth herein). This method could be used to measure interest by modeling the spreading of a hashtag throughout a discussion's content. However, it does not take users or locations into account, nor does it consider the relatedness between multiple hashtags within a single discussion.
Ruiz et al. proposed a similar method for constructing networks from Twitter™ messages about a publicly traded company and demonstrated how the properties of this network could be used to predict stock price changes, as described in “Correlating Financial Time Series with Micro-Blogging Activity” in WSDM, 2012 (hereby incorporated by reference as though fully set forth herein). Their network representation does not consider the location of the users, which is important in identifying geographically local discussions.
The representation of discussions as interconnected networks of entities has not been previously considered. Existing processes for measuring importance have largely relied on frequency-based analysis or on identifying important users and analyzing their content. Thus, a continuing need exists for a method that relies upon the interrelatedness of all of a discussion's entities as a way of assessing how focused the discussion is on a key set of topics in order to characterize collective focus.