(1) Field of Invention
The present invention relates to a system for discovering important elements that drive an online discussion of a topic and, more particularly, to a system for discovering important elements that drive an online discussion of a topic using network analysis.
(2) Description of Related Art
The large scale of microblogging activity has given rise to free-form discussions in which participants may join and leave at any time. Measuring the public interest in a topic through its online discussions in microblogs can be difficult due to the sheer scale of microblog data (e.g., 100 million new posts per day to Twitter™), as well as the variety of language used in the discussion. Furthermore, the frequency with which entities appear in a discussion may not correlate with their importance. For example, consider a spammer who repeatedly posts the same message about a topic to a discussion. The elements of that message will have a high frequency; however, their relative importance (i.e., whether other discussion participants build on those messages) may be very low due to their status as spam. Furthermore, within a single discussion many subtopics may emerge, with varying degrees of frequency. Detecting and ranking the important elements of these sub-discussions can be difficult using a frequency-based analysis, since a less frequent element (e.g., a news story) might be discussed by a small core group, whereas another news story may simply be mentioned more often by unrelated individuals.
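The spammer example above can be made concrete with a minimal sketch. It is an illustration only and is not the method of the present invention: the toy data, the user names, the hashtags, and the function names `frequency_rank` and `distinct_user_rank` are all hypothetical, and counting distinct users is assumed here merely as a crude stand-in for whether other participants engage with an element.

```python
from collections import Counter, defaultdict

# Hypothetical toy discussion: one (user, element) pair per mention.
# A single spammer repeats one hashtag 50 times; five different
# participants each mention another hashtag once.
posts = [("spammer", "#buy-now")] * 50 + [
    ("alice", "#election"),
    ("bob", "#election"),
    ("carol", "#election"),
    ("dave", "#election"),
    ("erin", "#election"),
]


def frequency_rank(mentions):
    """Rank elements by raw mention count (frequency-based analysis)."""
    return Counter(element for _, element in mentions).most_common()


def distinct_user_rank(mentions):
    """Rank elements by the number of different users who mention them.

    Assumption: distinct users is only a rough proxy for importance.
    """
    users_by_element = defaultdict(set)
    for user, element in mentions:
        users_by_element[element].add(user)
    counts = [(element, len(users)) for element, users in users_by_element.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


print(frequency_rank(posts))
print(distinct_user_rank(posts))
```

Under these assumptions, the raw frequency ranking places the spammed element first, whereas the ranking by distinct users places the element mentioned by several participants first, which is the mismatch between frequency and importance described above.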
Several works have leveraged the structure of microblog discussions to discover important features individually, such as hyperlinks (see the List of Incorporated Cited Literature References, Literature Reference No. 6), hashtags (see Literature Reference No. 4), events (see Literature Reference No. 3), or Tweets™ themselves (see Literature Reference No. 2). For example, Romero, Meeder, and Kleinberg (see Literature Reference No. 4) analyzed the growth and persistence of hashtags in different topic categories, demonstrating that the emergence of a popular hashtag is highly topic dependent. Their method could be used to measure important hashtags. However, it does not take into account users, uniform resource locators (URLs), Tweets™, or locations.
In a separate work, De Choudhury, Counts, and Czerwinski (see Literature Reference No. 2) considered the related problem of determining which results are most important to return for a search query on a term. Using microtext, social network, and discussion attributes that were selected based on a user survey, they found that Tweets™ that exemplified diversity in these attributes were among the best to return according to a user assessment. Their work could be considered an alternate approach to identifying the most important Tweets™ in a discussion. However, their method does not take multiple entities into account.
Ruiz et al. (see Literature Reference No. 5) proposed a similar method for constructing networks from Twitter™ messages about a publicly traded company and demonstrated how the properties of this network could be used to predict stock price changes. Their network representation does not consider the location of the users, which is essential to identifying geographically local discussions. Furthermore, their method did not consider normalizing the diameter of the graph and, therefore, was not effective in using the diameter for the purposes of their paper.
The representation of discussions as interconnected networks of entities has not been considered in the prior art. Existing processes for discovering important elements have largely relied on frequency-based analysis or on identifying important users and analyzing their content. Thus, a continuing need exists for a method that relies upon the interconnectedness of all of a discussion's entities as a way of discovering which elements are important to a discussion.