Social media platforms are often the most up-to-date venues for discussing real-world events and current affairs. Content on these platforms is frequently updated with new posts and often appears before it can be published through more traditional channels such as television and newspapers. For example, a social media website may present a published web page with core content and include additional frequently published content in blog posts, comment posts, reply posts, content sharing posts, posted links, and other types of posts. In one specific example, a user at a professional sporting event may post a comment on the result of the event as soon as the event ends, and users at the event and elsewhere may quickly respond with reply posts providing supplemental information and commentary. The event thus triggers a time-delimited topic on social media, and the topic is discussed through a number of related stories.
Existing systems track content posted on social media platforms but fail to adequately identify and organize such content for particular topics. Tracking topics is a difficult problem because it requires tracing the emergence of a topic and its evolution over time, and some related topics may not even have been present at the outset. Furthermore, tracking a topic would involve rigorously filtering the social media data using relevant seed words and then segregating the data relevant to the topic over a time period. Existing approaches address the topic identification problem by identifying words and co-occurrences of words, or by using clustering techniques to find groups of similar content. Such approaches may include, for example, parsing posts and computing how often words occur together in the posts. Posts with frequently co-occurring words are matched to a topic, whereas the remaining posts are filtered out. This matching includes matching the frequently co-occurring words to words of a topic from a list of potential topics. When the volume of the matched posts exceeds a certain threshold, the topic is found to be trending. However, because such systems lack a moderator or a filter, the identified social media data may include old or stale posts. As a result, irrelevant posts are often identified as related to a topic, and stale topics are often erroneously identified as trending. Because topic determinations are based simply on posting volume, existing systems cannot identify how long a topic has persisted, how a topic has changed or evolved over time, or why a topic is trending, and cannot detect stories (e.g., subtopics) within a topic or the trend of the stories (e.g., start, end, and relevance).
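For illustration only, the volume-based approach described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular existing system: the function names (tokenize, trending_topics), the parameters (min_pair_count, volume_threshold), the whitespace-based parsing, and the sample posts and topic list are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(post):
    """Split a post into a set of lowercase words (hypothetical, simplistic parser)."""
    words = (w.strip(".,!?;:\"'()").lower() for w in post.split())
    return {w for w in words if w}


def trending_topics(posts, topics, min_pair_count=2, volume_threshold=2):
    """Return the topics whose matched-post volume exceeds volume_threshold.

    posts:  list of post strings
    topics: dict mapping a topic name to a set of topic words (assumed input)
    """
    token_sets = [tokenize(p) for p in posts]

    # Count how often each pair of words occurs together in a post.
    pair_counts = Counter()
    for words in token_sets:
        pair_counts.update(combinations(sorted(words), 2))
    frequent_pairs = {pair for pair, n in pair_counts.items() if n >= min_pair_count}

    # Match a post to a topic when it contains a frequently co-occurring
    # pair made up of that topic's words; unmatched posts are filtered out.
    volume = Counter()
    for words in token_sets:
        for topic, topic_words in topics.items():
            shared = sorted(words & topic_words)
            if any(pair in frequent_pairs for pair in combinations(shared, 2)):
                volume[topic] += 1

    # A topic is deemed trending purely on the volume of matched posts.
    return {topic for topic, n in volume.items() if n > volume_threshold}


if __name__ == "__main__":
    sample_posts = [
        "Great final game tonight",
        "What a final game!",
        "The final game was amazing",
        "Final game, last year: still the best",  # stale post, still counted
        "New phone launch today",
    ]
    sample_topics = {
        "championship": {"final", "game"},
        "product": {"phone", "launch"},
    }
    print(trending_topics(sample_posts, sample_topics))  # {'championship'}
```

Note that the sketch takes no timestamps into account, so the stale post about last year's game counts toward the volume in the same way as the current posts. This mirrors the shortcoming described above: a determination based only on posting volume conveys nothing about how long a topic has persisted or how it has evolved.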