Online social networking and microblogging services, such as Twitter, enable their users to send and receive short messages, which may include text, audio, images, and video, to a recipient or a group of recipients. In the case of Twitter, the short messages are text-based messages of up to 140 characters, commonly known as “tweets.” A user can type a tweet on a mobile device or a computer and send the tweet to a Twitter server, which relays the tweet to a list of other users (known as followers) who have signed up to receive the sender's tweets, either by text message to their mobile phones or by instant message. In another form of microblogging service, Tencent offers WeChat, which provides multimedia messaging including text messaging, hold-to-talk voice messaging, broadcast (one-to-many) messaging, photo/video sharing, location sharing, and contact information exchange.
Twitter storms are phenomena in which social media analytics entities are flooded with frequently recurring items. Some of the messages may be viewed as “noise,” where the same message is tweeted or re-tweeted at a much higher frequency than usual. Other messages may act as a “signal,” indicating that people genuinely want to express their feelings about some topic, and therefore carry a real sentiment magnitude. While first observed on Twitter, this type of phenomenon can statistically occur in many social media settings, such as Google+ streams, Facebook, Sina Weibo, and Tencent WeChat.
One key practical issue that arises from Twitter storms is the flood of input data streaming into a search engine, which may drown out useful information or create problems with how to weight the storm. Even if a Twitter storm is detected, key issues remain as to what to do with the storm data. If the Twitter storm appears to be noise, should the tweets be removed? If the Twitter storm is deemed a signal, there is a question as to how to weight that signal in the final analysis of sentiment magnitude. As an example, if an identical tweet is deemed a signal and is re-tweeted 10,000 times, the tweet may be treated as a single event (a weight of 1) because the information is the same, or the tweet could be given a weighted output (e.g., 10,000×) because the re-tweeting is relevant in terms of magnitude.
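The two weighting options in the example above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a described implementation: the function name `weighted_sentiment`, the mode names `"collapse"` and `"count"`, and the per-message sentiment scores are all hypothetical, and identical message texts are assumed to carry identical scores.

```python
from collections import Counter


def weighted_sentiment(messages, mode="collapse"):
    """Sum sentiment scores over (text, score) pairs under one of two policies.

    mode="collapse": identical messages count as a single event (weight 1).
    mode="count":    identical messages are weighted by how often they occur.
    Both mode names and the score values are hypothetical, for illustration.
    """
    if mode not in ("collapse", "count"):
        raise ValueError(f"unknown mode: {mode!r}")

    counts = Counter()
    score_by_text = {}
    for text, score in messages:
        counts[text] += 1
        # Assumption: identical texts carry the same sentiment score.
        score_by_text[text] = score

    total = 0.0
    for text, n in counts.items():
        weight = 1 if mode == "collapse" else n
        total += weight * score_by_text[text]
    return total


# One negative message re-tweeted 10,000 times plus one unrelated positive message.
stream = [("Brand X is terrible", -1.0)] * 10_000 + [("Love my new phone", 1.0)]
print(weighted_sentiment(stream, mode="collapse"))  # storm counted as one event
print(weighted_sentiment(stream, mode="count"))     # storm counted 10,000 times
```

With the first policy the storm contributes no more than any single message, while with the second it dominates the total, which is the trade-off the example describes.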
With respect to conventional approaches, a Google Scholar search on “Twitter spam detection” returns over 300 academic publications since 2008, whereas a search on “Twitter storm” returns around 30 different publications. Many of the prior publications returned for the “Twitter storm” query are primarily social studies that examine the after-effects or the dissemination of a storm (Chadwick 2011, Rubbison 2012). Kim et al., 2012, treats the Twitter storm problem as a classification problem. The study classifies storms into two categories: (a) opinion-bearing storms and (b) fact storms. The findings are that, with fact storms (which correspond to what is commonly referred to as news), the tweets contain little more than the title of the news item and a link to it, whereas opinion-bearing Twitter storms are identified by their sentiment along with the repeating phrases related to the storm. This publication used the Linguistic Inquiry and Word Count (LIWC) tool to identify sentiment in tweets. The group also studied the dissemination of Twitter storms. However, their work does not detect Twitter storms. Rather, it is an after-the-fact study of a known event that analyzes the characteristics of a storm and attempts to answer three questions: (1) What are the temporal and spatial diffusion characteristics in the spread of corporate bad news? (2) How does the network structure determine the reactions of socially connected users? (3) What kinds of negative and positive sentiments are portrayed in Twitter conversations?
Social media are subject to “storms” of data in which news or certain social media phenomena, such as jokes or spam, appear suddenly. This can present challenges for text mining applications, such as brand and product sentiment measurement, where the burst phenomena overwhelm the legitimate expression of product sentiment. Accordingly, it is desirable to have a system and method that automatically detects and classifies microblogging burst phenomena.