Social media are computer-based tools, such as websites and applications, which enable content to be created and shared amongst an audience via the internet. The types of content available through typical social media outlets can vary, but most commonly are either text-based (e.g. status updates and commentary) or image-based (e.g. videos and photographs) in nature.
A microblogging service is one form of social media that has grown considerably in prominence. For instance, the TWITTER social networking service is a well-known microblogging service that enables users to send and read relatively short text-based messages. Currently, microblogging participants include both passive users, who mainly follow high-volume content generators such as celebrities and news organizations, and active users, who use social media to, inter alia, engage in discussions and rally support for causes.
The growth in popularity of microblogging services has expanded the uses of their content. In particular, microblogging content is used effectively in the commercial world to support targeted advertising (i.e. advertising to designated audiences). An example of targeted advertising achieved using content found through social media streams is described in U.S. Pat. No. 8,429,011 to C. D. Newton et al., the disclosure of which is incorporated herein by reference.
To optimize the use of content from social media in the commercial world, data analytics are commonly employed to parse text streams and identify one or more terms of interest. Internet memes are frequently discovered through such use of data analytics.
An internet meme, or meme, is defined generally as a transient concept, topic or event (e.g. a catchphrase or activity) captured in an electronic medium that is shared rapidly amongst an audience via the internet. Often referred to as “viral” media, memes are largely discovered en masse through conventional social media streams and have a lifespan on the order of several hours to a week.
The application of effective analytics to social media content to discover relevant memes is essential for the early detection of emerging patterns and novel content. However, effective meme discovery is made difficult not only by the rapid increase in the number of prominent social media sources but also by the commensurate rise in the number of regularly active microbloggers (with certain microblogging services exceeding 200 million monthly active users), who generate a continuous stream of posts on a broad range of topics. As a result, searching for relevant content amid the noise inherent in such a vast volume of largely irrelevant data has proven highly challenging.
For text-based social media streams, memes are often discovered using basic word-detection search algorithms. For instance, microblogging anomalies (i.e. unusually heavy usage of a set of one or more terms) are often detected using tools provided by the social media source, which simply count how frequently a particular set of terms appears in the data stream within a defined period of time. If this frequency exceeds a particular threshold, or is large compared with that of other term sets, the trending term set is identified as a meme.
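The frequency-counting approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular social media tool: posts are assumed to be (timestamp, text) pairs, terms are assumed to be lowercase whitespace-separated tokens, each term is counted at most once per post, only the fixed-threshold case is shown, and the function name `trending_terms` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable


def trending_terms(
    posts: Iterable[tuple[float, str]],
    window_start: float,
    window_end: float,
    threshold: int,
) -> dict[str, int]:
    """Return terms whose post frequency in [window_start, window_end) exceeds threshold.

    Hypothetical sketch: terms are lowercase whitespace-separated tokens, and
    each term is counted at most once per post.
    """
    counts: Counter[str] = Counter()
    for timestamp, text in posts:
        # Only posts inside the defined period of time contribute to the counts.
        if window_start <= timestamp < window_end:
            counts.update(set(text.lower().split()))
    # A term "trends" when its frequency exceeds the fixed threshold.
    return {term: n for term, n in counts.items() if n > threshold}


posts = [
    (0, "lunch time"),
    (10, "Lunch with friends"),
    (20, "quake felt downtown"),
    (30, "lunch break"),
    (4000, "lunch"),
]
print(trending_terms(posts, 0, 3600, threshold=2))  # {'lunch': 3}
```

Here the post at time 4000 falls outside the one-hour window, so "lunch" is counted three times and is the only term above the threshold. Note that nothing in this simple count distinguishes a routine daily topic from a genuinely new one.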
Traditional techniques for discovering text-based memes through the detection of semantically matched posts have been found to suffer from several notable drawbacks.
As a first drawback, traditional meme discovery techniques are not always effective in identifying new, relevant and notable memes. Specifically, it has been found that certain trending terms are largely recurrent and, as a consequence, may be less relevant than certain new, previously unidentified trending terms. For instance, lunch-related memes occur at midday on a daily basis and, as such, are not typically of particular relevance. At the same time, certain less prevalent, yet potentially novel and notable, anomalies may be occurring but are difficult to identify amid these commonly recurring microbursts. Trending memes of notable significance may therefore be effectively hidden by larger-scale, commonly recurring memes of lesser significance. As a result, significant memes are often identified only after an unacceptable delay (i.e. after the meme has already achieved viral status) rather than at an early stage (e.g. after just a few tweets).
As a second drawback, traditional meme discovery techniques that rely upon the identification of semantically matched posts are ineffective in locating all posts that relate to a common concept. In other words, two related posts that use distinct yet synonymous terms (e.g. eat and dine) are not typically grouped together by traditional meme discovery techniques. Thus, although such tools can help a user identify semantically matched posts, the results give only a limited indication of which ideas or concepts are currently being discussed and shared. As a result, all posts relating to a particular meme cannot readily be identified.
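The effect of synonymous terms on exact-term counting can be illustrated with a short sketch. The posts, the threshold of 3 and the `CONCEPTS` table below are hypothetical; the table is hand-written solely to show what exact-term counting misses and is not presented as a method for grouping synonyms.

```python
from collections import Counter

# Hypothetical example posts: two use "eat", two use the synonym "dine".
POSTS = [
    "going to eat downtown",
    "we eat at noon",
    "dine with us tonight",
    "where to dine this weekend",
]

# Hypothetical, hand-written concept map used only for illustration.
CONCEPTS = {"eat": "EATING", "dine": "EATING"}

THRESHOLD = 3


def exact_counts(posts: list[str]) -> Counter:
    """Count each distinct term once per post, matching terms exactly."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(set(post.split()))
    return counts


def concept_counts(posts: list[str], concepts: dict[str, str]) -> Counter:
    """Count each distinct concept once per post, mapping terms through `concepts`."""
    counts: Counter = Counter()
    for post in posts:
        counts.update({concepts.get(word, word) for word in post.split()})
    return counts


exact = exact_counts(POSTS)
grouped = concept_counts(POSTS, CONCEPTS)
print(exact["eat"], exact["dine"])  # 2 2: neither term reaches the threshold
print(grouped["EATING"])  # 4: the shared concept would
```

Counted exactly, "eat" and "dine" each appear in only two posts and stay below the threshold, even though all four posts concern the same concept.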
As a third drawback, traditional meme discovery techniques rely upon basic algorithmic constructs that tend to execute slowly and inefficiently. As can be appreciated, it is generally desirable to identify memes as early as possible for a wide variety of reasons, such as targeted marketing or other commercial purposes. Consequently, the relatively slow speed of traditional meme discovery techniques often means that only a small subsection of a relatively large data feed can be inspected.
As a fourth drawback, traditional meme discovery techniques are ineffective in determining, evaluating and reconfiguring the duration of the anomaly detection period to be used. Specifically, if too short a period of time is used to evaluate the presence of anomalies, slower-forming memes (i.e. memes with less of a burst) will be difficult to identify. By contrast, if too long a period of time is used, the timeliness of the meme discovery process can be significantly compromised.
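The trade-off in choosing the detection period can be illustrated with a simple tumbling-window counter. This sketch assumes timestamps in minutes, non-overlapping fixed periods, and the same fixed threshold (more than five mentions) for every period length; the function `first_detection` and the two mention patterns are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def first_detection(
    timestamps: Iterable[float], period: float, threshold: int
) -> Optional[float]:
    """Return the time at which a term is first flagged, or None if never flagged.

    Hypothetical sketch: mentions are bucketed into fixed, non-overlapping
    periods, and a period's count is only final once that period closes.
    """
    counts = Counter(int(t // period) for t in timestamps)
    for index in sorted(counts):
        if counts[index] > threshold:
            # Detection is reported at the end of the first qualifying period.
            return (index + 1) * period
    return None


slow = [10 * k for k in range(12)]  # one mention every 10 minutes, minutes 0 to 110
burst = [2, 3, 4, 5, 6, 7]  # six mentions within six minutes

for period in (15, 120):
    print(period, first_detection(slow, period, 5), first_detection(burst, period, 5))
```

With a 15-minute period, the burst is flagged at minute 15 but the slow-forming meme, which never exceeds two mentions per period, is never flagged. With a 120-minute period, both are flagged, but the burst is not reported until minute 120.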