A stream of user-generated content (UGC) is a common occurrence on Internet websites. For example, on a website hosting an online publication, such a stream might take the form of comments on an online article. Similarly, on a website hosting a social network or social media, such a stream might take the form of a news feed based on a social graph.
For both legal and business reasons, hosting websites monitor such streams for abusive content. Such abusive content might take the form of spam, fraudulent or illegal offers, offensive language, threatening language, or treasonous language, if the UGC is text or audio. Similarly, such abusive content might take the form of pornography or violent imagery, if the UGC is an image or video.
Alternatively, websites might monitor such streams for interesting (e.g., buzzworthy) content and relocate such content in the stream so as to engage users. For example, Facebook uses an algorithm called EdgeRank to construct a News Feed that is personalized in terms of interestingness, among other things, for each user profile and/or user history. In this regard, also see the “interestingness” algorithm described in co-owned U.S. Published Patent Application No. 2006/0242139, entitled “Interestingness Ranking of Media Objects”.
Monitoring a stream for abusive UGC is difficult because the posters of such content are adversarial and learn how to evade hard-and-fast rules. In the area of predictive analytics and machine learning, this problem falls under the category of concept drift, i.e., changes over time in the concept being modeled by a classification system. It will be appreciated that interesting content is almost inherently subject to concept drift, since what users find interesting changes over time.
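By way of illustration only, the following minimal sketch shows how a hard-and-fast rule can be evaded once posters adapt. The blocklist, the function name `is_abusive`, and the sample comments are hypothetical and are not taken from any actual system.

```python
# Hypothetical fixed rule: flag a comment if any token is on a static blocklist.
BLOCKLIST = {"viagra", "casino"}


def is_abusive(comment: str) -> bool:
    """Return True if the comment contains a blocklisted token."""
    tokens = comment.lower().split()
    # Strip common trailing/leading punctuation before the lookup.
    return any(tok.strip(".,!?") in BLOCKLIST for tok in tokens)


# The rule catches the original spelling but not a trivially altered one,
# which is one simple way the modeled concept can shift under a fixed rule.
early_comment = "Cheap viagra here!"
later_comment = "Cheap v1agra here!"
print(is_abusive(early_comment), is_abusive(later_comment))
```

A static rule of this kind has no mechanism for tracking the change, which is why the problem is usually framed in terms of a model that is updated over time.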
Online active learning addresses the problem of concept drift, e.g., by adjusting the predictive model (or classifier) according to new UGC with the aid of human labelers. However, human labelers are expensive in terms of both time and money. Research is therefore ongoing into ways of lessening the involvement of human editors in predictive models that perform online active learning.
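By way of illustration only, online active learning might be sketched as follows. The sketch assumes a logistic-regression classifier updated by stochastic gradient descent and an uncertainty-based query rule, which is one common strategy chosen here for simplicity and is not asserted to be the approach of any particular system. The names `OnlineActiveLearner`, `ask_human`, `margin`, and `lr`, as well as the simulated stream, are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineActiveLearner:
    """Hypothetical online classifier that requests a human label only
    when its own prediction is uncertain."""

    def __init__(self, n_features: int, margin: float = 0.2, lr: float = 0.5):
        self.w = [0.0] * n_features  # feature weights
        self.b = 0.0                 # bias term
        self.margin = margin         # query if |p - 0.5| < margin
        self.lr = lr                 # learning rate (assumed value)
        self.queries = 0             # number of human labels requested

    def predict_proba(self, x) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def process(self, x, ask_human) -> bool:
        """Classify one item from the stream; query a human if uncertain."""
        p = self.predict_proba(x)
        if abs(p - 0.5) < self.margin:
            y = ask_human(x)          # the expensive step: a human label
            self.queries += 1
            err = p - y               # gradient of log loss w.r.t. the logit
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err
        return p >= 0.5               # prediction made before any update


def run_demo(n_items: int = 400, seed: int = 0) -> int:
    """Feed a simulated stream to the learner; return labels requested."""
    rng = random.Random(seed)
    learner = OnlineActiveLearner(n_features=2)
    for t in range(n_items):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]

        # Simulated drift: the labeling rule flips halfway through the stream.
        def ask_human(item, flipped=(t >= n_items // 2)):
            label = 1 if item[0] > 0 else 0
            return 1 - label if flipped else label

        learner.process(x, ask_human)
    return learner.queries


if __name__ == "__main__":
    print("human labels requested:", run_demo())
```

In this sketch the `margin` parameter controls the trade-off noted above: a wider margin requests more human labels, while a narrower margin reduces labeling cost at the risk of adapting more slowly when the concept changes.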