Brands are carefully crafted and embody a company's image. Unfortunately, in the current online environment, advertising networks can place advertisements that represent such brands alongside undesirable content. This happens because the ad-placement process is opaque and possibly because incentives in the ad-serving ecosystem are misaligned. Even isolated, outlier cases can cause significant damage, for example, to the public image of a company that, through its advertising, inadvertently supports a website containing such undesirable content.
Online advertisers typically use tools that provide information about websites or publishers, and about the viewers of those websites, to plan and manage their online advertising more effectively. Moreover, online advertisers continually want more control over the web pages on which their advertisements and brand messages appear. For example, particular online advertisers may want to limit the risk that their advertisements and brand messages appear on pages or sites that contain objectionable content (e.g., pornography or adult content, hate speech, bombs, guns, ammunition, alcohol, offensive language, tobacco, spyware, malicious code, illegal drugs, music downloading, particular types of entertainment, illegality, obscenity, etc.). In another example, advertisers of adult-oriented products, such as alcohol and tobacco, may want to avoid pages directed towards children. In yet another example, particular online advertisers may want to increase the probability that their advertisements appear on specific types of sites (e.g., websites containing news-related information, websites containing entertainment-related information, etc.). However, many advertising tools merely sort websites into categories, each indicating only that a website contains a certain type of content.
Other approaches use models, such as predictive models or classification models, to determine whether a website contains, or tends to contain, questionable content. In a particular example, features relating to the use of classification models for calculating content quality ratings for web pages, domains, and sitelets are described in commonly-owned, commonly-assigned U.S. patent application Ser. No. 12/859,763, filed Aug. 19, 2010 and U.S. patent application Ser. No. 13/151,146, filed Jun. 1, 2011, which are hereby incorporated by reference herein in their entireties. Accordingly, rating systems can use classification models to monitor and detect the presence of questionable content on, for example, web pages.
However, assessing and evaluating the performance of these models is particularly challenging. For example, the efficacy of these rating systems, and more particularly of their one or more predictive models, can depend on the quality of the data used to train them. The size of training data sets can make manually removing or labeling pages difficult and/or impractical. As a result, models trained on such data may, for example, miss some cases (e.g., fail to flag objectionable content on pages), misclassify some cases (e.g., deem content objectionable when it is benign, or deem it benign when it is objectionable), or make other errors. In a more particular example, referring to the rating system mentioned above, the model used by the rating system may misclassify a hate speech page as being suitable for advertising. In another particular example, the model used by the rating system can overlook rare but nevertheless important classes or subclasses of cases, such as hatred towards data miners.
There is therefore a need in the art for approaches that use annotators to identify errors in predictive models. Accordingly, it is desirable to provide methods, systems, and media that overcome these and other deficiencies of the prior art.