As media content, such as images, videos and sound files, proliferates on the Internet, users have begun to rely more heavily on Internet search engines to locate and view content in which they are interested. Web sites such as Yahoo and Google offer the capability to search the Internet for links to content, such as web pages and multimedia, among other categories, that is deemed relevant to a search query. In response to a query, the web site performing the search may display content extracted from other web sites in addition to links to content.
One problem faced by web sites that offer multimedia search capability is the likelihood of displaying content, or returning results linking to multimedia content, that could be categorized as inappropriate or offensive. For example, a user searching for images or movies containing live concert footage of a particular celebrity may not desire to see nude pictures of the celebrity along with live concert photographs.
One approach to screening out content that may be inappropriate or offensive is to use automated tools to identify the content in an image or other multimedia content. One drawback to automated approaches is that the definition of what is inappropriate or offensive is very difficult to describe to a machine. For example, a cartoon containing an image of a religious figure may be perfectly acceptable to people who do not subscribe to that religion, but offensive to people who do. Traditional automated techniques for identifying offensive content, as described further below, would likely not be able to flag the cartoon as offensive with any degree of accuracy.
One automated approach to identifying offensive images that may be returned in response to a search query is to detect the presence of large amounts of human skin in an image. The rationale behind this approach is that pornographic images and other types of sexually inappropriate images typically contain large areas of bare skin. A drawback to this approach is the large number of false positives; for example, a family photograph taken at a beach, where the family members are wearing bathing suits, would not qualify as pornographic, but may be identified as a pornographic image because of the high amount of skin content in the image. This approach is also very resource intensive with regard to processor usage and time.
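The skin-detection approach and its false-positive problem can be illustrated with a minimal sketch. The per-pixel color rule below is a commonly cited RGB heuristic chosen purely for illustration; it is not a technique named in this document. The function names, the 0.4 threshold, and the sample pixel values are likewise hypothetical.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0-255


def looks_like_skin(pixel: Pixel) -> bool:
    """Rough RGB skin-tone rule (an illustrative assumption, not the source's method)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels: Iterable[Pixel]) -> float:
    """Fraction of pixels that the rule classifies as skin."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_skin(p) for p in pixels) / len(pixels)


def flag_by_skin(pixels: Iterable[Pixel], threshold: float = 0.4) -> bool:
    """Flag an image when its skin fraction reaches a hypothetical threshold."""
    return skin_fraction(pixels) >= threshold


# Hypothetical family beach photograph: 60 skin-toned pixels, 40 blue water pixels.
beach_photo = [(220, 170, 140)] * 60 + [(30, 90, 200)] * 40
print(skin_fraction(beach_photo))
print(flag_by_skin(beach_photo))
```

The harmless beach photograph is flagged simply because most of its pixels are skin-toned, which is the false positive described above. Every pixel must also be examined, which hints at the processing cost for large images.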
Another automated approach to determining whether content returned in response to a search query is inappropriate is to scan any text associated with the content, such as captions, tags, or any text displayed in proximity to the content, for particular words that may indicate inappropriate content. For example, an image with the words “nude” or “sexy” associated with it may be flagged as inappropriate. A problem with this approach is that the text may not in fact be associated with the particular image, or an inappropriate image or video may have inoffensive text associated with it in a deliberate attempt to thwart this approach.
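A minimal sketch of this text-scanning approach follows. The two blocked words come from the example above; the function name, the tokenization rule, and the sample captions are assumptions made for illustration only.

```python
import re
from typing import Iterable, Set

# Words taken from the example in the text; a real list would be far longer.
BLOCKED_WORDS: Set[str] = {"nude", "sexy"}


def flag_by_text(fragments: Iterable[str],
                 blocked_words: Set[str] = BLOCKED_WORDS) -> bool:
    """Return True if any caption, tag, or nearby text contains a blocked word."""
    for fragment in fragments:
        # Lowercase and split into alphabetic words so "Nude," matches "nude".
        words = set(re.findall(r"[a-z]+", fragment.lower()))
        if not words.isdisjoint(blocked_words):
            return True
    return False


# Hypothetical text collected around two images on a page.
print(flag_by_text(["Sexy photo shoot", "celebrity"]))
print(flag_by_text(["Live concert, front row", "tour 2008"]))
```

The second call returns False regardless of what the image actually shows, so an inappropriate image labeled with innocuous text passes the check unnoticed.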
Another approach is for human editors to manually screen all content that may be returned in response to a search query and flag items that are inappropriate. This offers a higher degree of accuracy than the automated approaches, because humans are generally effective at identifying inappropriate content, but this approach is not efficient and relies on the editors' own biases, which may not hold in all parts of the world where users are searching for content.
Therefore, an approach for identifying potentially inappropriate content that may be returned in response to an Internet search, which does not experience the disadvantages of the above approaches, is desirable. The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.