Widespread Internet use has led to the massive proliferation of documents on the World Wide Web. As many of these documents contain adult content unsuitable for minors, Web search engines must effectively identify and classify adult Web documents when responding to Internet search queries.
Current techniques for classifying adult content include analyzing features of a Web document in isolation, e.g., determining the presence of adult-oriented text embedded in the document. However, such text-based techniques are often inadequate when classifying documents containing sparse text, such as image or video websites. On the other hand, applying image- or video-based techniques, e.g., skin-color pixel analysis, to such websites may require significant computational resources.
Accordingly, it would be desirable to provide novel and efficient techniques for accurately classifying adult Internet content.