Contextual classification of objects enables content providers to determine that objects share similar properties and group those objects with one another to meet various needs or objectives. For example, content providers may seek to group like objects together, so that content consumers can easily access similar content. As another example, objects may be contextually classified to facilitate filtering of content, such as by classifying and removing inappropriate content.
Contextual classification may be performed on a wide variety of objects, such as web pages or finite objects like text, images, videos, and other content. In the context of web pages, contextual classification may be performed to group web pages by topic, thus improving the organization of a web site. By placing similar content together in one location, content providers make it easier for users to find content that suits their interests, thus increasing the likelihood that users will view multiple web pages within a web site upon finding a single web page on that web site that suits their interests or needs.
Prior methods for contextual classification have relied primarily on manual input by editors or users. For example, some prior techniques have relied on manual tagging by persons designated as content reviewers. Such content reviewers may be employees of a particular company, such as the editors of a particular web site, or they may be users of a web site. Regardless of the criteria for selecting persons to review and classify content, manual classification of objects has several disadvantages. First, manual tagging of objects is time-consuming. For content providers with numerous objects to classify, manual classification may take so long as to decrease the utility of the classification. By way of example, an online news provider that relies on manual classification of news stories will need stories classified quickly in order to guide users to the stories in which they are interested before other news sources capture the users' attention (or before the news becomes stale). If a content provider releases a large volume of new content per day, it may be infeasible to have each object tagged within an acceptable time frame.
Contextual classification may involve classification of an object based on a single parameter or on numerous parameters. For example, a web page may be classified based on genre (e.g., politics, religion, sports), audience (e.g., children, adults), mood (e.g., funny, depressing, inciting), and numerous other parameters. Further, whereas some parameters may be assigned a binary value (e.g., true/false, yes/no), other parameters may be assigned a wide variety of values and require more analysis to determine the appropriate value.
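By way of illustration only, the multi-parameter classification described above might be represented as in the following minimal sketch. The class names, the `inappropriate` flag, the example paths, and the `group_by_genre` helper are hypothetical and are not taken from the disclosure; they simply show one binary parameter alongside several multi-valued parameters.

```python
from dataclasses import dataclass
from enum import Enum


# Multi-valued parameters, using the example values named in the text.
class Genre(Enum):
    POLITICS = "politics"
    RELIGION = "religion"
    SPORTS = "sports"


class Audience(Enum):
    CHILDREN = "children"
    ADULTS = "adults"


class Mood(Enum):
    FUNNY = "funny"
    DEPRESSING = "depressing"
    INCITING = "inciting"


@dataclass(frozen=True)
class Classification:
    """Hypothetical record holding one value per classification parameter."""
    genre: Genre
    audience: Audience
    mood: Mood
    inappropriate: bool  # assumed binary (true/false) parameter


def group_by_genre(pages):
    """Group page paths by genre so that similar content sits together."""
    groups = {}
    for path, label in pages.items():
        groups.setdefault(label.genre, []).append(path)
    return groups


# Hypothetical pages and labels, for illustration only.
pages = {
    "/news/election": Classification(Genre.POLITICS, Audience.ADULTS, Mood.INCITING, False),
    "/news/playoffs": Classification(Genre.SPORTS, Audience.ADULTS, Mood.FUNNY, False),
    "/kids/soccer": Classification(Genre.SPORTS, Audience.CHILDREN, Mood.FUNNY, False),
}
groups = group_by_genre(pages)
```

In this sketch, grouping on a different parameter (for example, audience) would only require changing the key used in the helper, which is one reason a structured record may be preferable to free-form tags.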
To classify a large volume of content manually, a large number of people must be involved, which leads to the second drawback of manual tagging: it can be very expensive. A content provider that produces numerous web pages or articles must dedicate a large number of editors to classifying content if the content is to be classified accurately and in a timely fashion. Further, depending on the nature of the content produced, content providers may require that such editors have certain educational qualifications or prior work experience. As the number and skill level of editors increase, the cost of manual contextual classification increases as well.
The present disclosure is directed to addressing one or more of the above-referenced challenges or drawbacks with conventional methods and techniques for contextual classification. The present disclosure provides improved systems and methods for performing contextual classification using supervised and unsupervised training. Among other features and advantages, certain embodiments of the present disclosure may utilize parallel machine learning to determine the best model/parameter combination for contextual classification of objects based on supervised and unsupervised training. Exemplary implementations of the disclosed embodiments include article classification and comment moderation.
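The idea of evaluating several model/parameter combinations in parallel and keeping the best one can be sketched as follows. This is a toy illustration under stated assumptions, not the disclosed implementation: the two keyword-based "models", the threshold values, the flagged-word list, and the labeled comments are all hypothetical, and only the supervised (labeled-data) side is shown.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical labeled comments: (text, should_be_removed).
LABELED = [
    ("you are an idiot", True),
    ("stupid idiot take", True),
    ("great article, thanks", False),
    ("that play was stupid but fun to watch", False),
    ("idiot", True),
    ("I disagree with the author", False),
]

FLAGGED = {"idiot", "stupid"}  # assumed word list, for illustration only


def count_model(text, threshold):
    """Flag a comment when its number of flagged words reaches the threshold."""
    words = text.lower().split()
    return sum(w in FLAGGED for w in words) >= threshold


def ratio_model(text, threshold):
    """Flag a comment when its fraction of flagged words reaches threshold / 10."""
    words = text.lower().split()
    return sum(w in FLAGGED for w in words) / len(words) >= threshold / 10


MODELS = {"count": count_model, "ratio": ratio_model}
THRESHOLDS = [1, 2, 3]


def evaluate(candidate):
    """Return (accuracy, model name, threshold) for one combination."""
    name, threshold = candidate
    model = MODELS[name]
    correct = sum(model(text, threshold) == label for text, label in LABELED)
    return correct / len(LABELED), name, threshold


def best_combination():
    """Score every model/threshold pair concurrently and keep the most accurate."""
    candidates = list(product(MODELS, THRESHOLDS))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, candidates))
    return max(results, key=lambda r: r[0])


best = best_combination()
```

For brevity, the sketch scores each combination on the same examples it was designed around. A realistic evaluation would use held-out data, and CPU-bound training would likely run in separate processes or on separate machines rather than in threads.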