1. Field of the Invention
This invention relates to systems and methods for characterizing the quality or interestingness of shared documents and other data.
2. Background of the Invention
Many attempts have been made to automatically classify documents or otherwise identify the subject matter of a document. In particular, search engines seek to identify documents that are relevant to the terms of a search query based on determinations of the subject matter of the identified documents. Another area in which classification of documents is important is social media content. Millions of users generate millions of documents in the form of social media posts every day. In order to make use of this information, the documents must often be classified or otherwise sorted. As with search engines, “spam” postings that are automatically generated or that otherwise contain irrelevant content should be removed.
Although some automatic spam detection methods are quite accurate, they are not a substitute for human judgment. Often, documents identified as important using automated methods are completely irrelevant. In addition, these methods are subject to manipulation by “spammers” who tailor the word usage of content to obtain a desired classification but provide no useful content.
Of course, with such a large volume of content, human evaluation of documents is not practical. The systems and methods described herein provide improved methods for incorporating both automated evaluation of document quality and human judgment.