1. Field of the Invention
This invention relates to systems and methods for classifying large numbers of documents and other data.
2. Background of the Invention
Many attempts have been made to automatically classify documents or otherwise identify the subject matter of a document. In particular, search engines seek to identify documents that are relevant to the terms of a search query based on determinations of the subject matter of the identified documents. Classification of documents is also important in the realm of social media content. Millions of users generate millions of documents in the form of social media posts every day. In order to make use of this information, the documents must often be classified or otherwise sorted. As with search engines, “spam” postings that are automatically generated or that otherwise contain irrelevant content should be removed.
Although some automatic classification methods are quite accurate, they are not a substitute for human judgment. Documents identified or classified using automated methods are often completely irrelevant. In addition, these methods are subject to manipulation by “spammers” who manipulate the word usage of content to obtain a desired classification but provide no useful content.
Of course, with such a large volume of content, human classification of documents is not practical. The systems and methods described herein provide an improved approach that incorporates both automated classification and human judgment.