1. Field of the Invention
This invention relates to systems and methods for classifying large numbers of documents and other data.
2. Background of the Invention
Many attempts have been made to automatically classify documents or otherwise identify the subject matter of a document. In particular, search engines seek to identify documents that are relevant to the terms of a search query based on determinations of the subject matter of the identified documents. Classification is also important for product-related documents such as product descriptions, product reviews, and other product-related content. The number of products available for sale constantly increases, and the number of documents relating to a particular product is further augmented by social media posts and other content relating to products.
Although some automatic classification methods are quite accurate, they are not a substitute for human judgment. Documents identified or classified using automated methods are often completely irrelevant. In addition, these methods are subject to manipulation by “spammers” who tailor the word usage of content to obtain a desired classification while providing no useful content.
Of course, with such a large volume of content, human classification of documents is not practical. The systems and methods described herein provide improved methods for incorporating both automated classification and human judgment in a highly effective manner.