Some embodiments described herein relate generally to classification of natural language documents, and more particularly, to apparatus and methods for classifying text content of natural language documents based on training sets.
Some known systems, such as recommendation engines, implement a standard approach to classify text content of natural language documents based on a training set associated with a term of interest. The training set used by these known systems is typically defined based on a single specific resource of the term of interest. Such known systems, however, typically do not provide a method to define the training set based on multiple resources of the term of interest, or to automatically update the training set based on new information or resource(s) related to the term of interest. As a result, the training sets used in those known systems can be inaccurate or outdated in some scenarios.
Accordingly, a need exists for systems and methods for classifying text content of natural language documents based on training sets that can be defined based on multiple resources and automatically updated.