The present invention relates to text classifiers. In particular, the present invention relates to the classification of user queries.
In the past, search tools have been developed that classify user queries to identify one or more tasks or topics that the user is interested in. In some systems, this was done with simple keyword matching, in which each keyword was assigned to a particular topic. In other systems, more sophisticated classifiers have been used that consider the entire query to determine the most likely topic or task that the user may be interested in. Examples of such classifiers include support vector machines that provide a binary classification relative to each of a set of tasks. Thus, for each task, the support vector machine is able to decide whether the query belongs to the task or not.
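By way of illustration only, the simple keyword matching approach described above might be sketched as follows. The keyword table, the topic names, and the function name `classify_by_keywords` are hypothetical and are not taken from any particular prior system.

```python
# Hypothetical keyword-to-topic table; the entries are illustrative only.
KEYWORD_TOPICS = {
    "flight": "travel",
    "hotel": "travel",
    "recipe": "cooking",
    "oven": "cooking",
}


def classify_by_keywords(query):
    """Return the sorted list of topics whose keywords appear in the query."""
    topics = set()
    for word in query.lower().split():
        topic = KEYWORD_TOPICS.get(word)
        if topic is not None:
            topics.add(topic)
    return sorted(topics)
```

Because each keyword maps to a fixed topic regardless of the rest of the query, a sketch of this kind cannot use the query as a whole, which is the limitation that the more sophisticated classifiers address.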
Such sophisticated classifiers are trained using a set of queries that have been classified by a librarian. Based on the queries and the classifications given by the librarian, the support vector machine generates a hyper-boundary between those queries that match the task and those queries that do not. Later, when a query is applied to the support vector machine for a particular task, the distance between the query and the hyper-boundary determines the confidence level with which the support vector machine is able to identify the query as either belonging to the task or not belonging to the task.
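The relationship between distance and confidence described above may be illustrated with a minimal sketch. It assumes a linear hyper-boundary and a word-count representation of the query, neither of which is specified here, and the weights and bias shown are hypothetical stand-ins for values that training would produce.

```python
import math

# Hypothetical parameters for one task ("travel"), standing in for the
# values that training on librarian-labeled queries would produce.
TRAVEL_WEIGHTS = {"flight": 1.5, "hotel": 1.0, "recipe": -2.0}
TRAVEL_BIAS = -0.5


def featurize(query):
    """Map a query to word counts (an assumed representation)."""
    counts = {}
    for word in query.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def signed_distance(query, weights, bias):
    """Signed distance from the query to the linear boundary w.x + b = 0."""
    features = featurize(query)
    score = sum(weights.get(w, 0.0) * c for w, c in features.items()) + bias
    norm = math.sqrt(sum(v * v for v in weights.values()))
    return score / norm


def classify(query, weights, bias):
    """Return (belongs_to_task, confidence), with confidence = |distance|."""
    distance = signed_distance(query, weights, bias)
    return distance >= 0, abs(distance)
```

In this sketch, the sign of the distance gives the binary decision for the task, and the magnitude serves as the confidence level: a query lying far from the boundary is classified with more confidence than one lying close to it. One such set of parameters would be held for each task.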
Although the training data provided by the librarian is essential to initially training the support vector machine, such training data limits the performance of the support vector machine over time. In particular, training data that includes current-events queries becomes dated and results in unwanted topics or tasks being returned to the user. Although additional librarian-created training data can be added to keep the support vector machines current, such maintenance of the support vector machines is time-consuming and expensive. As such, a system is needed for updating search classifiers that requires less human intervention, while still maintaining a high standard of precision and recall.