The present invention relates to training a machine learning component of a user interface. More particularly, the present invention relates to obtaining training data by mapping queries to tasks.
A natural user interface accepts natural language queries and, in response, returns a list of results that are most likely to correspond to the intended subject matter of the query. The results typically include tasks, documents, files, emails, or other items (all collectively referred to herein as tasks) that are intended to answer the query. A promising technology for producing the results for a query is machine learning technology. Machine learning algorithms use statistical data to predict a desired result for a particular query. With machine learning algorithms, the statistical data can be updated constantly or frequently after initial training, based upon the results of further queries from a user.
Before any machine learning algorithm can be provided for use with a natural user interface with the expectation that it will provide meaningful results, the algorithm must be “trained” with accurate annotated data. In other words, the algorithm requires training data indicative of statistics from a large list of query-to-task mappings. When the natural user interface and corresponding machine learning algorithm are to be deployed to a customer, it is even more essential that the machine learning algorithm be trained with accurate annotated data prior to its deployment. For example, with one type of output of the machine learning algorithm being a list of tasks, such as “install a printer” or “printer trouble shooting”, the machine learning algorithm requires data representative of examples of natural language queries for which these tasks would be the desired result.
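By way of illustration only, the relationship between annotated query-to-task mappings and the statistics derived from them can be sketched as follows. This is a minimal sketch under stated assumptions, not the training method of any particular machine learning algorithm: the example queries, the names ANNOTATED, term_task_counts, and rank_tasks, and the simple term-count scoring are all hypothetical and chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict

# Hypothetical annotated data: each pair maps a natural language query
# to the task that is the desired result for that query.
ANNOTATED = [
    ("how do I set up my new printer", "install a printer"),
    ("add a printer to my computer", "install a printer"),
    ("my printer will not print", "printer trouble shooting"),
    ("printer prints blank pages", "printer trouble shooting"),
]


def term_task_counts(mappings):
    """Count, for each query term, how many annotated queries for each task contain it."""
    counts = defaultdict(Counter)
    for query, task in mappings:
        # Use a set so a term repeated within one query is counted once.
        for term in set(query.lower().split()):
            counts[term][task] += 1
    return counts


def rank_tasks(query, counts):
    """Rank tasks by summed term counts (an assumed, deliberately simple scoring)."""
    scores = Counter()
    for term in set(query.lower().split()):
        scores.update(counts.get(term, {}))
    return [task for task, _ in scores.most_common()]


counts = term_task_counts(ANNOTATED)
ranked = rank_tasks("printer prints blank", counts)
```

With only four annotated queries the statistics are very sparse, which illustrates why a very large list of query-to-task mappings is needed before the predictions become meaningful.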
In order to increase the accuracy of the machine learning algorithm of the natural user interface, the training data must be representative of a very large list of examples of query-to-task mappings. Conventionally, the large number of query-to-task mappings has been produced by obtaining a query log containing a very large number of actual queries submitted to a search engine. For example, the query log would typically include on the order of 10,000 queries or more. The user or author would then go through those queries one-by-one and manually annotate them (associate them with a particular task).
A common method of annotating the queries with tasks is to represent each of the queries in a first column of a spreadsheet, and to represent its corresponding intended task in the same row of a second column of the spreadsheet. Because each query must be reviewed and annotated individually, this process is very labor intensive and time consuming. Further, given a very large list of potential tasks to choose from, selecting which task to associate with a particular query becomes even more cumbersome.
Therefore, a system or method that can be used to facilitate faster and more accurate query-to-task mapping to obtain training data would be a significant improvement in the art.