Many types of queries are somewhat ambiguous as to the user's intent with respect to what search results the user is seeking. For example, when a user submits a query, it is not apparent to a conventional search engine whether the user wants results corresponding to a local search or to a wider (e.g., global) search. There are numerous other examples, e.g., cooking/recipe-intended or not, in which one user may want to receive search results with links to cooking-related websites while another does not.
As a more particular example, online shopping is a popular way of doing business. Many times a user who is interested in purchasing a product (or service) enters something about that product into a search engine. For example, a user interested in purchasing a camera will type something about a camera when requesting a search, such as “digital camera reviews” or “digital camera price comparison.” However, not all users have commercial intent when requesting a search (e.g., “transfer pictures from a digital camera”). If it was possible to know whether or not a user had commercial intent when submitting a search, more relevant search results can be returned, which is both desirable to the user and lucrative to the search engine, shopping sites and manufacturer or service provider.
Algorithmically predicting a user's intent for a submitted query can be done to an extent, but this typically requires a large amount of high-quality training data to train a suitable classification (prediction) algorithm. Such training data needs to be labeled manually by judges as either intended or non-intended with respect to a classification class, based upon guidelines that define the meaning of intent.
As can be readily appreciated, manually creating such large scale datasets is extremely time-consuming, expensive, and error-prone. Notwithstanding, to be of value to a search engine, data labeling would need to be done often, because the labeled data may quickly become outdated. For example, in commerce, where new products are frequently introduced, a prediction algorithm would need to be regularly re-trained with new datasets. Labeling such new data in a timely manner was heretofore largely impractical and often not possible.