The material in the following section is merely provided for general background information and is not intended for use as an aid in determining the scope of the claimed subject matter.
Search engines are now commonplace in many software applications, both server-side and client-side. For example, search engines may be used for searching for text strings in applications such as word processors, for searching for help in sophisticated software as varied as spreadsheets and operating systems, or for searching for uniform resource locator (URL) references and other web-based documents. Since sets of documents can be extremely large, and since any one search engine may have access to multiple document sets, the number of relevant documents retrieved by a search can be very large.
A list of documents returned in response to a user query should preferably be sorted by relevance in the context of the corresponding search terms. This organization of search results makes it easier for a user to select the documents that he or she believes have the greatest relevance to the search. The effectiveness of any one search may be abstractly judged by whether the top few returned documents include the document(s) actually sought by the user.
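The judgment described above, namely whether the top few returned documents include a document actually sought, can be sketched as a simple check. This is only an illustration; the function name `sought_in_top_k`, the cutoff `k`, and the document identifiers are hypothetical and are not drawn from any particular search engine.

```python
def sought_in_top_k(ranked_ids, sought_ids, k=3):
    """Return True if any sought document appears among the first k results.

    ranked_ids: document identifiers, already sorted by descending relevance.
    sought_ids: set of identifiers the user was actually looking for.
    k: how many top results count as "the top few" (an assumed cutoff).
    """
    return any(doc_id in sought_ids for doc_id in ranked_ids[:k])


# Hypothetical ranked result list for a single query.
results = ["doc_a", "doc_b", "doc_c", "doc_d"]

found_c = sought_in_top_k(results, {"doc_c"}, k=3)  # doc_c is in third position
found_d = sought_in_top_k(results, {"doc_d"}, k=3)  # doc_d falls outside the top 3
```

Under this sketch, a search would be judged effective for `doc_c` but not for `doc_d`, even though both documents were returned.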
Recently, search engines have been augmented with classifiers that support the retrieval of documents with high relevance. Such classifiers are commonly implemented based on training data reflective of any of a variety of different types of user feedback. For example, some classifiers account for factors such as, but not limited to, click patterns (i.e., “click-throughs”), explicit user satisfaction ratings (i.e., “explicit feedback”), previous user search history, search context and/or search entry points (i.e., where the search started). Leveraging these and/or other types of feedback in the searching process enables some improvement in determining which documents are likely to be most relevant for a particular user query. Search classifiers are sometimes evaluated with “test sets” that are typically collected from click-through data and/or explicit user feedback distinct from the data used for training.
The effectiveness of a classifier is generally contingent upon the quality and quantity of the underlying training data. It is common for a system to have access to multiple sets of training data, often from different sources. Some sets of data may even have different characteristics or qualities as compared to others. It therefore becomes a challenge to create a classifier that blends training data in a way that will support accurate and effective searching.
Reliance on classifier models to augment search performance is particularly effective for improving server-side search relevance, where trends can be dynamically monitored and accounted for based on numerous searches received from many searching sources. However, not all search environments enjoy the same situational advantages. Client-side searches, for example, have traditionally been conducted based on a set of keywords that are associated with each document. It is not uncommon for an individual, such as an author, to manually associate a document with relevant keywords to be used subsequently for identification. Thus, the client-side searching process often involves matching search terms with keywords. Under these conditions, the identification of a relevant document is contingent upon a nexus between the perspective of the individual(s) who selected the keywords and that of the user selecting the search terms. It is not uncommon for a relevant document to be missed because the individual(s) and the user do not perceive a particular class of subject matter in the same way.
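By way of illustration only, the keyword-matching process described above can be sketched as follows. The function `keyword_search`, the document names, and the keyword lists are all hypothetical; the sketch assumes simple case-insensitive, whole-word matching and is not intended to represent any particular client-side search implementation.

```python
def keyword_search(query, keyword_index):
    """Return document ids whose manually assigned keywords overlap the query terms.

    keyword_index maps a document id to the keywords an author associated with it.
    Results are ordered by number of matching keywords, then by document id.
    """
    terms = {term.lower() for term in query.split()}
    hits = []
    for doc_id, keywords in keyword_index.items():
        overlap = terms & {keyword.lower() for keyword in keywords}
        if overlap:
            hits.append((len(overlap), doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


# Hypothetical keyword assignments made by document authors.
keyword_index = {
    "print_setup.doc": ["printer", "install", "driver"],
    "margins.doc": ["page", "margin", "layout"],
}

matched = keyword_search("install printer", keyword_index)
missed = keyword_search("add a new output device", keyword_index)
```

In this sketch, the first query shares vocabulary with the author's keywords and retrieves `print_setup.doc`, whereas the second query seeks the same document using different words and retrieves nothing, which illustrates how a relevant document can be missed.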