Internet search engines have become an important source of revenue for the service providers that operate them. The revenue is primarily generated from the display of advertisements to search engine users. The more Internet traffic a search engine receives, the more attractive it is to advertisers and the more revenue it can generate. It is generally believed that the best way for search engines to increase traffic is to provide highly relevant search results. But what is relevant today may not be relevant tomorrow, or even later the same day. It is difficult for service providers to keep pace with the rapid changes in searchable content driven by seasonal and popular trends and by topical events in the news.
One way that search engine operators strive to maintain the relevance of the results their search engines generate is to use a relevance schema. The relevance schema represents the algorithm the search engine uses to generate a set of search results, usually ranked in order of relevance. The relevance schema is continually reevaluated by human judges, who determine whether the results it produces are valid, i.e., whether they are still relevant. The search engine operator changes the schema from time to time, as the judges' evaluations indicate.
The problem with this approach to maintaining the relevance of the search engine is that it is time-consuming and subjective. Human judges can evaluate only a limited number of possible search results, and their judgments of what is or is not relevant may not reflect a typical user's judgment. Other approaches suffer from similar drawbacks. For example, some users may respond to surveys conducted by the search engine operator, giving direct feedback on the relevance of a particular set of search results. But the data collected in this manner may be too small in volume to be considered reliable, and it lacks the breadth and scale needed to reflect what users actually want when conducting their searches.
Another approach that is becoming more prevalent is the use of click-through data collected for the search results. The search engine operator captures users' interaction with the search results by recording how often users click on each result relative to how often it is displayed; this ratio is referred to as the “click-through rate” or CTR. Click-through data has several advantages: it can be collected in large volume as users interact with search results, and it is therefore a more objective measure of user satisfaction and a more reliable predictor of relevance. In general, experience has shown that the higher the CTR, the more relevant the result, or at least the greater the user's satisfaction with the result. But the CTR data must still be analyzed, and the operator must then decide how to update the relevance schema to generate better results. Moreover, CTR data alone may be insufficient to produce a meaningful result. For example, the CTR of a particular result may be influenced by factors related to how the result appears on the page, such as its position, which can inflate the CTR out of proportion to the actual relevance of the underlying result.
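To make the CTR calculation and the presentation-bias problem concrete, the following is a minimal sketch that computes a raw CTR from click and impression counts and then applies one simple, commonly described correction: dividing by an assumed probability that a user examines a result at a given rank at all. The names `ResultStats`, `ctr`, `position_adjusted_ctr`, and the `EXAMINATION_PROB` values are hypothetical illustrations, not part of any particular search engine's method.

```python
from dataclasses import dataclass


@dataclass
class ResultStats:
    """Hypothetical per-result counts taken from search logs."""
    impressions: int  # number of times the result was displayed
    clicks: int       # number of times the result was clicked


def ctr(stats: ResultStats) -> float:
    """Click-through rate: clicks divided by impressions (0.0 if never shown)."""
    if stats.impressions == 0:
        return 0.0
    return stats.clicks / stats.impressions


# ASSUMPTION: illustrative probabilities that a user even looks at a result
# at each rank. Real values would have to be estimated from data.
EXAMINATION_PROB = {1: 1.0, 2: 0.6, 3: 0.4}


def position_adjusted_ctr(stats: ResultStats, rank: int) -> float:
    """Divide the raw CTR by the assumed chance the result was examined at this rank."""
    return ctr(stats) / EXAMINATION_PROB[rank]
```

With 300 clicks on 1,000 impressions at rank 1 and 160 clicks on 1,000 impressions at rank 3, the raw CTRs are 0.3 and 0.16, yet under these assumed examination probabilities the third-ranked result's adjusted CTR (about 0.4) exceeds that of the top result (0.3). This illustrates how a raw CTR can reflect where a result appears as much as how relevant it is.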
Whatever the approach, determining the relevance of search results is a difficult task, in large part because there is no single definitive indicator of a search result's success. The sheer number of queries a search engine handles, and the speed at which it must generate search results, make relevance a fast-moving target.