The Internet has vast amounts of information distributed over a multitude of computers, providing users with information on a wide range of topics. The same is true of a number of other communication networks, such as intranets and extranets. Although large amounts of information may be available on a network, finding the desired information can be difficult.
Search engines have been developed to address the problem of finding desired information on a network. Typically, a user who has an idea of the type of information desired submits one or more search terms to a search engine. The search engine then returns a list of network locations (e.g., uniform resource locators (URLs)) that the search engine has determined include electronic documents relating to the user-specified search terms. Many search engines also provide a relevance ranking, which is a relative estimate of the likelihood that an electronic document at a given network location is related to the user-specified search terms in comparison to other electronic documents. For example, a conventional search engine may provide a relevance ranking based on the number of times a particular search term appears in an electronic document and on its placement in the electronic document (e.g., a term appearing in the title is often deemed more important than the same term appearing at the end of the electronic document). In addition, link analysis has become a powerful technique for ranking web pages and other hyperlinked documents. Anchor-text analysis, web page structure analysis, the use of a key term listing, and the URL text are other techniques used to provide a relevance ranking.
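The term-frequency and placement scoring described above can be illustrated with a minimal sketch. The function, weights, and document structure below are hypothetical and not drawn from any particular search engine; they merely show how title placement might be weighted more heavily than body occurrences.

```python
def relevance_score(query_terms, document, title_boost=2.0):
    """Toy relevance score: count query-term occurrences in the body,
    weighting matches in the title more heavily (hypothetical weights)."""
    title_words = document["title"].lower().split()
    body_words = document["body"].lower().split()
    score = 0.0
    for term in query_terms:
        t = term.lower()
        score += title_boost * title_words.count(t)  # title matches count extra
        score += body_words.count(t)                 # plain term frequency in body
    return score

docs = [
    {"title": "Ranking algorithms", "body": "ranking web pages by relevance"},
    {"title": "Cooking basics", "body": "how to rank your favorite recipes"},
]
# Sort documents by descending relevance to the query term "ranking".
ranked = sorted(docs, key=lambda d: relevance_score(["ranking"], d), reverse=True)
```

A production engine would combine many such signals (link analysis, anchor text, URL text) into a single score; this sketch shows only the two signals named in the passage.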
Many search engines employ various ranking algorithms to produce such a relevance ranking reflecting the relative importance of the different electronic documents resulting from a search query. The ability of current ranking algorithms to produce an accurate relevance ranking depends on numerous tunable dimensions or other parameters (e.g., 200 or more). Thus, a technique for identifying an optimal ranking algorithm that has optimal dimensions or parameters for producing an accurate relevance ranking is desired.
In some existing systems and methods, human intuition is used to tune the parameters of a given ranking algorithm in an attempt to produce a relevance ranking that is correlated to a human-judged ranking of electronic documents. However, human intuition fails to identify an optimal ranking algorithm with optimal parameters efficiently, reliably, and effectively.
In some fields of computer science, “best fit” or “minima/maxima seeking” algorithms are used to solve various problems. A technique that applies these algorithms to explore a multi-dimensional space of parameters associated with a ranking algorithm is generally desired. However, a given ranking algorithm may have on the order of 50 to 100 parameters. If each parameter has 10 possible values, there will be 10^50 to 10^100 possible combinations of parameters for the given ranking algorithm. This large space of combinations renders searching for an optimal set of parameters difficult. Moreover, because potentially billions of electronic documents are located on a network, executing an optimizing algorithm over these billions of electronic documents to identify an optimal ranking algorithm is time consuming. In other words, searching a large index of electronic documents to identify an optimal ranking algorithm having a set of optimal parameters is impractical.
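The infeasibility of exhaustive enumeration is why such techniques sample or navigate the parameter space instead. The following sketch is a generic random search, not any specific method from the source; the objective function and parameter counts are illustrative assumptions.

```python
import random

def random_search(objective, num_params, values, trials=1000, seed=0):
    """Sample candidate parameter vectors rather than enumerating all
    len(values) ** num_params combinations, which is infeasible when
    num_params is 50-100 (10^50 to 10^100 combinations for 10 values each)."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = [rng.choice(values) for _ in range(num_params)]
        score = objective(candidate)  # e.g., agreement with a judged ranking
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

# Hypothetical objective: reward parameter vectors near a "target" setting.
target = [5] * 10
objective = lambda p: -sum((a - b) ** 2 for a, b in zip(p, target))
best, score = random_search(objective, num_params=10, values=range(10))
```

In practice the objective would measure how well the ranking algorithm, configured with the candidate parameters, matches human relevance judgments; 1,000 trials explores a vanishingly small fraction of a 10^50-point space, which is exactly the difficulty the passage describes.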
In existing frameworks for identifying an optimal ranking algorithm, the optimizing algorithm utilized to identify the optimal ranking algorithm is usually “hard-coded” into the framework. As a result, changes to the optimizing algorithm in such frameworks usually require code-level changes, which must then be distributed to other machines if the index of electronic documents is spread across multiple machines. Therefore, a framework that provides interchangeable optimizing algorithms is desired, such that an optimizing algorithm may be easily upgraded or substituted.
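One way to avoid hard-coding the optimizer is to program the framework against an abstract interface so that concrete optimizing strategies can be swapped without touching framework code. The interface and the simple coordinate-wise optimizer below are an illustrative sketch, not the framework described by the source.

```python
from abc import ABC, abstractmethod

class Optimizer(ABC):
    """Interchangeable optimizer interface: the framework depends only on
    this contract, so concrete strategies can be upgraded or substituted."""

    @abstractmethod
    def optimize(self, objective, initial_params):
        ...

class GreedyCoordinateOptimizer(Optimizer):
    """One greedy pass: set each integer parameter (assumed range 0-9)
    to the value that maximizes the objective, holding the rest fixed."""

    def optimize(self, objective, initial_params):
        params = list(initial_params)
        for i in range(len(params)):
            params[i] = max(
                range(10),
                key=lambda v: objective(params[:i] + [v] + params[i + 1:]),
            )
        return params

def tune(optimizer: Optimizer, objective, initial_params):
    # Framework code: unaware of which optimizing algorithm is plugged in.
    return optimizer.optimize(objective, initial_params)

# Hypothetical objective with a known optimum at all-3s.
objective = lambda p: -sum((x - 3) ** 2 for x in p)
tuned = tune(GreedyCoordinateOptimizer(), objective, [0, 0, 0])
```

Because `tune` accepts any `Optimizer`, replacing the greedy pass with, say, a random search requires no change to the framework itself, which is the interchangeability the passage calls for.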
Accordingly, a solution that effectively evaluates information retrieval ranking algorithms and improves ranking algorithms for information retrieval is desired.