Ranking is difficult to handle for large numbers of queries and/or large numbers of documents. For systems with only a few queries, preset results can be generated, and for systems with only a few documents, the documents can often be ranked in a query-independent manner, such as by date. However, for most practical systems with open-ended queries and large document sets, ranking must often be done at least in part on the fly, i.e., the ranking of a list of elements is determined and finalized only after the query that would return that list is received. This problem is particularly difficult where the document set searched is the documents on the Web (i.e., the collection of documents stored on servers interconnected via the Internet and referred to as the “World Wide Web”). By some estimates, there are several billion searchable documents on the Web, and a typical search might yield thousands or hundreds of thousands of documents, whereas a typical searcher can deal with at most a few dozen relevant references.
If a specific Web address for a document is known, a user can supply the address (typically a URL, or Uniform Resource Locator) to a browser, which would then use its underlying protocols to quickly obtain that specific document. Increasingly, however, a user does not know exactly where the desired information is located, and finding it is one task that ranking systems can help with. It should be understood that ranking systems do not require a network, but might be used on a single computer or computer system to rank information stored there.
Generally, network nodes on the Internet that connect to other network nodes support a variety of protocols at various network levels. At the application level, many Internet nodes support HyperText Transfer Protocol (HTTP) for sending and receiving hypertext pages, which might include HyperText Markup Language (HTML) pages and other data supported by devices and software that handle HTTP messages. HTTP is a client-server protocol, in that a node that acts as an HTTP client makes requests of a node that acts as an HTTP server. When an HTTP client makes a request, the request includes a Uniform Resource Locator (URL) that refers to the page or data requested. The URL comprises a globally unique domain name and possibly other fields that are specific to that domain name. Thus, any HTTP client can make a request by sending the request into the network. The network will resolve the domain name and route the request to an HTTP server at the specified domain, and that HTTP server will resolve the remaining fields of the URL to determine what was requested.
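The division of a URL into a domain name and domain-specific fields can be illustrated with a minimal sketch using Python's standard urllib.parse module. The function name split_url and the address www.example.com are assumptions chosen only for illustration and do not appear in the systems described here.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Separate a URL into the domain name and the fields specific to that domain.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # protocol, e.g. "http"
        "domain": parts.hostname,  # resolved by the network to locate the server
        "path": parts.path,        # resolved by the HTTP server at that domain
        "query": parts.query,      # additional server-specific field
    }


# Illustrative address, not taken from the text.
fields = split_url("http://www.example.com/docs/page.html?lang=en")
print(fields["domain"])  # www.example.com
print(fields["path"])    # /docs/page.html
```

In this sketch, only the "domain" field is needed to route the request; the path and query are opaque to the network and are interpreted by the server that receives the request.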
This approach works well when the HTTP client has a URL for the desired data. However, where the client or the user operating the client does not have a specific URL in mind, searching is usually done to find the resource or resources of interest. Several approaches to searching have been tried and are currently in use. One approach is the directory approach, where large numbers of URLs and references to pages are stored in a hierarchical structure and are searchable or navigable via the structure. An example of this approach is the Yahoo! directory. With the Yahoo! directory, a Yahoo! user directs a browser to a search page and submits a search from that page. The search is applied to the Yahoo! hierarchical (taxonomical) structure and results are presented to the user. The results can also include hits from a search engine searching on the terms of the search.
Such approaches work well to find well-categorized information and information that is not voluminous, but problems occur when the search results can fall into many different topics and/or there are a large number of documents that match the search. With the growth in content volume available over the World Wide Web (the collection of documents accessible over the Internet or similar network using HTTP or the like often including hyperlinks from document to document, thus creating a “web” structure, referred to as “the Web” for brevity), a typical search might yield far more hits than can be processed by the searcher. As a result, the hits in those cases need to be ranked. Ranking allows for the more relevant pages to be presented higher in the ranking than other pages.
Search ranking systems using input from users of the system are known. For example, U.S. Pat. No. 6,078,916 shows a search system wherein the search activity of users is monitored and that activity is used in ranking results of later searches. Thus, the more often users click on one of the hits, the higher that hit is ranked in subsequent search results.
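The general idea of click-informed ranking can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited patent: the class name ClickRanker, its methods, and the rule of sorting hits purely by per-query click count are all hypothetical.

```python
from collections import Counter


class ClickRanker:
    """Hypothetical sketch: rank hits for a query by how often users clicked them."""

    def __init__(self):
        # Maps each query string to a Counter of clicked document identifiers.
        self.clicks = {}

    def record_click(self, query, doc):
        """Record that a user clicked `doc` in the results for `query`."""
        self.clicks.setdefault(query, Counter())[doc] += 1

    def rank(self, query, hits):
        """Return `hits` ordered by descending click count for `query`.

        sorted() is stable, so hits with equal counts keep their original order.
        """
        counts = self.clicks.get(query, Counter())
        return sorted(hits, key=lambda doc: -counts[doc])


ranker = ClickRanker()
hits = ["a", "b", "c"]
ranker.record_click("jaguar", "c")
ranker.record_click("jaguar", "c")
ranker.record_click("jaguar", "b")
print(ranker.rank("jaguar", hits))  # ['c', 'b', 'a']
```

A practical system would likely combine such counts with other relevance signals rather than rely on clicks alone, since a purely click-driven ordering tends to reinforce whatever was already ranked highly.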
U.S. Pat. No. 6,240,408 shows another approach to search results ranking. In the system shown there, a query is applied to a controlled database containing selected data and to an uncontrolled database containing possibly unselected and uncontrolled data. The ranking of results from the controlled database is used to inform the ranking of the results that the query returns from the uncontrolled database.
Ranking by human editors reviewing search results provides more relevant ranking than automated processes and even search users, because human editors exercise better judgment than the best software, more clearly understand distinctions between pages, and can focus on their areas of expertise. For example, a human editor would more easily spot a page that is irrelevant but contains terms designed to get a high ranking from an automated process. However, human editors cannot process the volume of searches typically received by a search system, and cannot keep the results for the queries they do process up to date as relevant pages are added, modified or removed. In addition, in an open-ended query system, the number of possible queries can easily be in the millions. Even if editors concentrate only on the most common queries, the results change all the time as new data becomes available, old data becomes irrelevant, new meanings are created for old terms, or new events occur. If the results are based solely on what the human editors decided on one day, they might be out of date the next day.