A. Field of the Invention
The present invention relates generally to information retrieval and, more particularly, to automated techniques for ranking documents in response to a search query.
B. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Search engines assist users in locating desired portions of this information by cataloging web pages. Typically, in response to a user's request, the search engine returns references to documents relevant to the request.
Search engines may base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to identify links to high-quality, relevant results based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query against a corpus of pre-stored web documents. Web documents that contain the user's search terms are considered “hits” and are returned to the user.
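By way of illustration only, the term-matching step described above may be sketched as follows. The function names `tokenize` and `find_hits`, the sample corpus, and the choice to treat a document as a hit when it contains at least one query term are assumptions made for this sketch and are not taken from any particular search engine.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_hits(query, corpus):
    """Return ids of documents containing at least one query term.

    Assumption for this sketch: a single matching term is enough
    for a document to count as a hit.
    """
    query_terms = set(tokenize(query))
    hits = []
    for doc_id, text in corpus.items():
        # A non-empty intersection means the document shares a term with the query.
        if query_terms & set(tokenize(text)):
            hits.append(doc_id)
    return hits

# Hypothetical pre-stored corpus, keyed by document id.
corpus = {
    "doc1": "Search engines catalog web pages.",
    "doc2": "A recipe for apple pie.",
    "doc3": "Ranking web documents by relevance.",
}

print(find_hits("web ranking", corpus))  # ['doc1', 'doc3']
```

In this sketch the hits are returned in corpus order, without any ranking.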
It may be desirable to rank the hits returned by the search engine based on some measure of the quality and relevancy of the hits. A basic technique for sorting the search engine hits relies on the degree to which the search query matches the hits. For example, documents that contain every term of the search query or that contain multiple occurrences of the terms in the search query may be deemed more relevant than other documents and therefore may be more highly ranked by the search engine. Other factors, such as the closeness of terms (also referred to as the distance between the terms) in the document, may also be considered. Closeness of terms in this context may be measured simply by counting the number of words in the document occurring between the search terms. In documents such as web pages, however, which may contain complex formatting information, the “closeness” of terms in the underlying HTML file may not correlate with the “closeness” of the terms when the document is visually displayed. Accordingly, the performance of search engines that rank documents based on the closeness of the search terms in the underlying documents can suffer.
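The word-counting measure of closeness described above, and one way in which it may diverge between the underlying HTML file and the displayed text, may be illustrated by the following sketch. The names `words_between`, `TextExtractor`, and `visible_text`, as well as the sample HTML fragment, are hypothetical. The sketch assumes two distinct search terms and shows only how markup can inflate the count; it does not model visual layout such as tables or columns.

```python
import re
from html.parser import HTMLParser

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def words_between(tokens, term_a, term_b):
    """Smallest number of tokens lying between occurrences of two distinct terms.

    Returns None when either term is absent from the token list.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) - 1 for a in pos_a for b in pos_b)

class TextExtractor(HTMLParser):
    """Collects only character data, discarding tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def visible_text(html):
    """Rough approximation of the displayed text: the HTML with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical fragment: the two terms are adjacent when displayed.
html = '<p><b>search</b> <a href="http://example.com/some/long/path">engine</a></p>'

raw = words_between(tokenize(html), "search", "engine")
shown = words_between(tokenize(visible_text(html)), "search", "engine")
print(raw, shown)  # 9 0
```

Here the tag names and the link target count as nine intervening words in the underlying file, although no words separate the two terms in the displayed text.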
For search engines, returning relevant, high-quality documents in response to a search query is of paramount importance. Accordingly, it would be desirable to improve search engine ranking techniques that consider the closeness of search query terms in the underlying document.