A. Field of the Invention
The present invention relates generally to search engines and, more particularly, to systems and methods that use text surrounding hypertext links to improve search results.
B. Description of Related Art
The World Wide Web (“Web”) contains a vast amount of information. Locating a desired portion of that information, however, can be challenging. This problem is compounded because the amount of information on the Web and the number of new users inexperienced at web searching are both growing rapidly.
Search engines attempt to return links to web pages in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to identify links to high-quality, relevant results for the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query against a corpus of pre-stored web pages. Web pages that contain the user's search terms are considered “hits” and are returned to the user.
Search engines generally organize their corpus of pre-stored web page documents as an inverted index of terms found in the web pages. Terms in a search query can be quickly referenced against the index to determine the set of documents that contain some or all of the terms.
In one technique for improving the quality of a document index, additional terms found near hyperlinks in documents are used to enhance the description of the linked document. The premise of this technique is that web authors tend to describe or comment on the content of other web pages in descriptive text located near the link to the other web page. This descriptive text may be used when indexing the linked document to enhance the quality of the index. As a concrete example of this technique, consider a first web page that includes a hyperlink to a target web page dealing with data compression techniques. The first web page may describe the target web page with the descriptive text “basic facts, algorithms, hardware links, and a glossary.” By including the descriptive text of the first web page in the text of the target web page when indexing the target web page, the search engine can generate a more comprehensive document index.
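The indexing technique described above can be illustrated with a minimal Python sketch. It is not any particular search engine's implementation. The function names (`tokenize`, `build_index`, `search`), the page identifiers, and the page text are hypothetical and chosen only for illustration; the descriptive text is the example quoted above. The sketch assumes that descriptive text has already been extracted from the linking pages and paired with its target page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, link_descriptions):
    """Build an inverted index mapping each term to a set of page IDs.

    pages: dict of page ID -> page text (hypothetical toy corpus).
    link_descriptions: list of (target page ID, descriptive text) pairs
        gathered from pages that link to the target (assumed input).
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    # Index the descriptive text as if it were part of the target page.
    for target_id, description in link_descriptions:
        for term in tokenize(description):
            index[term].add(target_id)
    return index


def search(index, query):
    """Return the IDs of pages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy corpus: a target page and one unrelated page.
pages = {
    "compression.html": "Data compression techniques such as Huffman coding.",
    "links.html": "A collection of links about computing.",
}
# Descriptive text found near a hyperlink to the target page.
link_descriptions = [
    ("compression.html", "basic facts, algorithms, hardware links, and a glossary"),
]
index = build_index(pages, link_descriptions)
```

In this sketch, a query for “glossary” matches the target page only because the descriptive text from the linking page was indexed together with the target page's own text; the word does not appear on the target page itself.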
In a variation of the above technique, the descriptive text may be used when returning search results to a user. The idea here is that the descriptive text, in many situations, accurately summarizes the linked web page. Accordingly, in response to a search query, the search engine may return the list of relevant web pages along with corresponding descriptive text that was gathered from pages that link to the web page.
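As a rough sketch of this variation, the following Python fragment attaches the descriptive text gathered from linking pages to each returned result. The function names, page identifiers, and the second description string are hypothetical and used only for illustration; the sketch assumes the list of hits has already been produced by a search over the index.

```python
from collections import defaultdict


def collect_descriptions(links):
    """Group descriptive text by the page it describes.

    links: iterable of (source page ID, target page ID, descriptive text)
        triples (assumed to have been extracted from the linking pages).
    """
    by_target = defaultdict(list)
    for _source_id, target_id, description in links:
        by_target[target_id].append(description)
    return by_target


def format_results(hits, descriptions):
    """Pair each hit with the descriptive text gathered for it, if any."""
    return [(page_id, list(descriptions.get(page_id, []))) for page_id in hits]


# Hypothetical link data: two pages link to the same target page.
links = [
    ("a.html", "compression.html",
     "basic facts, algorithms, hardware links, and a glossary"),
    ("b.html", "compression.html", "a compression FAQ"),
]
results = format_results(
    ["compression.html", "other.html"], collect_descriptions(links)
)
```

Here the target page is returned with two candidate descriptions, one from each linking page, while a page with no inbound descriptive text is returned with an empty list.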
One problem associated with using descriptive text from a web page to evaluate a linked web page is that there are often multiple linking web pages, and thus multiple samples of descriptive text from which to choose. Automatically choosing the best sample of descriptive text can be a difficult task.
The overriding goal of a search engine is to return the most desirable set of links, with a succinct and accurate description of each link, for any particular search query. To this end, it is desirable to improve the quality of any external descriptive text associated with a particular web page.