1. Field of the Invention
The present invention relates generally to information retrieval, and more particularly to methods and apparatus for efficiently and effectively retrieving hypertext documents on, e.g., the World Wide Web.
2. Description of the Related Art
The wide area computer network known as the Internet, and in particular the portion of the Internet known as the World Wide Web, affords users access to a large amount of information. Not surprisingly, several search engines have been provided to facilitate the mining of information from the Web. Users input queries into these search engines, which use various schemes to return lists of Web sites in response to the queries. These Web sites generally represent computer-stored documents that a user can access to gain information regarding the subject matter of the particular site.
Typically, like most computer search methods, Web search engines use some form of key word search strategy, in which the term or terms of a user's input query are matched with terms in Web documents in some fashion to return a list of pertinent Web sites to the querying user. It happens, however, that most queries are only one to three words in length and, thus, are usually very broad. This means that a large number of Web sites might contain one or more words of a query, and if the search engine returns all possible candidates, the user might be required to sift through hundreds and perhaps thousands of documents.
Furthermore, the Web sites that are most pertinent to a query might not be returned at all, because the query might use terms that do not appear in those sites. For example, the term "browser" does not appear at all in the Web sites for two of the currently most popular browsers. Instead, the Web sites use words other than "browser" to refer to the subject matter of the sites. Consequently, these sites would not be returned to a user who inputs the word "browser" to a search engine that uses a simple key word search strategy.
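The shortcoming described above can be illustrated with a minimal sketch of a simple key word search, written here in Python. The page names, the page texts, and the function names `build_index` and `search` are hypothetical and chosen only for illustration; they are not taken from any actual search engine.

```python
from collections import defaultdict

# Hypothetical page texts; the first page never uses the word "browser".
PAGES = {
    "navigator.example": "download the latest navigator release for your desktop",
    "reviews.example": "reviews of every popular browser for the web",
}

def build_index(pages):
    """Map each term to the set of pages whose own text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return pages containing at least one query term (simple key word match)."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)

index = build_index(PAGES)
print(search(index, "browser"))  # prints ['reviews.example']
```

In this sketch the page that is actually about a browser is not returned, because the index is built only from each page's own text.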
As recognized by the present invention, however, Internet users unconsciously collaborate in searching for, reading through, reviewing, and judging the quality of Web documents. This collaboration is reflected in large part by the compilation of Web pages, in that many if not most Web pages typically describe and point to other pages that are perceived to be high-quality.
More particularly, a Web page points to other Web pages in the form of hyperlinks, which essentially are references in a first document (i.e., a first Web page) to other documents (i.e., other Web pages). A hyperlink allows a user to immediately access another Web page by "clicking" on the hyperlink by means of a computer mouse or other pointing and clicking device. As recognized herein, such referring Web pages can be a rich source of terms that have been popularly associated with referred-to Web pages even if the referred-to Web pages do not themselves use the terms. Consequently, these terms can be used to improve Web search query results. The present invention further recognizes that the present principles of effectively diffusing features (in the form of terms) across a reference to a document (such as a hyperlink) are applicable not only to the Web but also to any body of linked documents, such as patents, academic papers, articles, books, emailings, etc.
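As a rough sketch of this principle, and not the claimed method itself, the terms that a referring page uses in a hyperlink can be credited to the referred-to page. The example assumes that each hyperlink is available as a (referring page, referred-to page, link text) triple and that only the link text is diffused, without any weighting; the data and the names `LINKS`, `diffuse_link_terms`, and `search` are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages: the referred-to page's own text never says "browser".
PAGES = {
    "navigator.example": "download the latest navigator release",
    "fanpage.example": "my favorite software picks",
}

# Hypothetical hyperlinks as (referring page, referred-to page, link text).
LINKS = [
    ("fanpage.example", "navigator.example", "the best web browser"),
]

def diffuse_link_terms(pages, links):
    """Index each page by its own terms plus terms from hyperlinks pointing to it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    for _source, target, link_text in links:
        for term in link_text.lower().split():
            index[term].add(target)  # credit the term to the referred-to page
    return index

def search(index, query):
    """Return pages associated with at least one query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)

index = diffuse_link_terms(PAGES, LINKS)
print(search(index, "browser"))  # prints ['navigator.example']
```

Here the query "browser" reaches the referred-to page only through the term supplied by the referring page. The same structure would apply to other linked documents, such as citations between papers, under the same assumptions.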
Accordingly, it is an object of the present invention to provide a method and system for diffusing features across hyperlinks. Another object of the present invention is to provide a method and system for ranking documents in a set of documents in response to a query. Still another object of the present invention is to provide a method and system for finding key words in a set of documents. Yet another object of the present invention is to provide a method and system for finding associations in computer-stored documents between document terms and query topics represented by one or more query terms. Another object of the present invention is to provide a method and system for Web searching that is easy to use and cost-effective.