On the World Wide Web, information and resources are typically organized as Web pages. To locate desired information and resources on the Web, a user generally employs a search engine to search for relevant Web pages. A search engine searches a database that contains content-based information about pages on the Web. This information is usually gathered by Web crawlers that periodically browse the Web in a systematic manner. When a search engine receives a query with certain search terms, it searches this database for Web pages whose content is similar to the search terms. The search engine then returns the addresses of these Web pages to the user.
As the Web continues to grow, it becomes increasingly challenging for users to accurately locate pages on the Web. For example, a query may return an unreasonably large number of Web pages, many of which are not relevant to the query. Some existing search engines attempt to alleviate this problem by presenting the search results in an order based on the importance of the Web pages returned by the search. In the database used by these existing search engines, each Web page is ranked according to the hyperlinks in the other Web pages in the database that point to it. In other words, a hyperlink pointing to a Web page serves as a vote for that page, and each Web page is ranked according to the number of votes it receives.
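The vote-counting scheme described above can be illustrated with a minimal sketch. It assumes the database is available as a mapping from each page address to the addresses that page links to. The function name `rank_by_inbound_links`, the example addresses, and the choices to count repeated links from one page only once and to ignore self-links are all illustrative assumptions, not details taken from any particular search engine.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by the number of distinct other pages that link to them.

    `links` maps a page address to an iterable of addresses it links to
    (a hypothetical stand-in for the search engine's database).
    """
    # Start every known page at zero votes so unlinked pages still appear.
    votes = Counter({page: 0 for page in links})
    for source, targets in links.items():
        # Assumption: several links from one page to the same target
        # count as a single vote, and a page cannot vote for itself.
        for target in set(targets):
            if target != source:
                votes[target] += 1
    # Most votes first; ties are broken alphabetically for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical four-page database.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranking = rank_by_inbound_links(links)
for page, count in ranking:
    print(page, count)
```

In this small example, one page collects most of the votes while a page that no other page links to receives none, regardless of its content.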
Although search engines that return ranked Web pages provide a better user experience, they also have serious shortcomings. For example, because most pages on the Web have few or no hyperlinks pointing to them, ranking Web pages based on hyperlinks produces a polarized and unrealistic distribution of importance. Also, because new hyperlinks must be authored into Web pages, which requires a significant amount of time, new pages may not receive rankings that reflect their importance.
Thus, there is a need for a search engine that is capable of distributing the importance of Web pages in a realistic manner and more accurately accounting for new pages on the Web.