The Internet, often simply called “the Net,” is a worldwide system of computer networks and, in a larger sense, the people using it. The Internet is a public, self-sustaining facility that is accessible to tens of millions of people worldwide. The most widely used part of the Internet is the World Wide Web, often abbreviated “WWW” or simply called “the Web.” The Web is an Internet service that organizes information through the use of hypermedia. The HyperText Markup Language (“HTML”) is used to specify the contents and format of a hypermedia item (e.g., a Web page).
In this context, an HTML file is a file that contains the source code for a particular Web page. A Web page is the image that is displayed to a user when a particular HTML file is rendered by a browser application program. Unless specifically stated otherwise, an electronic or Web item may refer to either the source code for a particular Web page or the Web page itself.
Each page can contain embedded references to images, audio, or other Web items. A user, using a Web browser, browses for information by following references, known as hyperlinks, that are embedded in each of the items. The HyperText Transfer Protocol (“HTTP”) is the protocol used to access a Web item.
Through the use of the Web, individuals have access to millions of pages of information. However, a significant drawback of the Web is that, because it has so little organization, it can at times be extremely difficult for users to locate the particular pages that contain the information of interest to them.
To address this problem, a mechanism known as a “search engine” has been developed to index a large number of Web pages and to provide an interface that can be used to search the indexed information by entering certain words or phrases to be queried. Indexes are conceptually similar to the normal indexes that are typically found at the end of a book, in that both kinds of indexes comprise an ordered list of information accompanied by the location of the information. In a database, for example, values in one or more columns of a table are stored in an index, which is maintained separately from the actual database table. An “index word set” of an item is the set of words that are mapped to the item in an index. For items that are not indexed, the index word set is empty.
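The notion of an index and of an “index word set” can be illustrated with a minimal sketch. The function names `build_index` and `index_word_set`, the whitespace-based word splitting, and the example.com locations are assumptions made for illustration only; they are not taken from any particular search engine.

```python
from collections import defaultdict


def build_index(items):
    """Map each word to the sorted list of item locations that contain it.

    `items` maps a location (here, a URL string) to the item's text.
    Splitting on whitespace is a simplifying assumption.
    """
    index = defaultdict(set)
    for location, text in items.items():
        for word in text.lower().split():
            index[word].add(location)
    # Order the entries by word, in the manner of a book index.
    return {word: sorted(locs) for word, locs in sorted(index.items())}


def index_word_set(index, location):
    """Return the set of words mapped to `location` in the index.

    The result is empty for an item that has not been indexed.
    """
    return {word for word, locs in index.items() if location in locs}


# Hypothetical items used only for this example.
items = {
    "http://example.com/a": "search engines index web pages",
    "http://example.com/b": "web pages contain hyperlinks",
}
index = build_index(items)
```

In this sketch, `index["web"]` lists both locations, while `index_word_set(index, "http://example.com/c")` is the empty set because that item was never indexed.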
Although there are many popular Internet search engines, they are generally constructed using the same three common parts. First, each search engine has at least one “spider” that “crawls” across the Internet to locate Web items around the world. Upon locating an item, the spider stores the item's Uniform Resource Locator (URL), and follows any hyperlinks associated with the item to locate other Web items. Second, each search engine contains an indexing mechanism that indexes certain information about the items that were located by the spider. In general, index information is generated based on the contents of the HTML file. The indexing mechanism stores the index information in large databases that can typically hold an enormous amount of information. Third, each search engine provides a search tool that allows users to search the databases in order to locate specific items that contain information that is of interest to them.
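The three parts described above (spider, indexing mechanism, and search tool) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the dictionary `WEB` is a hypothetical in-memory stand-in for the Web, so no network access is performed, and the names `crawl`, `build_index`, and `search` are invented for this sketch.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (page text, outgoing hyperlinks).
WEB = {
    "http://example.com/": ("home page",
                            ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page about search", ["http://example.com/b"]),
    "http://example.com/b": ("page about ranking", ["http://example.com/"]),
}


def crawl(seed, web):
    """Spider: store each located URL and follow its hyperlinks."""
    seen = {seed}
    order = [seed]
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        _, links = web.get(url, ("", []))
        for link in links:
            if link not in seen:
                seen.add(link)
                order.append(link)
                queue.append(link)
    return order


def build_index(urls, web):
    """Indexing mechanism: map each word to the URLs whose text contains it."""
    index = {}
    for url in urls:
        text, _ = web.get(url, ("", []))
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Search tool: return the URLs that contain every word of the query."""
    results = None
    for word in query.lower().split():
        matches = index.get(word, set())
        results = matches if results is None else results & matches
    return sorted(results or [])


urls = crawl("http://example.com/", WEB)
index = build_index(urls, WEB)
```

Here the crawl proceeds breadth-first from a single seed URL, which is one of several reasonable traversal orders; the source does not specify one.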
The search engine provides an interface that allows users to specify their search criteria and, after performing a search, an interface for displaying the search results. Typically, the search engine orders the search results prior to presenting the search results interface to the user. The order usually takes the form of a “relevance ranking”, where the matching item with the highest relevance ranking is the item considered most likely to satisfy the interest reflected in the search criteria specified by the user.
The specific techniques for determining that ranking vary from implementation to implementation. One factor used by many ranking mechanisms to determine relevance is the “popularity” of a web page. When all other factors are equal, pages that are “popular” are given higher rankings than pages that are visited less frequently.
Ranking mechanisms typically determine the popularity of web pages based on information collected by the search engine. For example, one type of information that can be collected by the search engine is how users use the search engine. Thus, if users of the search engine frequently select a particular link from the search results, then the popularity weight of the corresponding page may go up, thereby giving the page a higher relevance ranking.
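As a rough illustration of popularity derived from search-result selections, the sketch below counts how often each result is selected and uses that count to order pages whose other relevance factors are equal. The base relevance scores, the tie-breaking rule, and the names `record_click` and `rank` are assumptions for this example; actual ranking mechanisms vary from implementation to implementation.

```python
from collections import Counter


def record_click(clicks, url):
    """Increase the popularity weight of a page selected from search results."""
    clicks[url] += 1


def rank(matches, clicks):
    """Order matching pages by relevance.

    `matches` maps a URL to a hypothetical base relevance score. A higher
    score ranks first; among equal scores, the page selected more often
    ranks first; the URL itself is a final, deterministic tie-breaker.
    """
    return sorted(matches, key=lambda url: (-matches[url], -clicks[url], url))


clicks = Counter()
for selected in ["b", "b", "a"]:
    record_click(clicks, selected)

# "a" and "b" have equal base scores, so click popularity decides their order.
ranked = rank({"a": 1.0, "b": 1.0, "c": 2.0}, clicks)
```

In this example `c` ranks first on its base score alone, and `b` precedes `a` because it was selected more often.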
Similarly, the spider of a search engine may be used to count the number of links on other pages that point to a particular page. The greater the number of links that point to a page, the more popular the page tends to be, so pages with more incoming links are considered to have higher relevance than pages with fewer incoming links.
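Counting incoming links can be sketched as follows. The link graph is a hypothetical input of the kind a spider might collect, and two choices in the code are assumptions rather than anything stated above: a page's links to itself are ignored, and repeated links from the same page to the same target are counted once.

```python
from collections import Counter


def count_incoming_links(link_graph):
    """Count, for each page, how many other pages link to it.

    `link_graph` maps a page to the list of pages it links to.
    """
    incoming = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):      # assumption: one vote per linking page
            if target != source:         # assumption: self-links do not count
                incoming[target] += 1
    return incoming


# Hypothetical link graph used only for this example.
link_graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "c"],
    "d": ["c", "c"],
}
incoming = count_incoming_links(link_graph)
most_linked = incoming.most_common(1)[0][0]
```

With this graph, page `c` receives links from three other pages and is therefore treated as the most popular, while page `d` has no incoming links.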
Since the perceived value of a search engine is highly dependent on the accuracy of its relevance rankings, it is clearly desirable to provide techniques for increasing the accuracy of the relevance ranking.
Based on the foregoing, it is desirable to provide techniques for improving search relevancy.