Internet search engines are sites on the World Wide Web (the web) that are designed to help people find information stored on other sites. While there are differences in the ways that various search engines work, they all perform three basic tasks. First, they periodically crawl the web, or selected portions of the web, for important documents. Second, they store all of the important words (keywords) used in the documents and where the words are found. Third, they employ some type of ranking algorithm that attempts to rank web documents by relevance to a user's search term or combination of search terms when the user submits a query to the search engine.
To find information on the billions of web pages that exist, search engines employ software robots, called spiders or web crawlers, to build lists of the words found on web sites. Typically, a spider will begin its search on heavily used web servers and on popular web sites, storing words from the web sites' pages and following every hyperlink found within the site. In this way, the spider software quickly spreads across the most widely used portions of the web. Different spiders use different strategies or combinations of strategies for collecting web page information. For example, some spiders may look at every word on a web page and where the word is located (e.g., in the title, in sub-headings, in the first 20 lines, etc.). Other spiders may keep track of the most frequently used words in the page and/or the words used in each hyperlink. Still others may collect meta-tags, which are keywords under which the web page owner wants the page to be indexed.
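The crawling behavior described above — starting from seed pages, recording each page's words, and following every hyperlink while avoiding revisits — can be sketched as a breadth-first traversal. The following is a minimal illustration, not a real crawler: the `PAGES` dictionary is a hypothetical in-memory stand-in for pages that a real spider would fetch over HTTP.

```python
from collections import deque

# A tiny hypothetical "web": URL -> (page text, list of outgoing hyperlinks).
# A real spider would fetch these pages over HTTP and parse the HTML.
PAGES = {
    "http://a.example": ("search engines index the web", ["http://b.example"]),
    "http://b.example": ("spiders follow every hyperlink",
                         ["http://a.example", "http://c.example"]),
    "http://c.example": ("keywords are stored with locations", []),
}

def crawl(seed):
    """Breadth-first crawl from a seed URL, recording each page's words."""
    seen = {seed}
    queue = deque([seed])
    word_lists = {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        word_lists[url] = text.split()
        for link in links:
            if link not in seen:   # follow each hyperlink only once
                seen.add(link)
                queue.append(link)
    return word_lists

words = crawl("http://a.example")
```

Starting from the single seed page, the traversal reaches all three pages by following hyperlinks, which mirrors how a spider spreads across connected portions of the web.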
Once the spiders have gathered a sufficient amount of web page data (the task is never actually completed, due to the dynamic nature of the web), the search engine must store the information in a way that makes it useful to search engine users. There are two key components involved in making the gathered data accessible to users: the information stored with the data and the method by which the information is indexed. For example, a search engine might store the number of times a word is used in a web page and assign a relevance score to the web page based on the count. A search engine might also assign a weight to a search term based on its location in a web page, with increasing values as the term appears near the top of the page, in sub-headings, in links, in meta-tags, or in the title of the page, for example.
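The location-based weighting just described can be sketched as a small inverted index. The weight values and the two sample pages below are illustrative assumptions, not figures from any actual engine; the point is only that a word in a page's title contributes more to that page's relevance score than the same word in the body.

```python
# Hypothetical weights by location; a title counts for more than body text.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}

def build_index(pages):
    """pages: url -> list of (word, location) pairs.
    Returns an inverted index: word -> {url: relevance score}."""
    index = {}
    for url, tokens in pages.items():
        for word, location in tokens:
            scores = index.setdefault(word.lower(), {})
            scores[url] = scores.get(url, 0.0) + LOCATION_WEIGHTS[location]
    return index

# Two hypothetical pages; both use "search", but only the first uses it in the title.
pages = {
    "http://a.example": [("Search", "title"), ("engines", "title"),
                         ("search", "body")],
    "http://b.example": [("search", "body")],
}
index = build_index(pages)
```

Querying the index for "search" then favors the page that uses the word in its title, because its accumulated weight is higher.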
Regardless of the particular method used to rank web pages in response to a user's query, this approach to search engine design has several significant drawbacks. First, the ranking algorithms are biased toward older pages, because there are usually more links pointing to an older page than to a newer one. Second, the ranking method is self-reinforcing: pages that are highly ranked attract links from more users, which in turn increases their ranking. This “all roads lead to Rome” phenomenon can continue even after a web page is inactivated (becomes a dead page), because the sheer size of the web prevents any search engine with a single point of view (i.e., the search engine's web crawler) from covering the entire web in a short period of time. Third, user navigation to web pages without the use of hyperlinks (e.g., by entering a URL directly into a web browser) is not part of the search engine's ranking calculus.
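The self-reinforcing bias above can be illustrated with the simplest link-based ranking: scoring each page by its number of inbound links. The link graph below is a made-up example in which an inactive "old" page still outranks everything else, because the links pointing to it persist after the page goes dead.

```python
def rank_by_inlinks(link_graph):
    """link_graph: url -> list of URLs the page links to.
    Ranks pages by their number of inbound links, highest first."""
    counts = {url: 0 for url in link_graph}
    for links in link_graph.values():
        for target in links:
            counts[target] = counts.get(target, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

# Hypothetical graph: "old" is a dead page, yet three pages still link to it.
GRAPH = {
    "old": [],             # inactive, but widely linked to
    "new": ["old"],
    "p1":  ["old", "new"],
    "p2":  ["old"],
}
ranking = rank_by_inlinks(GRAPH)
```

The dead page tops the ranking with three inbound links, while the newer page, with only one, sits below it — exactly the age and self-reinforcement bias the text describes.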