The core of the World Wide Web (WWW) comprises several billion interlinked web pages. Accessing information on almost any of these web pages would be essentially impossible without the aid of systems that enable a user to search for specific text, or textual identifiers. Indeed, such systems, generally known as “search engines,” have increased in popularity as the WWW has grown in size.
Traditionally, a search engine comprises an initial page providing the user with a mechanism for entering one or more words, characters, or phrases, known as the “search string” or the user's “query.” The search string represents the words, characters or phrases that the user wishes to find in one or more of the web pages that comprise the WWW. The search engine will then reference a database comprising the content of a myriad of web pages, seeking to identify one or more web pages that contain the search string that the user entered. More advanced search engines can also apply common linguistic permutations to the words or phrases that the user is searching for in an effort to provide a more complete result. Thus, if the user searched for the word “computers,” the search engine could also search for the singular form “computer,” or even the verb form “computing,” so as not to exclude, for example, a web page referencing a single “computer” but not multiple “computers.”
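The expansion of a query to related word forms can be illustrated with a minimal sketch. The `stem` and `matches` functions below are hypothetical names, and the crude suffix stripping is only an assumed stand-in for whatever linguistic processing an actual search engine applies.

```python
def stem(word: str) -> str:
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ers", "er", "s"):
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query: str, page_text: str) -> bool:
    """Return True if every query word, reduced to its stem, occurs in the page."""
    query_stems = {stem(w) for w in query.split()}
    page_stems = {stem(w.strip('.,;:!?"')) for w in page_text.split()}
    return query_stems <= page_stems


# A query for the plural still finds a page that mentions only the singular.
found = matches("computers", "A single computer sits here.")
```

Because “computers,” “computer,” and “computing” all reduce to the same stem in this sketch, a page mentioning any one of them would match a query for any other.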
To generate a database that can be searched for the user's search string, search engines commonly employ automated processes known as “crawlers” to read information from a web page, follow the links in that web page to other web pages, read information from those web pages, and so forth. In this way, the crawler traverses web pages of the WWW in an orderly manner, returning the information from each web page to the search engine for storage. The search engine then stores the information in an optimized format to reduce the amount of storage space used and to improve searching efficiency.
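The crawling and storage steps described above can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular search engine: the web is modeled as an in-memory dictionary (`TOY_WEB`, with made-up URLs), the traversal order is assumed to be breadth-first, and the “optimized format” is assumed to be a simple inverted index mapping each word to the pages containing it.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://a.example/": ("computers for sale",
                          ["http://b.example/", "http://missing.example/"]),
    "http://b.example/": ("computer repair", ["http://a.example/"]),
    "http://c.example/": ("unlinked page", []),
}


def crawl(web, start):
    """Traverse pages reachable from `start` and build word -> set of URLs."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = web[url]
        # Store the page's words in the inverted index.
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        # Follow links to pages that exist and have not been visited yet.
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl(TOY_WEB, "http://a.example/")
```

In this sketch a page that no crawled page links to, such as `http://c.example/`, never enters the index, which reflects the link-following nature of the traversal.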
Due to the sheer volume of information and content available on the WWW, all but the most obscure search strings can result in thousands, or even millions, of web pages being identified by the search engine as containing the entered search string. Generally, most users will review only the first few results, continuing beyond those only if they were unable to find anything of interest among them. Consequently, search engines generally rank the identified web pages and display the search results such that the results with the highest ranking appear near the beginning of the listing. The web pages that are displayed near the beginning of a search result listing are, therefore, the most likely to have visitors directed to them from the search engine.
Because an increased number of visitors can provide benefits, both financial and otherwise, having web pages listed near the beginning of a search result listing can be desirable. However, the content that causes a web page to be ranked highly can differ from the content that a web page author would wish to present to visitors. Consequently, a first web page can be designed to be ranked highly by a search engine and to be relevant to a wide range of queries. When actually browsed to by a browser, that first web page could instead direct the browser to display a second web page whose content need not be limited to that which will receive a high ranking. Indeed, the second web page may even comprise inappropriate or malicious content that would have been excluded by the search engine. Additionally, the first and second web pages do not need to share a common heritage, nor do the links between them need to be established in advance. For example, the author of a page that has become highly ranked could sell redirections from that page to multiple other web page authors, such that visitors to the first page are redirected randomly to the web pages of those other web page authors.
To maintain accuracy, search engines can attempt to detect whether one web page will redirect to another. Such redirections are not always inappropriate or malicious. For example, redirections can be used to automatically direct visitors to equivalent content presented in the visitor's native language. Similarly, redirection can be used to achieve load balancing, thereby providing visitors with the same content in a more responsive manner. Thus, in addition to merely detecting whether one web page will redirect to another, search engines can also attempt to determine the content of the page that is the target of the redirection. If appropriate, the search engine can index, not the content of the first web page, which will never be seen by a user browsing there anyway, but rather the content of the target page. By indexing such content, however, the search engine can effectively nullify the redirection. To avoid having the search engine detect that one web page redirects to another web page, a web page author can utilize various script-based mechanisms that cannot be easily, or efficiently, detected by a search engine.
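The difficulty with script-based redirection can be illustrated with a minimal sketch of static redirect detection. The source does not specify how a search engine detects redirections; the sketch assumes that only two declarative mechanisms are inspected, namely an HTTP 3xx response with a `Location` header and an HTML `meta` refresh tag. The names `MetaRefreshFinder` and `static_redirect_target`, and the example URLs, are hypothetical.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Records the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://b.example/".
        _, _, rest = attr.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip("'\" ")


def static_redirect_target(status, headers, body):
    """Return the redirect target visible without running scripts, or None."""
    if 300 <= status < 400 and "Location" in headers:
        return headers["Location"]
    finder = MetaRefreshFinder()
    finder.feed(body)
    return finder.target


meta_page = '<head><meta http-equiv="refresh" content="0; url=http://b.example/"></head>'
script_page = '<script>window.location = "http://b.example/";</script>'

via_header = static_redirect_target(302, {"Location": "http://b.example/"}, "")
via_meta = static_redirect_target(200, {}, meta_page)
via_script = static_redirect_target(200, {}, script_page)
```

In this sketch the header-based and `meta`-based redirections are both found, while the script-based one yields `None`: discovering it would require actually executing the page's script, which is considerably more expensive than inspecting headers and markup.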