The amount of information and content available on the Internet continues to grow rapidly. Given the vast amount of information, search engines have been developed to facilitate searching for electronic documents. In particular, users may search for information and documents by entering search queries comprising one or more terms that may be of interest to the user. After receiving a search query from a user, a search engine identifies documents and/or web pages that are relevant to the search query. Because of its utility, web searching, that is, the process of finding relevant web pages and documents for user-issued search queries, has arguably become the most popular service on the Internet today.
Search engines operate by crawling documents and indexing information regarding the documents in a search index. When a search query is received, the search engine employs the search index to identify documents relevant to the search query. Use of a search index in this manner allows for fast retrieval of information for queries. Without a search index, a search engine would need to scan the entire corpus of documents for each query to find relevant results, which would take an unacceptable amount of time.
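The role of the search index described above can be illustrated with a minimal sketch of an inverted index, one common way to organize such an index. The sketch is illustrative only and is not drawn from any particular search engine: the function names (tokenize, build_index, search, scan), the whitespace tokenizer, and the three sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: each term maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term, using only the index."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def scan(docs, query):
    """Index-free baseline: re-read every document in the corpus for each query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in tokenize(text) for term in terms)
    }


# Hypothetical three-document corpus standing in for crawled documents.
docs = {
    1: "search engines crawl documents",
    2: "a search index allows fast retrieval",
    3: "documents are stored in a corpus",
}
index = build_index(docs)

print(search(index, "search index"))
print(scan(docs, "search index"))
```

Both functions return the same documents, but search consults only the posting sets for the query terms, whereas scan must read every document on every query. That difference is what makes an index-free approach impractical at the scale of the web.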
As the Internet continues to grow, the number of searchable documents that may be crawled and indexed in a search index has become extremely large. As a result, it has not been feasible for search engines to index information regarding all web documents. For instance, an inordinate amount of hardware storage would be required. Additionally, the processing time required to retrieve results from an extremely large index would be unacceptable. Nonetheless, search engines strive to index as many documents as feasible in order to provide search results for any query, while remaining cost-effective and able to provide relevant results in an amount of time that is acceptable to end users.