The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly. Search engines assist users in locating desired portions of this information by cataloging web pages. Typically, in response to a user's request, the search engine returns references to documents relevant to the request.
A web crawler is typically used to find and retrieve documents (e.g., web pages) on the web. To retrieve a document from the web, the web crawler sends a request to, for example, a web server for a document, downloads the entire document, and then provides the document to an indexer. The indexer typically takes the text of the crawled document, extracts individual terms from the text and sorts those terms (e.g., alphabetically) into a search index. The web crawler and indexer repeat this process as the web crawler crawls documents across the web. Each entry in the search index contains a term stored in association with a list of documents in which the term appears and the location within the document where the term appears. The search index, thus, permits rapid access to documents that contain terms that match search terms of a user supplied search query. To improve search performance, the indexer typically ignores common words, called stop words (e.g., the, is, on, or, of, how, why, etc.) when creating or updating the search index. Existing indexers create a single search index that contains terms extracted from all documents crawled on the web.