A typical search engine uses a search index to facilitate fast retrieval of information from hosts on the Internet. For example, a search index may include a portion of the content of a particular website or the entire contents of the website. When a user sends a search request to the search engine, the search engine performs the requested search on the local copy of the content in the search index rather than retrieving content from websites in real time. Performing the search on the local copy is generally faster than retrieving the content from the website in real time and then searching the retrieved content.
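The idea of searching a local copy can be illustrated with a minimal sketch of an inverted index, one common way to organize such a copy. The function names (`tokenize`, `build_index`, `search`), the sample URLs, and the page text below are assumptions made for illustration only; the description above does not specify how the search index is structured.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of URLs whose stored content contains it.

    `pages` is a hypothetical local copy: a dict of URL -> page text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return URLs containing every query term, using only the local index."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical content previously copied from three websites.
pages = {
    "http://a.example/": "Fresh bread and pastry recipes",
    "http://b.example/": "Bread baking tips for beginners",
    "http://c.example/": "Bicycle repair guide",
}
index = build_index(pages)
matches = search(index, "bread tips")
```

In this sketch a query is answered with a few in-memory set lookups, and no website is contacted at search time. That is the source of the speed advantage described above.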
Unfortunately, the Internet is not a stable place. Websites come and go, and content within a website can change over time. Therefore, the search index must be updated periodically. A search system generally includes a web crawler that updates the content in the search index. The web crawler can retrieve content from websites already included in the search index to determine whether changes, additions, or deletions have been made. In some cases, a web crawler can receive lists of websites to add to or remove from the search index, for example from an administrator of a website. In other cases, a web crawler can traverse hyperlinks in visited websites to discover new websites to add to the search index.
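The two crawler behaviors described above, traversing hyperlinks to discover new pages and revisiting indexed pages to detect changes or deletions, might be sketched as follows. This is an illustrative sketch only: `crawl`, `refresh`, `LinkExtractor`, and the in-memory `FAKE_WEB` are hypothetical names, and the `fetch` callable stands in for a real HTTP client so that the example runs without network access. A production crawler would also honor robots.txt, rate limits, and similar policies, which are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first traversal of hyperlinks starting from the seed URLs.

    `fetch(url)` is assumed to return the page HTML, or None if the
    page cannot be retrieved. Returns a dict of URL -> HTML.
    """
    index = {}
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to add
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


def refresh(index, fetch):
    """Revisit indexed URLs; update changed pages and drop missing ones."""
    changed, removed = [], []
    for url in list(index):
        html = fetch(url)
        if html is None:
            del index[url]
            removed.append(url)
        elif html != index[url]:
            index[url] = html
            changed.append(url)
    return changed, removed


# Hypothetical stand-in for the web: URL -> HTML.
FAKE_WEB = {
    "http://a.example/": '<a href="/about">About</a> <a href="http://b.example/">B</a>',
    "http://a.example/about": "<p>About us</p>",
    "http://b.example/": '<a href="http://gone.example/">dead link</a>',
}
crawled = crawl(["http://a.example/"], FAKE_WEB.get)
```

Starting from a single seed, the sketch discovers the other two pages through their links and skips the link whose target cannot be fetched. Calling `refresh` later with an updated `fetch` reports which stored pages changed and which disappeared, mirroring the periodic update described above.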