1. Field of the Invention
The present invention is directed to spider engines and, in particular, to regulating the rate of data retrieval by a spider engine.
2. Related Art
“Web crawlers”, “robots”, or “spider engines” are programs used to automatically search the Internet for web pages or documents of interest. The information found by the spider engine may be collected, cataloged, and otherwise used by search engines. For example, a spider engine may be directed to search for and collect particular types of data, such as product catalog information, or may randomly search and catalog all found web pages to create a web index. The spider engine may enter a particular web site, and search one or more web pages of the web site for information of interest. The web site being searched may maintain a large number of web pages. Hence, searching with a spider engine may entail downloading, via the Internet, hundreds, thousands, and even more pages of information in a relatively short amount of time, from a single web site server.
Searching a web site in this manner with a spider engine may cause a web site server to become heavily loaded with web page requests. A web site server may be physically limited to supporting a particular amount of web page requests at any one time. The loading due to requests from a single spider engine may approach this web page request limit, and impair the web server's ability to respond to other requests for information during this period. This overloading may be detrimental to the web site provider's goal of making information available to interested parties, and may discourage interested parties from visiting the web site because they receive denials of service. Hence, what is needed is a method and system for limiting such web site requests of a web server by a spider engine, while still yielding acceptable search results.