Search engines use web crawlers to understand documents on the World Wide Web (“the web”). Web crawlers are programs that systematically and repeatedly traverse the web, indexing web sites by their content (e.g., keywords, text, reciprocal links, videos, images, audio, and the like). Because web sites change constantly, web crawlers must recrawl sites to keep their indexes fresh. Repeatedly accessing a web site poses problems for the site's owner, however, because the servers hosting the site can typically service only a limited number of users/requesters at the same time. Crawling a site during its peak traffic periods (e.g., a stock-trading site around the opening bell of a particular stock exchange) therefore threatens the stability of the site. Balancing the need to index fresh content against the variable nature of a site's traffic is a difficult task for modern web crawlers.
The traditional way site owners try to control the rate at which web crawlers access their sites is through an instructional text file called a “robots.txt” file. Robots.txt files indicate the rate at which web crawlers may access the site (the “crawl rate”) and a delay the web crawler must observe between fetches (the “crawl delay”). Both the crawl rate and the crawl delay are pre-determined, static values and consequently do not allow for adjustment based on the site's actual traffic.
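As an illustrative sketch (not part of the original description), the static nature of these directives can be seen by reading a robots.txt file with Python's standard urllib.robotparser module. The file contents and the example.com URL below are hypothetical; Crawl-delay and Request-rate are nonstandard directives honored by some crawlers but not all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents with static crawl directives.
robots_txt = """\
User-agent: *
Crawl-delay: 10
Request-rate: 1/5
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The parsed values are fixed constants; they cannot react to load.
delay = rp.crawl_delay("*")          # seconds to wait between fetches
rate = rp.request_rate("*")          # requests allowed per time window
allowed = rp.can_fetch("*", "https://example.com/private/page")

print(delay)    # 10
print(rate)     # RequestRate(requests=1, seconds=5)
print(allowed)  # False
```

However busy or idle the site is, a compliant crawler reading this file waits the same 10 seconds between fetches, which is precisely the limitation the passage above describes.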