The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Web sites typically comprise a plurality of interlinked web pages, each containing public and/or proprietary content to be presented to a user. Proprietary content may include content created by a web site administrator, content provider, and/or user. Proprietary content may include private, sensitive, and/or personally identifying information about a person and/or entity.
A user may traverse through web pages in a web site by selecting links on various web pages. For example, on a social networking web site, a user may select a link to a first friend's profile page to see content related to the first friend. The first friend's profile page may contain a link to a second friend, and the user may select that link to view the second friend's profile page.
Unfortunately, fraudsters, spammers, data miners, and others may use bots to traverse web sites and collect the proprietary content presented in each page. A bot may be a computer and/or software executed by a computer that automates making requests for pages and storing data from the returned pages. For example, a bot may be a web scraper, web crawler, automatic web browser, and/or any other tool designed to submit data to and/or receive data from one or more web servers autonomously and/or automatically. A bot may comprise complex logic designed to respond to data received from one or more web servers.
Malicious users may use bots to commit many types of unauthorized acts, crimes, or computer fraud, such as content scraping, ratings manipulation, fake account creation, reserving rival goods attacks, ballot stuffing attacks, password snooping, web site scraping attacks, vulnerability assessments, and stack fingerprinting attacks. As a specific example, a fraudster may cause a bot to traverse through pages of a web site and collect data, such as who is connected with whom on a particular social networking web site. In the current example, the bot may generate a social graph by following links from one person's profile page to a second, connected person's profile page. The bot may also collect personal information from each person's profile page.
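For illustration only, the link-following traversal in the current example may be sketched as a breadth-first walk. The sketch below is a minimal one under stated assumptions: the web site is modeled as a hypothetical in-memory mapping (`PAGES`) from each profile page to the profile pages it links to, no network requests are made, and the names `PAGES` and `build_social_graph` are illustrative and do not come from any particular bot or web site.

```python
from collections import deque

# Hypothetical stand-in for a social networking web site: each profile
# page is mapped to the profile pages that it links to.
PAGES = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
}


def build_social_graph(pages, start):
    """Follow links breadth-first from `start`, recording who links to whom."""
    graph = {}
    visited = {start}
    queue = deque([start])
    while queue:
        profile = queue.popleft()
        links = pages.get(profile, [])
        # Record the connections found on this profile page.
        graph[profile] = list(links)
        for linked in links:
            # Enqueue each linked profile page only once.
            if linked not in visited:
                visited.add(linked)
                queue.append(linked)
    return graph


graph = build_social_graph(PAGES, "alice")
```

Starting from a single profile page, the walk reaches every connected profile page, which suggests why a small number of automated requests may be enough to expose a large portion of a site's connection data.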
Some web crawlers, however, are not malicious. For example, search engines use web crawlers to find and index web pages on the Internet. An administrator of a web site may want search engines to index and link to various public pages on the web site. For example, an administrator of a web site may want web crawlers to find and index the web site's home page, help page, press release page, and/or other public pages.
An administrator may wish to prevent malicious users from attacking the site, while allowing legitimate users and search engines full or partial access to public and/or proprietary data. However, determining whether a request for a page is a legitimate request or a malicious attack may be difficult.