1. Field of the Invention
The present invention relates to the field of network security and more particularly to Internet robot detection for network distributable markup.
2. Description of the Related Art
Computing security has increasingly become the focus of information technologists who participate in locally and globally accessible computer networks. In particular, with the availability and affordability of network computing, even within the small enterprise, many small computer networks provide continuous access to private content for global users via network distributable content including Web content and markup in general. Notwithstanding the efficiencies gained, network computing is not without its price. Specifically, those computers and computer networks which heretofore had remained disconnected from the security risks of the Internet now have become the primary target of malicious Internet hackers, crackers and script kiddies, collectively referred to as “malicious attackers”.
Computing networks incorporate gateway switches to regulate the ingress and egress of information into different segments of the network. Firewall technologies have been deployed in association with gateway switches in order to impede the penetration of a computing network by a malicious hacker. Generally, a firewall inspects incoming packets of data in order to detect patterns of information known to be associated with the activities of a malicious hacker. The patterns can be detected statically by referencing a known table of patterns, or dynamically according to the stateful inspection of packets. Most effectively, firewall technologies can limit the type of traffic flowing through a network domain simply by blocking all ports other than those ports expressly intended to permit unimpeded traffic flow.
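By way of illustration only, the static portion of the firewall behavior described above can be sketched in a few lines of Python. The sketch assumes a simplified packet consisting of a destination port and a payload. The names `ALLOWED_PORTS`, `KNOWN_BAD_PATTERNS`, `Packet` and `permit` are hypothetical, and the two signatures shown are placeholders rather than an actual table of attack patterns. Stateful inspection is not modeled.

```python
from dataclasses import dataclass

# Assumption: only the ports used for public Web traffic are left open.
ALLOWED_PORTS = frozenset({80, 443})

# Hypothetical static table of byte patterns associated with known attacks.
KNOWN_BAD_PATTERNS = (b"/etc/passwd", b"<script>")


@dataclass(frozen=True)
class Packet:
    """Simplified stand-in for an inbound packet (illustrative only)."""
    dest_port: int
    payload: bytes


def permit(packet: Packet) -> bool:
    """Return True if the packet passes port blocking and the static pattern check."""
    # Port blocking: reject anything not addressed to an expressly opened port.
    if packet.dest_port not in ALLOWED_PORTS:
        return False
    # Static inspection: reject payloads containing any pattern from the table.
    return not any(pattern in packet.payload for pattern in KNOWN_BAD_PATTERNS)
```

In such a sketch, an ordinary request addressed to port 80 would be permitted, while a packet addressed to a closed port, or one carrying a listed pattern, would be dropped.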
While port blocking can be effective for many garden-variety attacks, some ports must remain open to allow unimpeded flow of information intended for public dissemination over the global Internet, namely network distributable markup like Web pages. Malicious attacks occur with respect to network distributable markup in a number of ways, including Web page defacement. One particular attack type of concern, however, can be subtle and undetectable, though the consequences can be substantial. Specifically, an Internet robot or Web robot, hereinafter a “bot”, has been the vehicle of choice for malicious hackers intent upon proactively collecting information relating to network distributable content and for probing vulnerabilities in a supporting platform.
Strictly speaking, a bot is a computer software application including program code enabled to run automated tasks over a computer communications network like the global Internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human end user alone. For example, Web spidering utilizes bots to retrieve and analyze content from network distributable markup. In a Web spidering bot, an automated script fetches, analyzes and files information from Web servers at many times the speed of a human end user. Web spidering can be beneficial in some circumstances, such as in search engine cataloging of content. To that end, Web servers attempt to manage Web spidering by publishing a robots.txt file that incorporates locally adopted rules for Web spidering to be obeyed by a visiting Web spider. Additionally, a Web spider, as part of accepted Internet etiquette, publishes its presence within the user agent field of a content request.
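By way of illustration only, the manner in which a well-behaved Web spider might consult the rules of a robots.txt file before issuing a content request can be sketched using the `urllib.robotparser` module of the Python standard library. The robots.txt content, the spider name `ExampleSpider`, the agent `OtherBot` and the helper names `build_parser` and `may_fetch` are hypothetical and are chosen solely for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content as a Web server might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleSpider
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the published rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A spider announcing itself as "OtherBot/2.0" falls under the wildcard rules.
print(may_fetch(parser, "OtherBot/2.0", "/public/index.html"))
print(may_fetch(parser, "OtherBot/2.0", "/private/report.html"))
# A spider announcing itself as "ExampleSpider/1.0" is excluded from the whole site.
print(may_fetch(parser, "ExampleSpider/1.0", "/public/index.html"))
```

Notably, compliance in such a sketch is voluntary on the part of the spider: the check is performed by the visiting bot itself, and the identity it evaluates is whatever the bot chooses to place in the user agent field.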
The malicious use of bot technology, however, often outweighs the beneficial aspects of Web spidering. In particular, malicious users often use bot technology to harness compromised hosts to flood Web sites with distributed denial of service attacks. Malicious users additionally use bot technology to repetitively send requests to a Web site using forged referrer headers in order to create trackback links intended to inflate the search engine rankings of the Web site. Malicious users yet further use bot technology to generate automated click-throughs for online advertisements to boost affiliate revenue. Finally, malicious users use bot technology to harvest e-mail addresses for use in the transmission of unsolicited commercial e-mail, i.e., “spam”.