The Internet is a highly distributed computer network that connects computers all over the world. One way to classify the computers of the Internet is as client computers and server computers. Operators of the server computers provide "Internet" services and products to users of the client computers. The different types of client and server computers are too numerous to detail here.
Providers of Internet services may want to restrict access to their servers to human users only. That is, the providers would like to deny accesses made by automated "agents" operating on behalf of users. An agent is a software program or script generator that can mimic user accesses. It is well known that many agents on the Internet are intentionally designed to behave in a malicious, destructive, or otherwise annoying "anti-social" manner. Therefore, service providers would like to deny access by agents.
One reason for doing this is fairness. Automated agents can generate service requests at a rate thousands of times greater than a normal user can. Therefore, it is quite possible for one agent to monopolize a particular service at the expense of unassisted users. Fairness is particularly important if the provider is running a lottery, or conducting a popularity contest or a poll that allows a user to make multiple entries. In one actual case, an agent acting on behalf of a contestant generated enough entries to claim a substantial portion of the available prizes; because of this incident, computer-generated entries are now banned in most sweepstakes.
Another reason is advertising revenue. On the Internet, advertising revenue is usually based on the number of times that advertisements are displayed when service requests are made. Displaying an advertisement to a user has value; displaying it to an automated agent has none. Consequently, the actual impact of advertising can be estimated more accurately when accesses by automated agents are denied.
Yet another reason is "spamming." On the Internet, spam is the term used to describe unwanted electronic mail (e-mail) messages. A spamming agent, usually at a very low cost, sends a message to a large number of users. Typically, the "spam" is of narrow interest. The spammer hopes to make a profit even if only a small fraction of the recipients respond. On the Internet, spamming agents are generally considered counterproductive because processing spam wastes network resources and people's time. Therefore, suppressing spam generated by agents can save substantial resources.
A variant of spam arises in the context of Web search engines, such as Digital Equipment Corporation's AltaVista search engine. Search engines maintain full-word indexes of Web pages, and users submit queries to locate Web pages of interest. When many Web pages satisfy a query, the resulting set of Web pages is ranked according to some weighted frequency metric.
Search engines are subject to abuse, in particular by electronic agents. For instance, an electronic agent may ask the search engine to index many useless or deceptive Web pages to boost the visibility of a particular topic, for example by using AltaVista's "Add-URL" facility to add pages to its index. Although such "page-boosting" cannot be eliminated entirely, because users can always submit individual pages one at a time, denying access to agents will reduce this abuse to a manageable trickle.
Agents should also be denied access to proprietary information. For example, a server might maintain an online encyclopedia, or an online collection of Web pages such as the Yahoo service. Providers of such services would like to eliminate improper access to their proprietary information because an agent could otherwise easily obtain a large percentage of a database and establish a competing service.
In all of these cases, it is difficult for the server computers to differentiate requests submitted by users from those generated by agents; if it were easy, agents would not be a problem.
On public telephone systems, a similar, although smaller, problem exists. There, telemarketing services have used automated dialers and tape-recorded messages to mass-market products and services to consumers. In this highly regulated setting, laws have been passed banning machine-generated telemarketing calls. While this approach has worked well for telephone networks, it is unlikely to work as well on the Internet and the Web, because they have a number of characteristics that make it hard to apply legal sanctions effectively.
First, it is very difficult to trace a service request back to its true source, physically as well as electronically. On the Web, it is very easy to start up a Web site and then abandon it after it has been exploited; operations are not fly-by-night but fly-by-seconds. In addition, enforcement of such laws would be extremely difficult, and perhaps not worthwhile. For an individual user or provider, damages can only be measured in terms of the time it takes to dispose of unwanted spam e-mail, or the loss of small incremental amounts of advertising revenue, e.g., cents or fractions thereof. Second, the Web and the Internet operate on a global basis. Legally barring automated agents would require the cooperation of all countries, which is unlikely to occur.
In the prior art, some attempts have been made at recognizing and eliminating spam. However, almost all of the prior art methods work only in the specific context of a particular service and are not generally applicable to any type of Web server. We are aware of one prior art method that is applicable to any type of Web server.
Digital Equipment Corporation offered a Web service that collected and displayed polling data during the primary elections of October 1996. In designing this service, there was concern that the same person could enter an opinion into the poll many times, particularly in cases where many requests came from the same network address. As a precaution, the service displayed an American flag at a random position on the screen and required the user to click on the flag before entering an opinion. Thus, a person could not quickly enter an opinion many times. However, it is easy to write a program that recognizes the American flag and simulates a click; therefore, this method does not effectively restrict access by electronic agents.
Therefore, there is a need for a server computer to be able to distinguish an ordinary user from an automated agent, so that access by the agent can be denied while access by real human users is still permitted.