1. Field of the Invention
The present invention relates generally to methods and systems for serving data over a network, and in particular to automatically generating tests that distinguish human users from computer software agents in a communications network.
2. Background
CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”. A CAPTCHA is a test that can be generated automatically, that most humans can pass, but that current computer programs cannot pass. CAPTCHAs have been used to prevent malicious third parties from using automated means (“bots”) to perform actions that are intended specifically for humans, such as account registration, service provisioning, bill payment, and so forth. Excessive use of such services by bots leads to decreased quality of service for the given system, as well as problems involving fraud, identity theft, and unauthorized commercial promotion (“spam”).
For example, some online businesses offer free online services such as email, online storage, search engines, forums, and the like. Difficulties arise for these businesses when bots are used to send large numbers of requests to the service over a short period of time. Similarly, email service providers suffer when bots are used to sign up for large numbers of email accounts which are later used to send junk emails.
One way of thwarting bots is to implement a CAPTCHA in such a way that a user is required to solve a task prior to being allowed access to services intended specifically for humans. A variety of CAPTCHA implementations currently exist, and they provide different types of tasks, including text recognition, image recognition, and speech recognition.
GIMPY and EZ-GIMPY are two of many CAPTCHA implementations based on the difficulty of reading distorted text. GIMPY works by selecting several words out of a dictionary and rendering a distorted image containing the words. GIMPY then displays the distorted image and requires the human user to input the words in the image. Most humans can read the words from the distorted image, but current computer programs cannot. The majority of CAPTCHAs used on the Web today are similar to GIMPY in that they require the user to correctly identify some content in a distorted image. Unfortunately, this implementation requires the system to keep lists of words in one or more languages, and requires the human to be literate in one of the languages for which the system has a list of words.
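The word selection and answer checking that surround the rendering step of a GIMPY-style test can be sketched as follows. This is a minimal illustration only, not the GIMPY implementation itself: the word list, the function names, the choice of three words, and the matching rule are all assumptions, and the rendering of the distorted image is not shown.

```python
import random

# Hypothetical word list; a real system would load a full dictionary
# for each supported language.
WORD_LIST = ["garden", "window", "planet", "silver",
             "bridge", "candle", "market", "forest"]


def make_text_challenge(rng, word_count=3):
    """Pick distinct dictionary words to be rendered into one distorted image."""
    return rng.sample(WORD_LIST, word_count)


def check_text_response(expected_words, response):
    """Compare the typed words with the expected ones, ignoring case
    and extra whitespace (an assumed matching rule)."""
    typed = [word.lower() for word in response.split()]
    return typed == [word.lower() for word in expected_words]


if __name__ == "__main__":
    words = make_text_challenge(random.Random())
    # A distorted image of `words` would be shown to the user here.
    print(check_text_response(words, " ".join(words)))
```

The sketch also makes the stated limitation concrete: the server must hold a word list for some language, and the user must be able to read and type words from that list.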
Another CAPTCHA implementation is PIX. PIX is an image-based CAPTCHA implementation that utilizes a large database of labeled images. All of the pictures stored in the database are pictures of well-known objects, such as a horse, a table, a flower, etc., each of which is labeled with the appropriate name of the object. PIX picks an object label at random (e.g., “horse”), finds six images having that object label in its image database, and presents the images to a user. The user must then input a label that correctly matches the known label for the object. Similar to GIMPY, this implementation requires the system to keep lists of words for each picture in one or more languages, and requires the human to be literate in one of the languages for which the system has a list of words.
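The PIX-style flow described above (pick a label at random, fetch six images with that label, compare the user's answer with the known label) can be sketched as follows. The in-memory database, file names, and function names are hypothetical and stand in for the large labeled image database; this is a sketch under those assumptions rather than the PIX implementation.

```python
import random

# Hypothetical labeled image database: object label -> image file names.
IMAGE_DB = {
    "horse": [f"horse_{i}.jpg" for i in range(8)],
    "table": [f"table_{i}.jpg" for i in range(8)],
    "flower": [f"flower_{i}.jpg" for i in range(8)],
}

IMAGES_PER_CHALLENGE = 6  # the description above uses six images


def make_image_challenge(rng):
    """Pick a label at random and return it with six images carrying that label."""
    label = rng.choice(sorted(IMAGE_DB))
    images = rng.sample(IMAGE_DB[label], IMAGES_PER_CHALLENGE)
    return label, images


def check_image_response(expected_label, response):
    """Accept the answer if it matches the known label, ignoring case
    and surrounding whitespace (an assumed matching rule)."""
    return response.strip().lower() == expected_label


if __name__ == "__main__":
    label, images = make_image_challenge(random.Random())
    # Only `images` would be shown to the user; `label` stays on the server.
    print(images, check_image_response(label, label))
```

Because the answer is compared against a stored word, the sketch shares the language dependence noted above: each label must exist in a language the user can read and type.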
The underlying assumption of these CAPTCHA implementations is that current image recognition algorithms run by computer software agents cannot match human performance in identifying the content of images. But many image recognition algorithms are becoming increasingly sophisticated. For example, some bots take advantage of the vast corpus of images available on the Internet to serve as a basis for “training” image recognition algorithms to defeat current CAPTCHA implementations. Further, because CAPTCHAs are ultimately designed by human programmers with varying levels of skill, bots are able to defeat poorly designed CAPTCHAs. In sum, many existing CAPTCHAs are not well implemented and are easily broken by bots. CAPTCHA designers typically respond to this threat by making the tests increasingly difficult. Unfortunately, this often results in a CAPTCHA test that is too difficult for many human users to pass consistently.