1. Technical Field
This invention is directed toward a system and method for determining whether a computer user is a human or a computer program. More specifically, the invention is directed toward a system and method for devising a Human Interactive Proof that determines whether a computer user is a human or a computer program.
2. Background Art
Web services are increasingly becoming part of people's everyday lives. For example, free email accounts are used to send and receive emails; online polls are used to gather people's opinions; and chat rooms are used to permit online users to socialize with others. However, all of these web services, which are designed for human use, are being abused by computer programs (bots). A bot is any type of autonomous software that operates as an agent for a user or a program or simulates a human activity. There are various types of bots. On the Internet, the most popular bots are programs (called spiders or crawlers) used for searching. They access web sites, retrieve documents and follow all the hyperlinks in them; then they generate catalogs that are accessed by search engines. A chatbot converses with humans (or other bots). A shopbot searches the Web to find the best price for a product. Other bots observe a user's patterns in navigating a web site and customize the site for that user. Knowbots collect specific information from websites.
Many of the aforementioned types of bots are being used maliciously. For example, Hotmail, Yahoo and others provide free email services. Unfortunately, malicious programmers have designed bots to register thousands of free email accounts every minute so that they can send thousands of junk emails. Online polling is a convenient and cost-effective way to obtain people's opinions. However, when these online polls are abused by bots, their credibility is reduced to zero. In the information age, people use online chat rooms to socialize with others. However, bots have started to join chat rooms and point people to advertisement sites. In the case of E-commerce, a malicious programmer can design a bot whose task is to aggregate prices from other E-commerce sites. Based on the collected prices, the malicious programmer can make his or her price a little cheaper, thus stealing away other sites' customers. Similar situations arise with search engine sites.
Presently there exist several Human Interactive Proof (HIP) algorithms that determine whether a computer user is a human or a bot. For example, there are several programs that can generate and grade tests that humans can pass but that are beyond the capabilities of many computer programs. One such program, named Gimpy, picks seven random words out of a dictionary, distorts them and renders them to users. The user needs to recognize three out of the seven words to prove that he or she is a human user. Because words in Gimpy overlap and undergo non-linear transformations, they pose serious challenges to existing OCR systems. However, they also place a significant burden on human users. The burden was so great that Yahoo pulled Gimpy from its website and replaced it with an easier version, EZ Gimpy. EZ Gimpy shows a user a single word over a cluttered background. Another program, Bongo, presents a user with two groups of visual patterns (e.g., lines, circles and squares), named LEFT and RIGHT. It then shows new visual patterns and asks the user to decide whether the new patterns belong to LEFT or RIGHT. There are some programs, for example Pix and Animal Pix, that rely on a large database of labeled images. They first randomly pick an object label (e.g., flower, baby, lion, etc.) from the label list, then randomly select images containing that object from the database, and show the images to a user. The user needs to enter the correct object label to prove that he or she is a human user. In addition to the above visual HIP designs, there also exist audio challenges, e.g., Byan and Eco. The general idea is to add noise and reverberation to clean speech such that existing speech recognizers can no longer recognize it. The audio challenges are complementary to the visual ones and are especially useful to vision-impaired users.
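As a rough illustration of the Gimpy rule described above (seven words shown, three required), the following Python sketch generates and grades a challenge. It is a minimal sketch under stated assumptions, not the actual Gimpy program: the word list, the function names `make_challenge` and `grade`, and the case-insensitive matching are hypothetical, and the distortion and rendering steps are omitted entirely.

```python
import random

# Hypothetical stand-in for a real dictionary; the source does not list Gimpy's words.
WORDS = ["apple", "river", "stone", "cloud", "tiger",
         "piano", "green", "house", "light", "bread"]

def make_challenge(rng, n_words=7):
    """Pick n_words distinct words at random (distortion and rendering omitted)."""
    return rng.sample(WORDS, n_words)

def grade(challenge, answers, required=3):
    """Pass if at least `required` distinct challenge words were recognized."""
    shown = {w.lower() for w in challenge}
    recognized = {a.strip().lower() for a in answers} & shown
    return len(recognized) >= required

rng = random.Random(0)             # seeded only to make the sketch repeatable
challenge = make_challenge(rng)
print(grade(challenge, challenge[:3]))   # three correct words: passes
print(grade(challenge, challenge[:2]))   # only two correct words: fails
```

Grading against a set of distinct recognized words means that repeating one correct word three times does not pass, which is one plausible reading of "three out of the seven words".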
The aforementioned HIP systems suffer from various deficiencies in ease of use, resistance to attack, dependency on databases, and lack of universality. For instance, some of these HIP tests are cumbersome and time-consuming for a human to take. Some of these methods employ techniques that have not been investigated extensively and are based on technologies that are still evolving, which could affect the usability of these tests in the future. Furthermore, some of the tests depend on the user's language, physical location, and education, among other factors, and are therefore not universal. It is expensive for the companies that deploy a HIP test to localize it to numerous different languages. Additionally, some of the tests are not resistant to no-effort attacks. No-effort attacks are those that can pass a HIP test without solving a hard artificial intelligence (AI) problem. As an example, Bongo is a two-class classification challenge. To attack Bongo, the attacker needs no effort other than always guessing LEFT, which yields an expected accuracy of 50%. Even if Bongo asks a user to solve four tests together, a no-effort attack still succeeds with probability (1/2)^4 = 1/16. Some of the aforementioned tests are also easy to attack when the database they use is publicized. For example, both Pix and Animal Pix would be very easy to attack once the database is publicly available. They, therefore, are not good HIP tests. The evaluations of some of the existing approaches are summarized against these factors in Table 1. From Table 1, it is clear that most of the existing HIP algorithms suffer from one or more deficiencies.
TABLE 1. Evaluation of Existing HIP Tests.

| Guidelines | 1. Automation and gradability | 2. Easy to human | 3. Hard to machine | 4. Universality | 5. Resistance to no-effort attacks | 6. Robustness when database publicized |
|---|---|---|---|---|---|---|
| Gimpy | Yes | Yes. But the partially overlapped text can be hard to recognize | Yes | No. People who know English have much more advantages | Yes | Yes |
| EZ Gimpy | Yes | Yes | No. It has been broken | Yes | Yes | No. Has only 850 words |
| Bongo | Yes | Yes | Yes | Yes | No. A machine can randomly guess an answer | Yes |
| Pix | Yes | Yes. But the labels can be ambiguous (cars vs. white cars) | Yes | No. Some objects do not exist in some countries. | Yes | No. With the database, it becomes simple image matching. |
| Animal Pix | Yes | Yes | Yes | No. Some animals are only popular in a few countries. | No. A machine can randomly guess an answer | No. With the database, it becomes simple image matching. |
| Pessimal | Yes | Yes | Yes | No. People who know English have much more advantages | Yes | No. Has only 70 words |
| BaffleText | Yes | Yes | Yes. But has been attacked when using single font | Yes. But people who know English may have advantages | Yes | Yes |
| Byan | Yes | Yes | Yes | No. Users need to know English | Yes | Yes |
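The no-effort attack arithmetic discussed for Bongo can be checked with a short Python sketch. It assumes that each challenge has equally likely answers and that the challenges are independent; the function name `no_effort_success` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction

def no_effort_success(num_classes, num_tests=1):
    """Probability that blind guessing passes `num_tests` independent
    challenges, each with `num_classes` equally likely answers."""
    return Fraction(1, num_classes) ** num_tests

print(no_effort_success(2))      # one LEFT/RIGHT test: 1/2
print(no_effort_success(2, 4))   # four tests solved together: 1/16
```

Under the same assumptions, the success probability shrinks only geometrically with the number of tests, so a two-class challenge needs many repetitions (and a correspondingly larger burden on the human user) before blind guessing becomes negligible.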
Human faces are arguably the most familiar objects to humans, rendering them possibly the best candidate for a HIP. Regardless of nationality, cultural differences or educational background, all humans recognize human faces. In fact, this ability is so good that humans can recognize human faces even if they are distorted, partially occluded, or in bad lighting conditions.
Computer vision researchers have long been interested in developing automated face detection algorithms. These face detection algorithms could conceivably be used to attack a HIP test that employs a face. In general, face detection algorithms can be classified into four categories. The first is the knowledge-based approach. Based on people's common knowledge about faces, this approach uses a set of rules to perform detection. The second approach is feature-based. It first detects local facial features, e.g., eyes, nose and mouth, and then infers the presence of a face. The third approach is based on template matching. A parameterized face pattern is pre-designed manually, and then used as a template to locate faces in an image. The fourth approach is appearance-based. Instead of using pre-designed templates, it learns the templates from a set of training examples. So far, the fourth approach is the most successful one.
However, despite decades of research on face and facial feature detection, today's best detectors still suffer from limitations relating to head orientation, face symmetry, lighting and shading, and cluttered backgrounds. Some example limitations are as follows:
1. Head Orientations. Head orientation often causes problems for face and feature detectors. Let the x axis point to the right of the paper, the y axis point to the top of the paper, and the z axis point out of the paper. All face detectors handle frontal faces well. That is, they work well when there is no rotation around any of the three axes. They can also handle rotations around the y axis to some extent, but their performance is worse than for detecting frontal view faces. They do not handle rotations around the x and z axes well.
2. Face Symmetry. With respect to face symmetry, face detectors assume, either explicitly or implicitly, that the faces are symmetric, e.g., the left eye and right eye are roughly of the same height, and are roughly of the same distance from the nose bridge. Problems can occur in detecting faces when this is not the case.
3. Lighting and Shading. Face detectors rely on different intensity levels of landmarks on human faces. For example, they assume that the two eyes are darker than the surrounding region, and the mouth/lip region is also darker than the rest of the face. When a face image is taken under very low or high lighting conditions, the image's dynamic range decreases. This in turn results in difficulties in finding the landmark regions in faces. In addition, lighting also creates shading which further complicates face detection.
4. Cluttered Background. If face-like clutter exists in the background of the face image, face detectors can be further distracted.
The above four conditions are the limitations employed by one embodiment of the invention; however, other limitations could equally well be used.
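The lighting limitation in item 3 can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: lighting is modeled as a simple linear gain and offset followed by clipping to the 8-bit range, and the pixel intensities and the names `relight` and `dynamic_range` are hypothetical. It shows how both very low and very high lighting compress the intensity gap between dark landmarks (the eyes) and the surrounding skin.

```python
def relight(pixels, gain=1.0, offset=0.0):
    """Apply an assumed linear lighting change, then clip to the 8-bit range [0, 255]."""
    return [min(255, max(0, round(p * gain + offset))) for p in pixels]

def dynamic_range(pixels):
    """Difference between the brightest and darkest intensity."""
    return max(pixels) - min(pixels)

# Hypothetical intensities along one row of a face: skin, eye, nose bridge, eye, skin.
face_row = [160, 40, 150, 45, 165]

dim = relight(face_row, gain=0.2)        # very low lighting
bright = relight(face_row, offset=150)   # very high lighting, skin saturates at 255

print(dynamic_range(face_row))  # 125
print(dynamic_range(dim))       # 25
print(dynamic_range(bright))    # 65
```

In both relit rows the eyes remain darker than the skin, but by a much smaller margin, which is the kind of reduced contrast that makes landmark regions harder for a detector to find.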
Therefore, what is needed is a system and method that can create a human interactive proof that can consistently and correctly distinguish a human computer user from a bot. Such a system should preferably provide for ease of use, resistance to attack, universality and not depend on a database.