CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems are well known in the art. Examples are used by Yahoo! (Gimpy type) and Xerox PARC (Baffle type); the so-called Bongo, Pix and Pessimal types are also known in the art. One of the first such visual based systems is described in U.S. patent application Ser. No. 10/790,611 to Reshef, which is hereby incorporated by reference herein.
Generally speaking, the goal of visual based CAPTCHAs is to present an optical image which only a human can decipher and comprehend. To this end, the bulk of these systems rely primarily on some combination of pseudorandom letters and numbers which are placed in front of an obfuscating background, or subjected to visual degradation, to make them unrecognizable to a machine. A good background on such technologies can be found in the article “Is it Human or Computer? Defending E-Commerce with Captchas,” by Clark Pope and Khushpreet Kaur in IT PRO, March-April 2005, pp. 43-49, which is hereby incorporated by reference herein. An example of a typical CAPTCHA of the prior art is shown in FIG. 6. The person looking at the image presented would have to determine that the text shown corresponds to the characters “84EMZ.”
An article entitled “What's Up CAPTCHA? A CAPTCHA Based On Image Orientation” by Gossweiler et al., incorporated by reference herein, makes use of social feedback mechanisms to select appropriate challenge materials for visual CAPTCHAs. The integration of aggregated human feedback allows for selection of the CAPTCHAs that are best optimized for discriminating against machines.
Recently, however, several sophisticated machine vision systems have achieved significant success in “breaking” the conventional optical CAPTCHA systems. For an example of such a system, see “Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA” by Mori and Malik, also incorporated by reference herein, which is available at the University of California Berkeley Computer Science Department website. Thus, traditional forms of CAPTCHA appear to be at risk of becoming obsolete before they gain widespread adoption.
Audio CAPTCHAs are also known in the art. For an example of such a system, please see the above article to Pope and Kaur, page 45. Generally speaking, these types of systems take a random sequence of recordings of words, numbers, etc., combine them, and then ask the user to input (via keyboard or mouse) whatever is “heard” by the user into the system to determine if the message is comprehended. A drawback of this approach, of course, is that speech recognizers are improving rapidly; an article by Reynolds and Heck entitled “Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends,” presented at the AAAS 2000 Meeting, Humans, Computers and Speech Symposium, 19 Feb. 2000, and incorporated by reference herein, makes it clear that machines are in fact better analyzers and recognizers of speech than are humans at this point. Consequently, audio CAPTCHAs of this type are similarly doomed to failure.
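The audio CAPTCHA flow described above (select a random sequence of recordings, combine them, and compare what the user types against what was played) can be illustrated with a minimal Python sketch. It is not drawn from any of the cited systems. The ten-digit vocabulary, the `clips/` directory, and the function names are all assumptions made for illustration, and actual audio concatenation and playback are omitted.

```python
import secrets

# Hypothetical vocabulary: each token is assumed to have a pre-recorded clip.
VOCABULARY = ["zero", "one", "two", "three", "four",
              "five", "six", "seven", "eight", "nine"]


def make_challenge(length=5):
    """Pick a random sequence of spoken tokens using a secure random source."""
    return [secrets.choice(VOCABULARY) for _ in range(length)]


def clip_playlist(tokens, clip_dir="clips"):
    """Return the clip paths, in order, that would be combined for playback.

    The directory layout is an assumption; no files are opened here.
    """
    return [f"{clip_dir}/{token}.wav" for token in tokens]


def expected_answer(tokens):
    """Map the spoken tokens to the digit string the user should type."""
    return "".join(str(VOCABULARY.index(token)) for token in tokens)


def check_response(tokens, typed):
    """Compare the typed input with the expected digits, ignoring other characters."""
    normalized = "".join(ch for ch in typed if ch in "0123456789")
    return normalized == expected_answer(tokens)
```

A caller would generate a challenge, play the clips named by `clip_playlist`, and pass the typed input to `check_response`. As the passage above notes, a scheme of this kind is only as strong as the difficulty of recognizing the combined audio by machine.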
The Reynolds and Heck article also notes that speech verification systems are well-known in the art. These systems are basically used as a form of human biometric analyzer, so that a person can access sensitive information over a communications link using his/her voice. A voice print for the particular user is created using a conventional Hidden Markov Model (HMM) during an enrollment/training session. Later, when the user attempts to access the system (for example, in a banking application the user may wish to transfer funds from an account), the system compares certain captured audio data from the user against the prior recording to see if there is a sufficiently close biometric match. Identities are typically confirmed by measuring such intrinsic personal traits as lung capacity, nasal passages and larynx size. Again, since speech recognizers are extremely accurate in evaluating speech data, a very reliable verification can be made to determine if the identity of the person matches the prior recorded voice print. Speaker verification systems are well-known and are disclosed, for example, in such references as U.S. Pat. Nos. 5,897,616 and 6,681,205 and Publication No. 20030125944, which are incorporated by reference herein.
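The "sufficiently close biometric match" step described above can be pictured as a thresholded comparison between an enrolled voice print and newly captured data. The Python sketch below is illustrative only. It swaps in cosine similarity over fixed-length feature vectors in place of the HMM scoring named in the text. The feature vectors, the `threshold` value of 0.85, and the function names are assumptions, not details taken from the cited references.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no information, so treat as no match
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled_print, captured_features, threshold=0.85):
    """Accept the claimed identity only if the similarity reaches the threshold.

    The threshold is a hypothetical operating point; a real system would tune
    it to trade off false accepts against false rejects.
    """
    return cosine_similarity(enrolled_print, captured_features) >= threshold
```

For instance, with an enrolled vector of `[1.0, 2.0, 3.0]`, a capture of `[1.1, 2.0, 2.9]` is accepted, while `[3.0, -1.0, 0.0]` is rejected.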
Another article by Schuckers, “Spoofing and Anti-Spoofing Measures,” Information Security Technical Report, Vol. 7, No. 4, pages 56-62, 2002, explains that these verification systems are very hard to fool with tape recording equipment and the like, because such equipment cannot duplicate the physical characteristics noted above. Thus, some speaker-verification technology has ways of testing for “liveness.” Using a process called anti-spoofing, such systems specifically analyze for acoustic patterns suggesting that the voice has been recorded. Another application of this technique, for fingerprinting, is described generally in U.S. Pat. No. 6,851,051 to Bolle et al., which is incorporated by reference herein. Other biometric techniques for uniquely differentiating humans are disclosed in US Publication No. 20050185847A1 to Rowe, which is also incorporated by reference herein.
To date, therefore, while verification systems have been used for distinguishing between humans, they have been designed or employed only on a limited basis for the purpose of distinguishing between a computer speaking and a human speaking as part of a CAPTCHA type tester/analyzer. This is despite the fact that a recent article entitled “The Artificial of Conversation,” published at http://htmltimes(dot)com/turing-test-machine-intelligence(dot)php, implies that conventional Turing tests do not even bother examining computer system vocalizations, since such vocalizations are too difficult for machines to produce convincingly.
A recent article entitled “Accessible Voice CAPTCHAs for Internet Telephony” by Markkola et al., incorporated by reference herein, describes a Skype challenge system that requires the user to speak a number of random digits. This illustrates that there is known value in using spoken CAPTCHAs.
Some recent filings by Rajakumar (US Publication Nos. 20070280436, 20070282605 and 20060248019), also incorporated by reference herein, discuss the use of a voice database for registering the voiceprints of known fraudsters. Thereafter, when a person attempts access, the system can detect from his/her voiceprint whether the person calling is already registered, in which case access is blocked.
A further filing by Maislos et al. (US Publication No. 20090055193, Ser. No. 12/034,736) is also incorporated by reference herein. The Maislos system, while purportedly using voice to differentiate between humans and computing systems, and even between different demographic groups, was only recently filed and does not contain many details on how to optimize such discrimination or how to formulate appropriate challenges. Another company, identified as Persay, is also believed to be researching voice based CAPTCHA systems; see, e.g., www(dot)persay(dot)com and accompanying literature for their SPID system.