In networked environments, such as the internet, authentication is a critical component in preventing abuse and unauthorized access to services.
One aspect of authentication involves distinguishing a human from an attacking automated process (i.e., a software robot). Such software robots can carry out a variety of undesirable behaviors, including controlling compromised client computers (“botnets”), sending spam messages, inflating per-click advertising selections on websites, posting unwanted messages to message boards, and creating masses of free email or web page accounts in order to provide easily discarded and anonymous platforms for illicit activities (such as “phishing” used for identity theft).
An authentication test for distinguishing a human from an attacking automated process should be easy for most humans to solve and easy for the server machines administering the test to generate and evaluate, but difficult for a computer running an attacking automated process to solve accurately.
Current methods for distinguishing a human from an attacking automated process include CAPTCHAs (Completely Automated Public Turing Tests to tell Computers and Humans Apart) and HIPs (Human Interactive Proofs). The most prevalent method involves generating an image with alphanumeric text that is obscured or distorted such that it is difficult for an attacking automated process to decipher but still legible to humans. This method has proved susceptible to exploitation, since automated character recognition software has become increasingly sophisticated. In an effort to defeat such software, the distortion is being pushed past the threshold at which the text remains easily legible to humans. Part of the problem lies in the fact that the text characters must remain within constrained bounds in order to be legible, and this narrows the problem domain for the character recognition software. Furthermore, alphanumeric CAPTCHA methods suffer from challenges in representing multilingual character sets.
Another form of CAPTCHA uses images selected from a large database of manually classified images, which a user must identify. A problem with this method is that an attacker need only create a dictionary of the known photographs referenced by the CAPTCHA and compare each presented image against it. Comparing the metrics of one image to another (such as by using one or more pattern image signatures) is a well-defined problem that can be solved by an attacking automated process. As such, the database of images must remain secret, and this significantly limits the practical options for deployment of such systems.
Another problem with this method lies in finding a sufficient number of categorized images to use as a source so as to make a brute-force dictionary attack of image comparison more statistically difficult. The most prevalent current practice is to draw upon tagged internet images. However, tagging (i.e., categorizing) of images can be inconsistent, resulting in confusing or unsolvable CAPTCHAs (for example, the term “Python” can refer to the programming language Python, a type of bicycle, a snake, a type of car security system, and more).
Such systems are not very resistant to image comparison attacks since they are based on a finite source of images, which may not be categorized or defined with enough certainty to result in a satisfactory authentication test.
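The image comparison attack described above can be illustrated with a minimal sketch. It assumes a simple "average hash" as the image signature and small grayscale pixel grids as images; the function names, the sample images, and the distance threshold are hypothetical choices made for illustration and are not taken from any particular CAPTCHA system or attack tool.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # grayscale pixels (0-255), row-major


def average_hash(image: Image) -> int:
    """Build a bit signature: 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def lookup_label(image: Image, dictionary: Dict[int, str],
                 max_distance: int = 2) -> Optional[str]:
    """Return the label of the closest known signature, or None if none is close."""
    signature = average_hash(image)
    best_label, best_distance = None, max_distance + 1
    for known_signature, label in dictionary.items():
        distance = hamming_distance(signature, known_signature)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


if __name__ == "__main__":
    # Hypothetical 4x4 "photographs" previously collected and labeled.
    cat = [[200, 200, 10, 10], [200, 200, 10, 10],
           [10, 10, 200, 200], [10, 10, 200, 200]]
    dog = [[10, 200, 10, 200], [200, 10, 200, 10],
           [10, 200, 10, 200], [200, 10, 200, 10]]
    dictionary = {average_hash(cat): "cat", average_hash(dog): "dog"}

    # The same photograph served again with slight pixel noise.
    noisy_cat = [[190, 205, 15, 5], [210, 195, 12, 20],
                 [8, 14, 198, 207], [11, 9, 203, 196]]
    print(lookup_label(noisy_cat, dictionary))
```

Because the signature depends only on coarse brightness structure, small perturbations of a known image still map to the same dictionary entry, which is why a finite, discoverable image source offers little resistance to this kind of comparison.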
A second aspect of authentication involves verifying the identity of a user, thereby allowing a server or other computer to establish permission for providing services. The most common method of this kind of authentication involves the use of a shared secret that the user must enter, most often an alphanumeric text password.
Shared secret passwords are susceptible to several means of exploitation. Computers may be compromised without the user's knowledge by malicious “hackers”, usually by means of a tool called a “rootkit” or “backdoor”. The tool is installed by means of a virus, a Trojan horse file, or exploitation of a security flaw in the computer software of the target system. The “hacker” may then use keystroke logging software to record the passwords entered by the user, or use screen capture software to extract passwords from images of login sites. Passwords may also be discovered using automated methods such as brute-force attacks (trying multiple combinations of letters and numbers until successful) and dictionary attacks (using common combinations of words and phrases).
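The difference in scale between the two automated methods mentioned above can be made concrete with a short calculation. The sketch below only counts candidate guesses; the alphabet size, maximum password length, word list size, and number of variants per word are assumed figures chosen for illustration, and the function names are hypothetical.

```python
def brute_force_candidates(alphabet_size: int, max_length: int) -> int:
    """Number of strings of length 1..max_length over the given alphabet."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


def dictionary_candidates(word_count: int, variants_per_word: int = 1) -> int:
    """Number of guesses when trying each listed word in each variant."""
    return word_count * variants_per_word


if __name__ == "__main__":
    # Assumed figures: 36 symbols (a-z, 0-9), passwords of up to 8 characters,
    # and a hypothetical 100,000-entry word list with 10 variants per word.
    brute = brute_force_candidates(36, 8)
    listed = dictionary_candidates(100_000, 10)
    print(f"brute force: {brute:,} candidates")
    print(f"dictionary:  {listed:,} candidates")
    print(f"ratio:       {brute // listed:,}x")
```

Under these assumptions the dictionary approach examines a far smaller set of guesses, which is why passwords built from common words and phrases are exposed even when exhaustive search would be impractical.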
In current practice, shared secret (i.e., password) authentication is a separate step that the user must complete in addition to solving a CAPTCHA.