CAPTCHAs are computer-generated tests that humans can solve easily but that, in most circumstances, a computer system will fail. The typical implementation is a computer-generated image of letters and digits, which may be distorted and contain some visual background “noise.” The user is asked to type the string displayed in the image; the procedure assumes that humans can read these images while computers cannot. These tests are meant to validate the presence of a human end-user in interactions taking place over a computer network.
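The challenge-and-response step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular CAPTCHA product: the names `ALPHABET`, `new_challenge` and `verify` are hypothetical, the six-character length is arbitrary, and rendering the string as a distorted image is deliberately left out.

```python
import hmac
import secrets
import string

# Hypothetical alphabet: uppercase letters and digits, minus glyphs that
# are easily confused once distorted (an assumption, not from the text).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(length: int = 6) -> str:
    """Return a random string that a server would render as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected: str, typed: str) -> bool:
    """Check the typed answer, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(
        expected.upper().encode(), typed.strip().upper().encode()
    )
```

In such a scheme the server would keep the expected string on its side, send only the rendered image to the client, and call `verify` on the submitted form value.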
On the Internet it has become common practice to use automation tools, known as “bots”, to conduct repetitive tasks and abuse web applications. Such tasks include form submissions and repeated page requests, and are designed to create user accounts, to log in to accounts, to submit content through web forms, to collect data from websites and generally to abuse platforms and system resources. These activities create commercial value for those conducting them, while in many respects they badly disrupt the systems and businesses they abuse. CAPTCHAs were developed to prevent these abuses by establishing whether the end-user is a human or a machine. They have become the common means by which websites prevent automated abuse such as spam.
However, because CAPTCHAs are so common, they are targeted by spammers, companies and individuals who wish to break or bypass them in order to conduct their misdeeds. There are two known ways to break or bypass a CAPTCHA challenge. The first is by using an advanced Optical Character Recognition (OCR) system. An OCR system can be programmed to identify the distorted characters used in certain CAPTCHAs. An automated script (“bot”) utilizing OCR lets the OCR system decipher the CAPTCHA and then fills in the resulting string in the web form field where the CAPTCHA value should be typed.
The second method is by relaying the CAPTCHA to a third-party human solver. A third party means an entity which is not the client interacting with the web server. Commercial CAPTCHA-solving companies (known as “CAPTCHA farms”) charge as little as $0.50 for solving 1,000 CAPTCHAs. When a bot comes across a CAPTCHA, it typically retrieves the CAPTCHA image and sends it to the CAPTCHA farm (sometimes through an application programming interface), where a human solver deciphers the image and sends the resulting string back to the bot, which fills it in and passes the test. In some cases, high-traffic sites, such as software serial number indexes and adult content sites, are used to attract innocent users, who are asked to solve a CAPTCHA to get the content they were looking for. That CAPTCHA is actually relayed from a bot abusing another platform, which is thereby helped by these unwitting users.
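To put the quoted price in perspective, the arithmetic can be worked through in a few lines. This is only an illustrative calculation based on the $0.50 per 1,000 figure above; the function name `solving_cost` and the example volumes are assumptions made for the sketch.

```python
from decimal import Decimal

# Price quoted in the text: USD 0.50 per 1,000 solved CAPTCHAs.
PRICE_PER_THOUSAND = Decimal("0.50")


def solving_cost(
    num_captchas: int, price_per_thousand: Decimal = PRICE_PER_THOUSAND
) -> Decimal:
    """Return the cost in USD of having num_captchas solved by a farm."""
    return price_per_thousand * num_captchas / 1000


if __name__ == "__main__":
    # Hypothetical volumes, chosen only to show the scale of the cost.
    for volume in (1, 1_000, 1_000_000):
        print(f"{volume:>9} CAPTCHAs cost ${solving_cost(volume)}")
```

At that rate a single solved CAPTCHA costs $0.0005, and a million of them cost $500.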
Typically, websites which notice automated activity breaking their CAPTCHAs will change to another variant of CAPTCHA. This will bar OCR systems, at least for a while, because they depend on the visual characteristics of the CAPTCHA in order to solve it. However, it will not help against third-party human solvers, since they are indifferent to the CAPTCHA type: as long as a human user can solve it, they can. This also suggests why CAPTCHA farms are becoming more and more popular, despite the fact that they cost money, and why OCR is becoming less favored by spammers.
To prevent OCR-based CAPTCHA breaking, CAPTCHA challenges have been made more and more difficult. Characters are typically blurred and skewed, and in many cases overlap each other. Ultimately, many CAPTCHAs have become too difficult even for humans. They disrupt the user experience and in some cases even scare users away. These difficult CAPTCHAs, although resistant to OCR, fail to stop third-party human solving (i.e., a relay) and are thus compromised and broken, at a very low cost, by spammers.