Trust is an asset in web-based interactions. For example, a user must trust that an entity provides sufficient mechanisms to confirm and protect the user's identity in order for the user to feel comfortable interacting with such entity. In particular, an entity that provides a web-resource must be able to block automated attacks that attempt to gain access to the web-resource for malicious purposes. Thus, sophisticated authentication mechanisms that can discern between a resource request from a real human being and a request generated by an automated machine are a vital tool in developing the necessary relationship of trust between an entity and a user.
CAPTCHA (“Completely Automated Public Turing test to tell Computers and Humans Apart”) and audio CAPTCHA are two such authentication mechanisms. The goal of CAPTCHA and audio CAPTCHA is to exploit situations in which it is known that humans perform tasks better than automated machines. Thus, CAPTCHA and audio CAPTCHA preferably provide a prompt that is solvable by a human but generally unsolvable by a machine.
For example, a traditional CAPTCHA requires the resource-requesting entity to read a brief item of text that serves as the authentication key. Such text is often blurred or otherwise disguised. Likewise, in audio CAPTCHA, which is also suitable for visually-impaired users, the resource-requesting entity is instructed to listen to an audio signal that includes the authentication key. The audio signal can be noisy or otherwise challenging to understand.
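By way of illustration only, the challenge-and-response exchange described above can be sketched as follows. The sketch assumes a short random key and a case-insensitive comparison; the names `ALPHABET`, `new_key`, and `verify` are hypothetical and are not taken from any particular CAPTCHA system. Rendering the key as disguised text or as a noisy audio signal is outside the scope of the sketch.

```python
import hmac
import secrets
import string

# Hypothetical key alphabet: uppercase letters and digits, minus look-alike characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_key(length: int = 6) -> str:
    """Draw a random authentication key for a single challenge."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def verify(expected: str, response: str) -> bool:
    """Check the requester's answer against the key.

    Assumption: case and surrounding whitespace are ignored.
    """
    normalized = response.strip().upper()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), normalized.encode())
```

In such a sketch, the key returned by `new_key` would be presented to the resource-requesting entity in disguised form, and access would be granted only if `verify` returns `True` for the submitted response.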
Both CAPTCHA and audio CAPTCHA are subject to sophisticated attacks that use artificial intelligence to estimate the authentication keys. In particular, with respect to audio CAPTCHA, the attacker can use Automatic Speech Recognition (ASR) technologies to attempt to recognize a spoken authentication key.
Thus, a race exists between the audio CAPTCHA and ASR technologies. As such, designing secure and effective audio CAPTCHA requires the knowledgeable exploitation of situations where it is known that humans perform relatively well, while ASR systems do not. Therefore, systems and methods for providing an audio CAPTCHA that simulate situations in which humans have enhanced listening abilities versus ASR technology are desirable.