Interactive voice response (IVR) systems allow a computer to detect and process the speech or touch tones entered by a caller. The IVR system can respond with pre-recorded or dynamically generated messages to further direct the caller. IVR systems are often employed when the caller interface can be presented as a number of menu choices. The collection of menu choices associated with an IVR system is often referred to as an IVR tree.
In practice, a caller typically calls a desired telephone number that is answered by an IVR system. The IVR system plays a message and prompts the caller to select an option from a menu of options. The caller can typically press a number associated with a desired menu option on a telephone keypad or state the selected number. For example, the pre-recorded message may prompt the user to “say or press 1 for yes, or say or press 2 for no.” Speech recognition is typically employed to interpret the caller's spoken answers in response to the voice prompt.
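The menu traversal described above can be sketched with a small nested data structure. This is only an illustration, not a description of any particular IVR product: the names `IVR_TREE` and `traverse`, the menu wording other than the yes/no prompt, and the replay-on-invalid-input behavior are all assumptions, and speech recognition is assumed to have already mapped a spoken answer to a digit.

```python
# Hypothetical IVR tree: each node has a prompt and a map from a digit to a child node.
IVR_TREE = {
    "prompt": "Say or press 1 for billing, or say or press 2 for support.",
    "options": {
        "1": {
            "prompt": "Say or press 1 for yes, or say or press 2 for no.",
            "options": {
                "1": {"prompt": "You selected yes.", "options": {}},
                "2": {"prompt": "You selected no.", "options": {}},
            },
        },
        "2": {"prompt": "Connecting you to an agent.", "options": {}},
    },
}


def traverse(tree, inputs):
    """Return the prompts played while a caller enters the given digits.

    An input that matches no option at the current node replays that
    node's prompt (an assumed policy) and leaves the caller where they are.
    """
    node = tree
    prompts = [node["prompt"]]
    for key in inputs:
        child = node["options"].get(key)
        if child is None:
            prompts.append(node["prompt"])  # invalid choice: replay the prompt
            continue
        node = child
        prompts.append(node["prompt"])
    return prompts
```

For example, `traverse(IVR_TREE, ["1", "2"])` returns the root prompt, the yes/no prompt, and finally "You selected no."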
In such an IVR environment, mechanical agents (or “robots”) often attempt to place undesired telephone calls to the IVR system. The robots typically aim to traverse the IVR menu to reach a human agent (and thereby waste a valuable resource), or to reach another limited resource, such as a bank account or other stored data. In an IVR or another telephony domain, spam (i.e., unsolicited or undesired bulk electronic messages) is often referred to as “Spam over Internet Telephony” (“SPIT”) and is a problem for both traditional and Voice over Internet Protocol (VoIP) telephony services. The undesired telephone calls can include, for example, advertising or political messages, interruptions (sometimes referred to as “ring and run”), or denial of service (DoS) attacks. DoS attacks, for example, can overload voice servers and affect system reliability. Robot attacks against telephones can be directed at IVR systems, or at humans in real time or via voice mail or facsimile.
A number of techniques exist for distinguishing between human and computer users, often referred to as a “Completely Automated Public Turing test to tell Computers and Humans Apart,” or “CAPTCHA.” CAPTCHAs are commonly used on web sites such as those selling event tickets or offering free e-mail services. An image file that contains a degraded picture of a word is typically displayed, and the user must type in the characters in the image. Such images are generally tuned to be beyond the capability of mechanical optical character recognition (OCR) systems, but within the capability of most human users.
In the telephone domain, Telephone CAPTCHAs (or TCAPTCHAs) have been used to present a user with an audio message (typically a sequence of digits) that has been degraded beyond the capability of speech recognition systems. The caller must enter (or speak) the digit sequence to establish that he or she is a human user. Generally, robots do not have sufficient speech recognition capabilities and will thus fail the tests. In this manner, robots will waste time in an IVR system (and thereby be discouraged from attacking the protected system), while human users will pass the tests easily and proceed to their desired tasks. The degradation is accomplished, for example, by adding background noise, such as white noise, or by applying other degradations, such as echoes or simulated packet loss. The resulting sounds are difficult for machines to recognize, yet are typically easily recognized by human users. They are also, however, typically unnatural and potentially irritating to human users. These tests are typically applied at a portal before a user is given access to a system.
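The three kinds of degradation mentioned above (white noise, echoes, and simulated packet loss) can be illustrated with a minimal sketch operating on a list of audio samples. The function names, parameters, and the choice to model packet loss by zeroing fixed-size blocks are assumptions made for illustration only; they are not taken from any existing TCAPTCHA system, and real systems would tune these effects against actual speech recognizers.

```python
import random


def add_white_noise(samples, noise_amplitude, seed=None):
    """Add uniformly distributed noise in [-noise_amplitude, +noise_amplitude]."""
    rng = random.Random(seed)
    return [s + rng.uniform(-noise_amplitude, noise_amplitude) for s in samples]


def add_echo(samples, delay, decay):
    """Mix in a single delayed, attenuated copy of the original signal."""
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    return out


def drop_packets(samples, packet_size, loss_rate, seed=None):
    """Simulate packet loss by silencing whole blocks of samples at random."""
    rng = random.Random(seed)
    out = list(samples)
    for start in range(0, len(out), packet_size):
        if rng.random() < loss_rate:
            for i in range(start, min(start + packet_size, len(out))):
                out[i] = 0.0  # a lost packet is modeled as silence
    return out
```

The effects compose, for example `drop_packets(add_echo(add_white_noise(samples, 0.05), 800, 0.4), 160, 0.1)`, where the numeric values are arbitrary placeholders.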
A need therefore exists for improved techniques for defending against telephone-based robotic attacks.