Many enterprises employ an interactive voice response (IVR) system that handles calls from telecommunications terminals. An interactive voice response system typically presents a hierarchy of menus to the caller, receives input keyed in by the caller, receives input spoken by the caller, and performs automated speech recognition (ASR) to interpret the spoken input. For example, a caller to an interactive voice response system might be presented with a menu of five options and might select the third option by pressing the “3” key of the terminal's keypad, or by saying the word “three.” Similarly, a caller might specify his or her bank account number to the interactive voice response system by inputting the digits via the keypad, or by saying the digits. An interactive voice response system is also typically capable of issuing commands in response to a caller's input (e.g., retrieving a customer's records from a database, invoking a software application to process a bank account transaction, etc.), and of connecting the caller to a person in the enterprise.
FIG. 1 depicts telecommunications system 100 in accordance with the prior art. Telecommunications system 100 comprises telecommunications network 105, private branch exchange (PBX) 110, and interactive voice response system 120, interconnected as shown.
Telecommunications network 105 is a network such as the Public Switched Telephone Network [PSTN], the Internet, etc. that carries a call from a telecommunications terminal (e.g., a telephone, a personal digital assistant [PDA], etc.) to private branch exchange 110. A call might be a conventional voice telephone call, a text-based instant messaging (IM) session, a Voice over Internet Protocol (VOIP) call, etc.
Private branch exchange (PBX) 110 receives incoming calls from telecommunications network 105 and directs the calls to interactive voice response (IVR) system 120 or to one of a plurality of telecommunications terminals within the enterprise, depending on how private branch exchange 110 is programmed or configured. For example, in an enterprise call center, private branch exchange 110 might comprise logic for routing calls to service agents' terminals based on criteria such as how busy various service agents have been in a recent time interval, the telephone number called, and so forth. In addition, private branch exchange 110 might be programmed or configured so that an incoming call is initially routed to interactive voice response (IVR) system 120, and, based on caller input to IVR system 120, subsequently redirected back to PBX 110 for routing to an appropriate telecommunications terminal within the enterprise. Private branch exchange (PBX) 110 also receives outbound signals from telecommunications terminals within the enterprise and from interactive voice response (IVR) system 120, and transmits the signals on to telecommunications network 105 for delivery to a caller's terminal.
Interactive voice response (IVR) system 120 is a data-processing system that presents one or more menus to a caller and receives caller input (e.g., speech signals, keypad input, etc.), as described above, via private branch exchange 110. Interactive voice response system (IVR) 120 is typically programmable and performs its tasks by executing one or more instances of an IVR system application. An IVR system application typically comprises one or more scripts that specify what speech is generated by interactive voice response system 120, what input to collect from the caller, and what actions to take in response to caller input. For example, an IVR system application might comprise a top-level script that presents a main menu to the caller, and additional scripts that correspond to each of the menu options (e.g., a script for reviewing bank account balances, a script for making a transfer of funds between accounts, etc.).
A popular language for such scripts is the Voice extensible Markup Language (abbreviated VoiceXML or VXML). The Voice extensible Markup Language is an application of the extensible Markup Language, abbreviated XML, which enables the creation of customized tags for defining, transmitting, validating, and interpretation of data between two applications, organizations, etc. The Voice extensible Markup Language enables dialogs that feature synthesized speech, digitized audio, recognition of spoken and keyed input, recording of spoken input, and telephony. A primary objective of VXML is to bring the advantages of web-based development and content delivery to interactive voice response system applications.
FIG. 2 depicts an exemplary Voice extensible Markup Language (VXML) script (also known as a VXML document or page), in accordance with the prior art. The VXML script, when executed by interactive voice response system 120, presents a menu with three options; the first option is for transferring the call to the sales department, the second option is for transferring the call to the marketing department, and the third option is for transferring the call to the customer support department. Audio content (in particular, synthesized speech) that corresponds to text between the <prompt> and </prompt> tags is generated by interactive voice response system 120 and transmitted to the caller.