Interactive voice response (IVR) systems handle telephone calls from telecommunications terminals, and are typically used by enterprises to provide services such as customer care, help desk assistance, value added services, and so on. An IVR system typically renders a hierarchy of “spoken” menus to the caller, and prompts the caller for input to navigate within the menus and provide information. Typically, input commands are entered by the caller on the telephone keypad and propagated through the telecommunications network via Dual Tone Multi-Frequency (DTMF) signaling; sometimes, input commands can also be given directly by voice. User inputs may correspond to menu options for navigating the IVR system and locating the desired information, or they may represent data provided by the user to the IVR system (e.g. a bank account number or a Personal Identification Number, or PIN).
An IVR system normally hosts one or more IVR applications, comprising scripts that specify what speech has to be generated by the IVR system, what input(s) to collect from the caller, and what actions to trigger in response to caller input(s).
For example, an IVR system application might comprise a script that presents a main menu to the caller when a call is received, and additional scripts that correspond to each of the menu options comprised in the main menu, or in other, hierarchically deeper menus.
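By way of illustration only, the menu hierarchy described above can be sketched as a tree in which each node carries a prompt and maps DTMF digits to sub-menus. The `Menu` class, the `navigate` function, and the prompts are hypothetical names chosen for this sketch; an actual IVR application would express the same structure in its scripts.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Menu:
    """One node of a hypothetical IVR menu tree."""
    prompt: str                                   # text rendered as speech
    options: Dict[str, "Menu"] = field(default_factory=dict)  # DTMF digit -> sub-menu


def navigate(root: Menu, keys: str) -> List[str]:
    """Return the prompts spoken while the caller presses `keys` in order."""
    current = root
    spoken = [current.prompt]
    for key in keys:
        nxt = current.options.get(key)
        if nxt is None:
            # Unknown digit: stay on the same menu and repeat its prompt.
            spoken.append("Invalid option. " + current.prompt)
            continue
        current = nxt
        spoken.append(current.prompt)
    return spoken


# Illustrative two-level hierarchy (all prompts are made up).
main_menu = Menu(
    "Press 1 for billing, 2 for support.",
    {
        "1": Menu(
            "Press 1 for your balance, 2 for payments.",
            {
                "1": Menu("Your balance will now be read out."),
                "2": Menu("Please enter your account number."),
            },
        ),
        "2": Menu("Please hold for an agent."),
    },
)
```

Calling `navigate(main_menu, "11")` walks from the main menu to the billing menu and then to the balance prompt.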
The language typically used to define scripts for IVR system applications is the Voice eXtensible Markup Language (VoiceXML, or VXML). VXML is an application of the eXtensible Markup Language (XML) purposefully designed for specifying audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF input, recording of spoken input, telephony, and mixed initiative conversations. According to the W3C Recommendation, the major goal of VXML is to bring the advantages of Web-based development and content delivery to interactive voice response applications.
IVR system applications that use a video channel in addition to the traditional voice channel have also recently been envisioned and implemented. To take full advantage of such video-enhanced applications, users need video communication terminals (such as videotelephones), which are becoming increasingly popular in both fixed and mobile telecommunications networks.
US Patent Application no. US 2006/0203975 discloses an IVR system enabled to deliver content streams of various media types (e.g. video, audio, etc.) to telecommunications terminals via the addition of extensions to the VXML standard. The IVR system delivers a particular content stream to a terminal only if: (i) the terminal has a transducer (e.g. speaker, video display, etc.) that is capable of outputting the content stream's media type, and (ii) that transducer is currently enabled.
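The two-part delivery condition just described can be illustrated with a short predicate. This is only a sketch of the stated rule, not code from the cited application; the `Transducer` class and the `can_deliver` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Transducer:
    """Hypothetical description of one output device on a terminal."""
    media_type: str   # e.g. "audio" or "video"
    enabled: bool     # whether the device is currently switched on


def can_deliver(stream_media_type: str, transducers: Iterable[Transducer]) -> bool:
    """True only if some transducer (i) outputs this media type and (ii) is enabled."""
    return any(
        t.media_type == stream_media_type and t.enabled
        for t in transducers
    )


# Illustrative terminal: speaker on, video display present but switched off.
terminal = [Transducer("audio", True), Transducer("video", False)]
```

With this terminal, an audio stream would be delivered, while a video stream would be withheld until the display is enabled.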
US Patent Application no. US 2006/0203976 discloses an IVR system enabled to intelligently deliver multimedia content streams (i.e. content streams that comprise two or more components that have different media types) via the addition of extensions to the VXML standard. A telecommunications terminal periodically informs an IVR system of the Quality of Service (QoS) for transmissions received at the terminal. When an IVR system script specifies a multimedia content stream to be delivered to the terminal, the IVR system determines which components of the multimedia content stream can be delivered while maintaining QoS above a minimum acceptable threshold.
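The cited application is summarized here without stating how the deliverable components are chosen. Purely as an assumption for illustration, the sketch below models QoS as an integer score, gives each component a hypothetical estimated QoS cost and a priority, and greedily keeps components in priority order while the predicted QoS stays at or above the minimum. All names and numbers are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """One component of a multimedia content stream (hypothetical model)."""
    media_type: str
    qos_cost: int   # assumed: QoS points lost if this component is delivered
    priority: int   # assumed: lower value means more important


def select_components(
    components: List[Component], reported_qos: int, min_qos: int
) -> List[Component]:
    """Greedily pick components while predicted QoS stays >= min_qos."""
    selected = []
    predicted = reported_qos  # QoS last reported by the terminal
    for comp in sorted(components, key=lambda c: c.priority):
        if predicted - comp.qos_cost >= min_qos:
            selected.append(comp)
            predicted -= comp.qos_cost
    return selected


# Illustrative stream with an audio and a video component.
stream = [Component("video", 40, 2), Component("audio", 10, 1)]
```

Under these assumed costs, a terminal reporting a QoS of 90 with a minimum of 60 would receive only the audio component, since adding video would push the predicted QoS down to 40.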
US Patent Application no. US 2006/0203977 discloses an IVR system that generates video content to accompany the generated audio content, where the video content is based on the state of the IVR system, the video display capacity of the calling telecommunications terminal, and information supplied by the user via the terminal. Video content is generated based on the text from which audio content is generated. In particular, the video content comprises an abridged version of the text that is suitable for display at the telecommunications terminal. The abridged version of the text is generated via syntactic and semantic processing of the text. In addition, an abridged version of user-supplied information is generated and incorporated in the video content.
US Patent Application no. US 2006/0203978 discloses an IVR system enabled, via VXML extensions, to specify the playback order, timing, and coordination of multiple content streams (e.g. whether an audio stream and a video stream should be played back concurrently or serially; whether a particular content stream should finish before playback of another content stream commences; whether a content stream that is currently playing should be stopped and supplanted with another content stream, etc.).
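The difference between concurrent and serial playback can be illustrated by computing the start and end time of each stream. This is a simplified sketch under assumed conventions (durations in whole seconds, a `mode` string of either `"serial"` or `"concurrent"`); it does not reproduce the VXML extensions of the cited application.

```python
from typing import Dict, List, Tuple


def schedule(
    streams: List[Tuple[str, int]], mode: str
) -> Dict[str, Tuple[int, int]]:
    """Map each stream name to its (start, end) time in seconds.

    streams: list of (name, duration_in_seconds) pairs, in script order.
    mode:    "serial" plays each stream after the previous one finishes;
             "concurrent" starts all streams at time 0.
    """
    if mode not in ("serial", "concurrent"):
        raise ValueError(f"unknown mode: {mode!r}")
    timeline = {}
    clock = 0  # end time of the previously scheduled stream
    for name, duration in streams:
        start = 0 if mode == "concurrent" else clock
        timeline[name] = (start, start + duration)
        clock = start + duration
    return timeline
```

For a 5-second audio stream followed by an 8-second video stream, serial playback places the video at seconds 5 to 13, whereas concurrent playback starts both at second 0.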
EP Patent Application no. 1701528 discloses an IVR system that generates an asynchronous event when there is a content stream-related occurrence during a call (e.g. completion of playback of the content stream; user control of the content stream; drop-off in QoS for the content stream), and the asynchronous event is caught by an appropriate event handler within the IVR system script. The event handler then spawns a separate thread that handles the event accordingly and executes in parallel with the IVR system script (i.e. the IVR system script continues handling the call while the thread executes).
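The event-handling pattern described above, in which a handler runs in its own thread while the script carries on, can be sketched with Python's standard `threading` module. The `ScriptRunner` class, the event name `"playback.complete"`, and the file name are hypothetical; the sketch only illustrates the general pattern and is not the mechanism of the cited application.

```python
import threading


class ScriptRunner:
    """Toy stand-in for an IVR script interpreter (hypothetical)."""

    def __init__(self):
        self._handlers = {}          # event name -> handler callable
        self._threads = []           # threads spawned for caught events
        self._lock = threading.Lock()
        self.log = []                # shared record of what happened

    def on(self, event_name, handler):
        """Register the handler that catches `event_name`."""
        self._handlers[event_name] = handler

    def record(self, message):
        with self._lock:             # log is shared between threads
            self.log.append(message)

    def raise_event(self, event_name, detail):
        """Catch the event and handle it in a separate thread."""
        handler = self._handlers.get(event_name)
        if handler is None:
            return None              # no handler registered: event is ignored
        thread = threading.Thread(target=handler, args=(self, detail))
        thread.start()
        self._threads.append(thread)
        return thread

    def run(self):
        self.record("script: playing prompt")
        self.raise_event("playback.complete", "intro.3gp")
        # The script does not wait for the handler before continuing.
        self.record("script: collecting input")
        for thread in self._threads:  # wait only at the end of the call
            thread.join()


def on_playback_complete(runner, detail):
    runner.record(f"handler: {detail} finished")
```

Because the handler and the script run in parallel, the relative order of the last two log entries is not deterministic; only the presence of all three entries is guaranteed once `run()` returns.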