Interactive voice response services are an increasingly common part of everyday life. Services of this type are used to provide everything from banking and credit card information to interactive driving systems. Interactive voice response services are also an increasingly popular way to access the World Wide Web. This is true in spite of the growing popularity of personal digital assistants and web-enabled cellular telephones. The nearly ubiquitous availability of telephones, and the ability to use voice in non-traditional environments (such as when driving), ensure that the popularity and diversity of interactive voice response services will continue to grow.
Voice eXtensible Markup Language (VoiceXML) is a response to the increasing use of interactive voice response services. VoiceXML is a language for scripting interactive voice response services.
As an example, consider the following VoiceXML fragment:
<vxml>
 <form>
  <block>
   <prompt>
    Hello, World!
   </prompt>
  </block>
 </form>
</vxml>
When processed by a VoiceXML interpreter, the prompt portion of the script plays the text “Hello, World!” using text-to-speech (TTS).
VoiceXML fragments of this type have proven to be a flexible mechanism for accomplishing many tasks. At the same time, there are important cases where VoiceXML lacks the required flexibility. Consider, for example, the case of an interactive voice response email service. Designers of this type of service might wish to generate a prompt that welcomes each user by name and tells them how many emails they have received since their last visit (e.g., “Hello Mr. Smith, you have ten new emails.”). Unfortunately, this type of prompt requires dynamic generation: it includes fields that must be changed to match each user and each number of new emails.
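To make the requirement concrete, the personalized greeting described above can be pictured as a VoiceXML script whose prompt text is filled in separately for each caller. The following Python sketch is purely illustrative and is not part of VoiceXML itself: the template, the placeholder names (name, count), and the render_greeting function are all hypothetical, and the sketch assumes some process outside the VoiceXML interpreter produces the script before it is played.

```python
from xml.sax.saxutils import escape

# Hypothetical template: the {name} and {count} placeholders are
# illustrative only and are not VoiceXML syntax.
TEMPLATE = """<vxml>
 <form>
  <block>
   <prompt>
    Hello {name}, you have {count} new emails.
   </prompt>
  </block>
 </form>
</vxml>"""


def render_greeting(name: str, count: int) -> str:
    """Fill the per-user fields of the prompt.

    The name is XML-escaped so that characters such as '&' or '<'
    cannot break the generated markup.
    """
    return TEMPLATE.format(name=escape(name), count=int(count))


if __name__ == "__main__":
    print(render_greeting("Mr. Smith", 10))
```

Each call yields a complete script with a different prompt, which is exactly the per-user variation that a fixed VoiceXML fragment cannot express on its own.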
In fact, the static nature of VoiceXML contributes to a range of implementation difficulties. In most cases, these difficulties become more severe as VoiceXML applications grow more complex. As a result, there is a need for systems that include dynamic content in VoiceXML and similar languages. This need is particularly important for complex interactive voice response services.