1. Technical Field
This invention relates to the field of interactive voice response systems, and more particularly to a method and system for collecting audio prompts and replacing such prompts.
2. Description of the Related Art
As more Web applications become voice enabled, the tooling used with these applications grows in importance, because it makes coding such applications easier and more efficient for developers. There are generally two aspects to these applications: (1) prompting a user and (2) handling the user's response.
With respect to prompting a user, either professionally recorded audio or audio generated by a text-to-speech (TTS) engine is typically used for playback of such prompts. With respect to a user's response, a speech recognition engine is used to capture the response and pass the results back to the VoiceXML application.
When developers voice-enable Web applications, they tend to insert prompts wherever they can, especially when the data is generated dynamically. In other words, the placement of prompts might not be highly predictable (e.g., prompts are not necessarily enclosed in <audio> tags). To increase customer satisfaction and strengthen corporate brand image, many companies insist on the use of professionally recorded audio instead of TTS-generated audio. By default, a VoiceXML application plays stored audio for each prompt if the audio is available and properly referenced in <audio> tags, and otherwise synthesizes speech using a TTS engine. Such a scenario makes it challenging to capture all of the prompts that are generated by TTS engines. One burdensome option is to go through each line of code and isolate the prompts, which becomes very difficult, if not impossible, when prompts are dynamically generated. Another option is for a developer to deploy the application with audio files and listen to every path in order to manually identify the TTS-generated audio so that it can be replaced with professional recordings.
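By way of illustration only, the first option (inspecting the code to isolate prompts) might be sketched as follows for a static VoiceXML document. The sample markup, the file name welcome.wav, and the function name tts_text_in_prompts are hypothetical and are not taken from any particular application. The sketch assumes a document without an XML namespace declaration and examines only text inside explicit <prompt> elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML fragment, for illustration only.
SAMPLE_VXML = """<vxml version="2.0">
  <form>
    <block>
      <prompt>
        <audio src="welcome.wav">Welcome to the bank.</audio>
        Your balance is <value expr="balance"/> dollars.
      </prompt>
    </block>
  </form>
</vxml>"""


def tts_text_in_prompts(vxml_source):
    """Return text fragments inside <prompt> that are not wrapped in <audio>.

    Such fragments have no recorded audio associated with them, so they
    would be rendered by the TTS engine.
    """
    root = ET.fromstring(vxml_source)
    found = []

    def walk(element, inside_audio):
        # Text nested anywhere under <audio> is only fallback text.
        inside_audio = inside_audio or element.tag == "audio"
        if not inside_audio and element.text and element.text.strip():
            found.append(element.text.strip())
        for child in element:
            walk(child, inside_audio)
            # Tail text follows the child but belongs to this element,
            # so this element's audio state applies to it.
            if not inside_audio and child.tail and child.tail.strip():
                found.append(child.tail.strip())

    for prompt in root.iter("prompt"):
        walk(prompt, False)
    return found


if __name__ == "__main__":
    for fragment in tts_text_in_prompts(SAMPLE_VXML):
        print(fragment)
```

A scan of this kind sees only the literal text around the <value> element. It cannot determine what the expression will evaluate to at run time, and it sees nothing of markup that is generated dynamically by the server, which is why this option breaks down for dynamically generated prompts.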