A typical business interaction between a user and a business agent involves the agent talking to the user, asking questions, entering responses into a computer, and reading information to the user from a terminal screen. Such an interaction can be used to place a catalogue order, check an airline schedule, query a price, review an account balance, notify a customer, or record and retrieve a message. For logical processes this can be automated by replacing the agent with an interactive voice response (IVR) system that is able to play voice prompts and receive user input by speech recognition or from DTMF tones.
An interactive voice response system is typically implemented using a client-server configuration in which the telephony interface and voice application run on the client machine, and voice data supply server software, such as text-to-speech or a voice prompt database, runs on a server, with a local area network connecting the two machines. When the voice application requires voice data, it requests a voice server to start streaming the voice data to the client. The client waits until a certain amount of voice data has accumulated in a buffer and then plays the voice data on an open telephony channel.
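The buffering behaviour described above can be illustrated with a minimal sketch. The function name `simulate_client`, the byte threshold and the event labels are hypothetical and chosen only for illustration; a real client would write to a telephony channel rather than record events in a list.

```python
from typing import Iterable, List, Tuple


def simulate_client(chunks: Iterable[bytes], threshold: int) -> List[Tuple[str, int]]:
    """Record what a client does with streamed voice data.

    Chunks are held back until `threshold` bytes have accumulated; the
    held chunks are then played, and later chunks are played on arrival.
    """
    events: List[Tuple[str, int]] = []
    pending: List[bytes] = []   # chunks received before playback starts
    buffered = 0                # total bytes currently held back
    playing = False

    for chunk in chunks:
        if playing:
            events.append(("play", len(chunk)))
            continue
        pending.append(chunk)
        buffered += len(chunk)
        events.append(("buffer", len(chunk)))
        if buffered >= threshold:
            # Enough data has accumulated: start playback with the held chunks.
            playing = True
            for held in pending:
                events.append(("play", len(held)))
            pending.clear()

    # If the stream ended before the threshold was reached, play what arrived.
    for held in pending:
        events.append(("play", len(held)))
    return events


if __name__ == "__main__":
    stream = [b"\x00" * 160] * 4          # four 160-byte chunks (assumed size)
    for event in simulate_client(stream, threshold=320):
        print(event)
```

In this sketch nothing is played until two chunks have arrived, after which the remaining chunks pass straight through. The choice of threshold trades start-up delay against the risk of the channel running out of data.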
Voice applications used in an IVR can be written in the VoiceXML markup language. VoiceXML is an industry standard in the telephony market and grew from Extensible Markup Language (XML). Through the use of customised tags, VoiceXML offers greater flexibility in organising and presenting information than is possible with other markup coding systems. VoiceXML defines a new set of XML ‘tags’ which can be used to write voice response applications, and it simplifies speech application development by using familiar web infrastructure, including web pages, web tools and web servers.
Voice applications in the form of web pages are fetched and interpreted by a VoiceXML-enabled browser, which invokes the actions defined in the web page by the VoiceXML tags, e.g. play prompt; get DTMF; do voice recognition; play text-to-speech string. This allows people to embed VoiceXML tags in their existing HTML pages and effectively have a single source for both text-based and telephony-based interaction with a server-side application. The pages are simply served up to an IVR from a standard web server using HTTP, in the same way as HTML pages would be. VoiceXML components such as voice prompts are embedded in the VoiceXML application.
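As an illustration of how a browser might turn tags into actions, the following is a toy sketch only. The page content, the `interpret` function and the action names are hypothetical. The element names `vxml`, `form`, `block`, `field`, `prompt` and `audio` are taken from VoiceXML, but the XML namespace, grammars and most of the language are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# A small, simplified VoiceXML-style page (hypothetical content).
PAGE = """
<vxml version="2.0">
  <form id="balance">
    <block>
      <audio src="welcome.wav"/>
      <prompt>Welcome to the account service.</prompt>
    </block>
    <field name="account">
      <prompt>Please enter your account number.</prompt>
    </field>
  </form>
</vxml>
"""


def interpret(page: str) -> List[Tuple[str, str]]:
    """Walk the page in document order and list the actions it asks for."""
    actions: List[Tuple[str, str]] = []
    root = ET.fromstring(page.strip())
    for form in root.iter("form"):
        for item in form:                      # <block> and <field> items
            for child in item:
                if child.tag == "audio":
                    # Pre-recorded voice prompt identified by its source file.
                    actions.append(("play_prompt", child.get("src", "")))
                elif child.tag == "prompt":
                    # Text to be rendered by text-to-speech.
                    actions.append(("play_tts", (child.text or "").strip()))
            if item.tag == "field":
                # A field collects caller input (DTMF or speech) into a variable.
                actions.append(("get_input", item.get("name", "")))
    return actions


if __name__ == "__main__":
    for action in interpret(PAGE):
        print(action)
```

The sketch only lists the actions; a real browser would execute each one against the telephony channel and follow the form's control flow.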
In a typical interactive voice system, a cache of VoiceXML source code comprises a hash table of Uniform Resource Identifier (URI) keys and associated filename entries. These entries are references to local files that exist in a known directory on the local machine. Each time a request is made to the cache for a document, the local file is loaded, its input stream is read, and objects are created for each and every element in the document and processed in a hierarchical fashion.
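A minimal sketch of such a source-form cache follows, assuming a hash table (a Python dict) from URI to filename and using the standard-library `xml.dom.minidom` parser as a stand-in for a VoiceXML parser. The class name `SourceCache`, its methods, the `parse_count` counter and the example URI are hypothetical.

```python
import os
import tempfile
from xml.dom import minidom


class SourceCache:
    """Cache that stores only the source form of each document."""

    def __init__(self, directory: str) -> None:
        self.directory = directory    # known directory holding the local files
        self.entries = {}             # hash table: URI key -> filename entry
        self.parse_count = 0          # how many times a document was parsed

    def put(self, uri: str, filename: str, source: str) -> None:
        """Save the document source to a local file and index it by URI."""
        path = os.path.join(self.directory, filename)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(source)
        self.entries[uri] = filename

    def get(self, uri: str) -> minidom.Document:
        """Load the local file, read its stream and build a fresh DOM."""
        path = os.path.join(self.directory, self.entries[uri])
        with open(path, "rb") as stream:
            dom = minidom.parse(stream)   # creates an object per element
        self.parse_count += 1
        return dom


def demo():
    """Request the same document twice and report what happened."""
    with tempfile.TemporaryDirectory() as directory:
        cache = SourceCache(directory)
        uri = "http://example.com/menu.vxml"
        cache.put(uri, "menu.vxml", '<vxml version="2.0"><form/></vxml>')
        first = cache.get(uri)
        second = cache.get(uri)
        return cache.parse_count, first is second, first.documentElement.tagName


if __name__ == "__main__":
    print(demo())
```

In this sketch two requests for the same URI read the file twice, parse it twice and yield two separate DOM trees, because only the filename is held in the table.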
For voice systems that use VoiceXML documents to describe dialogues with callers and support multiple telephone channels, there is a need to be able to efficiently cache VoiceXML documents for reuse across calls and across channels. Ordinarily a VoiceXML browser reads a raw input stream from a file and a VoiceXML parser generates a complete in-memory tree representation of the VoiceXML document. A schematic representation of the initial steps involved in prior art Document Object Model (DOM) creation follows:
Initial: DOCUMENT -read-> INPUT STREAM -parse-> DOM
Therefore current implementations store the source form of the VoiceXML document and require the VoiceXML interpreter to re-parse the document before use on each and every call. The problem with this process is that it is slow and is repeated every time a document is loaded regardless of whether it has been previously loaded.