Recent communication architectures separate call processing functions into call-service-related components and components concerned with transporting the payload information (bearer control). This results in a separation of connection set-up and bearer set-up. As a consequence, communication traffic is no longer tightly bound to the network topology.
In such modern communication architectures, announcement and dialogue services are provided which are either functionally integrated in switching nodes or arranged as independent media servers in the network. In this arrangement, the media server provides a multiplicity of basic functions which act as a basis for the respective announcement and dialogue service. Examples include playing a recorded announcement, possibly composed dynamically of a number of parts; voice synthesis from a predetermined text in a desired voice; interactive dialogue support based on tone inputs (DTMF) from the terminal; voice recognition; speaker recognition/verification; and the recording and playing of voice and video messages.
A service provider interested in maximum flexibility, short provision times and efficiency is supported in the definition of services by so-called service creation environment functions. Their output is a description of the desired call processing sequence, preferably in a standardized description language such as, for example, CCXML or CSTAXML. The necessary descriptions of the announcement and dialogue components are preferably also provided in standardized form, e.g. via VoiceXML. In some business models, these descriptions can also be provided by customers of the network operator and can change frequently.
VoiceXML is an XML-based markup language for writing web pages for telephone applications. These applications are voice-based: the user hears instructions and enters commands by voice or DTMF. VoiceXML therefore supports the following features:
- spoken output (synthetic voice)
- output of audio files and streams
- recognition of spoken words and sentences
- recognition of dual-tone multifrequency dialing (DTMF)
- recording of spoken inputs
- controlling the dialogue flow
- telephony control (call transfer and hanging up)
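The feature list above can be related to concrete markup. The following is a minimal sketch in Python, assuming VoiceXML 2.0 element names (`prompt`, `audio`, `grammar`, `record`, `goto`, `transfer`, `disconnect`); the sample page, its file names, the telephone number and the helper `features_used` are hypothetical and serve only as illustration. It parses a voice page and reports which of the listed features the page makes use of.

```python
import xml.etree.ElementTree as ET

# Hypothetical voice page; file names, prompts and the number are placeholders.
SAMPLE_PAGE = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <block>
      <prompt>Welcome to the example service.</prompt>
      <audio src="jingle.wav"/>
    </block>
    <field name="choice">
      <prompt>Say sales or support, or press 1 or 2.</prompt>
      <grammar mode="voice" src="choice.grxml"/>
      <grammar mode="dtmf" src="choice-dtmf.grxml"/>
      <filled><goto next="#leave_message"/></filled>
    </field>
  </form>
  <form id="leave_message">
    <record name="message" maxtime="30s"/>
    <transfer name="agent" dest="tel:+10000000000"/>
    <block><disconnect/></block>
  </form>
</vxml>
"""

# Assumed mapping from element names to the features listed in the text.
FEATURE_BY_ELEMENT = {
    "prompt": "spoken output (synthetic voice)",
    "audio": "output of audio files and streams",
    "record": "recording of spoken input",
    "goto": "dialogue flow control",
    "transfer": "telephony control",
    "disconnect": "telephony control",
}

# A grammar element serves voice or DTMF recognition depending on its mode.
GRAMMAR_FEATURE_BY_MODE = {
    "voice": "recognition of spoken words and sentences",
    "dtmf": "recognition of DTMF",
}


def features_used(page_source):
    """Return the set of feature names that a voice page relies on."""
    root = ET.fromstring(page_source)
    found = set()
    for element in root.iter():
        # Drop the "{namespace}" prefix that ElementTree puts before tag names.
        name = element.tag.rsplit("}", 1)[-1]
        if name == "grammar":
            feature = GRAMMAR_FEATURE_BY_MODE.get(element.get("mode", "voice"))
        else:
            feature = FEATURE_BY_ELEMENT.get(name)
        if feature:
            found.add(feature)
    return found


if __name__ == "__main__":
    for feature in sorted(features_used(SAMPLE_PAGE)):
        print(feature)
```

Such an inventory only inspects the markup; it does not execute the dialogue.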
Precursors of VoiceXML are Phone Markup Language (PML), VoxML, SpeechML, TalkML and VoiceHTML. SALT is an alternative to VoiceXML.
In general, a voice browser reads the markup code describing a voice dialogue, which is composed in the form of a single file or a sequence of files, so-called voice pages, parses and interprets it, and renders it for the telephone medium. For the actual input and output via the telephone, the browser must interact with the hardware and software of the media server platform in order to use the following resources:
- calling up the voice pages/files describing the dialogue from a storage medium
- calling up files referenced in the associated voice page, e.g. with voice to be output, recordings to be played, grammar information, other information characterizing and supporting the input and output, or possibly associated video information
- controlling the call and associated switching processes
- recognizing/recording DTMF or voice (ASR)
- recognizing and verifying a speaker
- outputting audio files
- generating voice outputs in the desired voice from text (TTS)
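The interaction between browser and media server resources described above can be sketched as follows. This is a deliberately simplified illustration in Python, not the form interpretation algorithm of the VoiceXML standard: the class `RecordingMediaServer`, its method names, the function `run_voice_page` and the page URL are all hypothetical. The sketch fetches a voice page, parses it, and maps a few elements onto resource calls in dialogue order.

```python
import xml.etree.ElementTree as ET

# Hypothetical voice page used for the demonstration below.
SAMPLE_PAGE = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block><prompt>Welcome.</prompt><audio src="jingle.wav"/></block>
    <field name="pin"><prompt>Enter your PIN.</prompt></field>
    <block><disconnect/></block>
  </form>
</vxml>
"""


class RecordingMediaServer:
    """Hypothetical stand-in for media server resources; it only logs calls."""

    def __init__(self):
        self.log = []

    def speak_text(self, text):        # TTS resource
        self.log.append(("tts", text))

    def play_audio(self, src):         # audio file output
        self.log.append(("audio", src))

    def collect_input(self, name):     # DTMF/ASR resource
        self.log.append(("input", name))

    def hang_up(self):                 # call control
        self.log.append(("hangup", ""))


def _execute(element, server):
    """Handle one element and its children; return False after a disconnect."""
    name = element.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
    if name == "prompt":
        text = " ".join((element.text or "").split())
        if text:
            server.speak_text(text)
    elif name == "audio":
        server.play_audio(element.get("src", ""))
    elif name == "disconnect":
        server.hang_up()
        return False
    for child in element:
        if not _execute(child, server):
            return False
    if name == "field":
        # Input is collected only after the prompts nested in the field.
        server.collect_input(element.get("name", ""))
    return True


def run_voice_page(fetch, url, server):
    """Fetch a voice page via the given callable, parse it and execute it."""
    root = ET.fromstring(fetch(url))
    _execute(root, server)


if __name__ == "__main__":
    pages = {"file:///pages/welcome.vxml": SAMPLE_PAGE}  # placeholder storage
    server = RecordingMediaServer()
    run_voice_page(pages.__getitem__, "file:///pages/welcome.vxml", server)
    for entry in server.log:
        print(entry)
```

Passing the page loader and the resource object as parameters keeps the interpreting logic independent of any particular storage medium or media server platform, which mirrors the separation between browser and platform resources described in the text.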
An announcement is a special form of a voice dialogue in the above sense. The currently most frequently used standard for the description code of a voice browser is VoiceXML.
During the introduction of a service into the network, these descriptions are loaded into the switching nodes, application servers and/or media servers. This can be done a priori or, when required, after activation of the service. In particular, the VoiceXML description is provided to the media server platforms in this way. In principle, processing the VoiceXML description on a media server platform requires a browser function which reads and interprets the VoiceXML pages so that the required basic functions of the media server can be allocated to the desired service and controlled.
At present, various efficient browsers are commercially available which differ greatly, e.g. with regard to range of functions, licensing costs and requirements for the computer platform (CPU performance, memory, maximum number of parallel activations depending on hardware and operating system). Thus, for example, a first browser may be suitable only for announcement operation or also for DTMF dialogues but can be used without licensing costs. A second browser may be available as open source code but may have the disadvantage of requiring a lot of resources and/or not conforming to the newest standard and/or offering only a low service level to the network operator. A third browser may incur high licensing costs while providing the full functionality of the standard and economical utilization of resources.
In the prior art, media servers having only a single, possibly universal VoiceXML browser are used. The problem with such commercially available products lies in the high complexity they bring with them even for simple applications. Consequently, optimization can only be achieved by in-house development. Finally, no standard VoiceXML product on the market meets the changing requirements of different application scenarios at optimal cost.