With the continual improvements being made in computerized information networks, there is an ever-increasing need for devices capable of retrieving information from the networks in response to a user's requests. Devices that allow the user to enter requests using voice commands and to receive the information in audio format are becoming increasingly popular. These devices are especially popular in situations where entering commands via a keyboard is not practical.
As technologies including telephony, media, text-to-speech (TTS), and speech recognition undergo continued development, it is desirable to periodically update the devices with the latest capabilities.
It is also desirable to provide a modular architecture that can incorporate components from a variety of vendors and that can operate without requiring knowledge of, or changes to, the application for which the device is utilized.
It is further desirable to provide a system that can be scaled up to tens of thousands of simultaneously active telephony sessions. This includes independently scaling the telephony, media, text-to-speech, and speech recognition resources as needed.