Modern telecommunication services offer a variety of ways to facilitate interactions between users and computers. As one example, interactive voice response (IVR) technology allows computers to interact with users through audio signals, such as human speech and standard telephone signaling (e.g., dual-tone multi-frequency (DTMF) tones). IVR is typically used for automated attendants, which accept spoken or touch-tone inputs from callers to navigate menu selections.
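The menu navigation performed by an automated attendant can be pictured as a walk through a tree of menus, where each DTMF digit selects a branch. The following is a minimal sketch under stated assumptions, not a description of any particular IVR product: the menu tree, `MenuNode`, and `navigate` are hypothetical names introduced for illustration, only touch-tone input is modeled (speech recognition is omitted), and real IVR services would typically define such menus in an interactive voice specification rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class MenuNode:
    """One menu in a hypothetical automated-attendant menu tree."""
    prompt: str
    # Maps a single DTMF digit ("0"-"9", "*", "#") to the next menu.
    options: dict[str, "MenuNode"] = field(default_factory=dict)


def navigate(root: MenuNode, digits: str) -> MenuNode:
    """Follow a sequence of DTMF digits starting from the root menu.

    An unrecognized digit leaves the caller at the current menu,
    much as an attendant might simply replay the current prompt.
    """
    node = root
    for digit in digits:
        node = node.options.get(digit, node)
    return node


# Hypothetical menu: 1 = sales, 2 = support (with a sub-menu), 0 = operator.
support = MenuNode("Support: press 1 for billing, 2 for technical help.", {
    "1": MenuNode("Connecting you to billing."),
    "2": MenuNode("Connecting you to technical help."),
})
root = MenuNode("Main menu: press 1 for sales, 2 for support, 0 for the operator.", {
    "1": MenuNode("Connecting you to sales."),
    "2": support,
    "0": MenuNode("Connecting you to the operator."),
})

print(navigate(root, "21").prompt)  # prints "Connecting you to billing."
print(navigate(root, "9") is root)  # prints True (unrecognized digit, stays at main menu)
```

Keeping unrecognized digits at the current menu is one simple design choice; an actual attendant might instead play an error prompt or transfer to an operator.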
IVR services are typically implemented using an interactive voice specification. Voice extensible markup language (VXML) is one example of an interactive voice specification. VXML services are typically processed by interpreters in order to provide audio, verbal, and/or touch-tone interaction with users. However, VXML is not natively interpreted by the hypertext markup language (HTML)/ECMAScript interpreters used, for example, by web browsers. Rather, VXML interpreters are typically implemented on separate hardware devices designated as media servers. Unfortunately, the number of concurrent service requests that a media server can handle is limited, because real-time media processing can consume considerable processing capacity and resources. Conventional VXML system architectures that use media servers can therefore be scaled only by physically adding more media servers to process more user service requests.