Many voice over Internet protocol (VoIP) applications are implemented using a combination of Session Initiation Protocol (SIP) and Voice Extensible Markup Language (VXML). VXML is a markup-based programming language defined by the World Wide Web Consortium (W3C) and designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and dual-tone multi-frequency (DTMF) key input, recording of spoken input, telephony, and/or mixed-initiative conversations. One of the goals of VXML is to bring the advantages of Web-based development and/or content delivery to interactive voice applications. A common SIP/VXML architecture for implementing VoIP applications utilizes application servers (that implement VXML servers) and media servers (that implement VXML clients). A media server (e.g., a VXML client) requests and obtains VXML content and/or data from an application server (e.g., a VXML server) using a Hypertext Transfer Protocol (HTTP) communication session. The media server then executes the obtained VXML content and/or data, which in some instances includes requesting, obtaining and/or executing additional VXML content and/or data.
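
The fetch-and-execute exchange between a media server and an application server described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not an implementation of any particular media server: the names `run_dialog`, `http_fetch` and `DOCS`, the `app.example` URLs, and the `max_docs` limit are hypothetical. "Executing" a document is reduced here to collecting its `<prompt>` text and following the first `<goto next="...">` reference, whereas an actual VXML client would also perform speech synthesis, recognition, DTMF handling, and so on. An in-memory dictionary stands in for the application server so that the example can be run without a network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

# Namespace used by VoiceXML 2.0 documents, in ElementTree's {uri}tag form.
VXML_NS = "{http://www.w3.org/2001/vxml}"


def http_fetch(url):
    """Obtain a document over HTTP, as a media server (VXML client) would."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def run_dialog(start_url, fetch=http_fetch, max_docs=10):
    """Fetch a VXML document, 'execute' it, and follow <goto next="..."> links.

    Hypothetical simplification: execution only collects prompt text.
    Returns a list of (url, prompts) pairs in the order documents were fetched.
    max_docs is an assumed safety limit against documents that link in a loop.
    """
    executed = []
    url = start_url
    while url is not None and len(executed) < max_docs:
        root = ET.fromstring(fetch(url))
        prompts = ["".join(p.itertext()).strip()
                   for p in root.iter(VXML_NS + "prompt")]
        executed.append((url, prompts))
        # Request additional VXML content if the document refers to any.
        goto = root.find(".//" + VXML_NS + "goto")
        next_ref = goto.get("next") if goto is not None else None
        url = urljoin(url, next_ref) if next_ref else None
    return executed


# In-memory stand-in for the application server (VXML server); assumed content.
DOCS = {
    "http://app.example/start.vxml":
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        '<form><block><prompt>Welcome.</prompt>'
        '<goto next="menu.vxml"/></block></form></vxml>',
    "http://app.example/menu.vxml":
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        '<form><block><prompt>Main menu.</prompt></block></form></vxml>',
}

if __name__ == "__main__":
    for doc_url, doc_prompts in run_dialog("http://app.example/start.vxml",
                                           fetch=DOCS.__getitem__):
        print(doc_url, doc_prompts)
```

Passing `fetch=DOCS.__getitem__` keeps the sketch offline; leaving `fetch` at its default would instead issue real HTTP requests to whatever application server the start URL names.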