Voice servers are used in a variety of voice processing applications. For example, IBM Corp. (Armonk, N.Y.) offers the WebSphere® Voice Server (WVS), which includes both Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) software used for deploying conversational solutions for organizations. Further details regarding this product are available at www-306.ibm.com/software/pervasive/voice_server. As another example, Telisma (Paris, France) offers networked speech recognition software called teliSpeech. Details regarding this product are available at www.telisma.com/overviewtelispeech.html.
Requirements for communication protocols that support the control of network elements performing ASR, speaker identification and/or verification (SI/SV), and TTS functions are set out, for example, by Oran in “Requirements for Distributed Control of ASR, SI/SV and TTS Resources,” published as an Internet Draft by the Internet Engineering Task Force (draft-ietf-speechsc-reqts-07), May 2005. This Internet Draft is available at www.ietf.org/internet-drafts/draft-ietf-speechsc-reqts-07.txt. The draft defines a Speech Services Control (SPEECHSC) framework that supports the distributed control of speech resources.
One of the control protocols implementing the SPEECHSC framework is the Media Resource Control Protocol (MRCP), which is described by Shanmugham in “Media Resource Control Protocol Version 2 (MRCPv2),” published as IETF Internet Draft draft-ietf-speechsc-mrcpv2-08, October 2005. This draft is available at www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-08.txt.
Whereas MRCP is a control protocol, in some applications the voice data itself is transmitted using the Real-time Transport Protocol (RTP). RTP is described in detail by Schulzrinne et al. in “RTP: A Transport Protocol for Real-Time Applications,” published as IETF Request for Comments (RFC) 3550, July 2003. This RFC is available at www.ietf.org/rfc/rfc3550.txt.
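By way of illustration only, the following sketch shows how the 12-byte fixed header that RFC 3550 places at the start of every RTP packet might be decoded. The names `RtpHeader` and `parse_rtp_header`, and the sample packet values, are hypothetical and are not taken from any of the products or drafts cited above; the sketch ignores CSRC entries and header extensions.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    """Fields of the RTP fixed header (RFC 3550, section 5.1)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header; CSRC list and extensions are not parsed."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version: {version}")
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical sample packet: version 2, payload type 0, sequence number 1,
# timestamp 160, an arbitrary SSRC, followed by 160 bytes of dummy payload.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
payload = sample[12 + 4 * header.csrc_count:]
```

In such an arrangement, a control protocol such as MRCP would carry the requests and responses, while packets of the above form would carry the audio samples.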