It is known in the prior art to have a markup language for written text. For example, the Hypertext Markup Language (HTML) is used in the creation of web pages. In addition, there are markup languages associated with voice applications, such as the Voice Extensible Markup Language (VoiceXML). VoiceXML uses the XML format to specify interactive voice dialogues between a human and a computer. It is analogous to HTML, and brings to voice applications the same advantages of web application development and deployment that HTML brings to visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser. VoiceXML dialogues are authored as text. When a user accesses the voice browser and begins a dialogue, the computer retrieves the VoiceXML document and uses a text-to-speech (TTS) application to speak the dialogue to the user. The speech of the user does not contain any additional commands, nor does VoiceXML allow the user to insert commands for later processing of the user's spoken utterance by the computer. VoiceXML does include an audio tag, <audio>, which allows an audio sound file to be played in the voice application. This audio tag is not itself spoken, but is inserted as text in the VoiceXML dialogue.
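By way of illustration only, the following minimal sketch shows how a VoiceXML dialogue containing an <audio> tag might be authored as text. The function name build_vxml_greeting, the prompt wording, and the file name chime.wav are hypothetical and chosen solely for the example; the sketch assumes Python's standard xml.etree.ElementTree module and does not represent any particular voice browser.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C for VoiceXML documents.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_greeting(prompt_text, audio_src):
    """Return a small VoiceXML document as a text string.

    The prompt text would be rendered by a TTS engine; the <audio>
    element refers to a pre-recorded sound file. Both arguments are
    illustrative values supplied by the caller.
    """
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    # The audio tag is written into the document text; it is never spoken.
    ET.SubElement(prompt, "audio", {"src": audio_src})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml_greeting("Welcome.", "chime.wav"))
```

In this sketch the tag exists only in the authored text of the document, which is consistent with the observation above that the user's speech carries no commands.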
The Speech Synthesis Markup Language (SSML) is a markup language used in speech synthesis. SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the World Wide Web Consortium (W3C). It is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech output, such as pronunciation, volume, pitch, and rate, across different synthesis-capable platforms. As with VoiceXML, SSML is a text-based markup language.
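As a further illustration only, the following sketch shows how an author might use SSML text to control rate, pitch, and volume. The function name build_ssml and its default values are hypothetical; the <speak> and <prosody> elements and their rate, pitch, and volume attributes are taken from the W3C SSML specification, and the sketch again assumes Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C for SSML documents.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, rate="medium", pitch="medium", volume="medium"):
    """Return an SSML document that wraps `text` in a <prosody> element.

    The rate, pitch, and volume arguments are passed through unchanged;
    this sketch does not validate them against the SSML specification.
    """
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": "en-US"},
    )
    prosody = ET.SubElement(
        speak, "prosody", {"rate": rate, "pitch": pitch, "volume": volume}
    )
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", rate="slow", volume="loud"))
```

As in the VoiceXML case, the control information is entered as text by the author of the content rather than spoken by a user.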