In general, the term unified messaging (UM) encompasses relatively simple architectures that deliver incoming facsimiles and voicemail to a user's email inbox, as well as more complex architectures that allow a user to dictate a message into a telephone, e.g., a mobile telephone, and deliver the message to an intended recipient in a variety of formats, e.g., as a text email, as a facsimile, or as a voice recording. A typical UM system integrates different streams of messages (e.g., email, facsimile, voice, video, etc.) and provides access to the messages via a variety of devices (e.g., computer systems, landline telephones, and mobile telephones). For example, UM systems may send digitized voicemail messages and facsimiles to a mail server that distributes them to a user as email attachments. As another example, UM systems may convert email messages to speech (i.e., perform text-to-speech conversion) and then deliver the resulting audio messages to a remote user via a landline telephone or mobile telephone. Messaging systems, such as unified messaging systems, usually provide a telephony user interface that allows a user to listen to messages via a telephone (e.g., a mobile telephone) or another device (e.g., a personal computer).
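To illustrate the voicemail-to-email path described above, the following Python sketch uses the standard-library `email` package to wrap a digitized voicemail recording as an email attachment. The function name `voicemail_to_email`, the gateway sender address, and the choice of WAV as the audio format are hypothetical, chosen only for illustration. An actual UM system may encode, name, and label recordings differently.

```python
from email.message import EmailMessage


def voicemail_to_email(wav_bytes: bytes, caller: str, mailbox: str) -> EmailMessage:
    """Wrap a digitized voicemail recording as an email with an audio attachment.

    Hypothetical helper: the sender address and WAV format are assumptions.
    """
    msg = EmailMessage()
    msg["From"] = "um-gateway@example.com"  # hypothetical UM gateway address
    msg["To"] = mailbox
    msg["Subject"] = f"Voicemail from {caller}"
    # Plain-text body shown by the mail client alongside the attachment.
    msg.set_content(f"You have a new voicemail from {caller}.")
    # Adding an attachment turns the message into multipart/mixed; the
    # recording is base64-encoded as an audio/wav part.
    msg.add_attachment(
        wav_bytes, maintype="audio", subtype="wav", filename="voicemail.wav"
    )
    return msg
```

The resulting message could then be handed to a mail server, for example with `smtplib.SMTP.send_message`, which then distributes it to the user's inbox.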
Unified messaging systems may be based on various architectures. For example, a unified messaging system may employ a session initiation protocol/voice extensible markup language (SIP/VXML) architecture. As is known, SIP is an application-layer control protocol for creating, modifying, and terminating sessions with one or more users. The sessions may include Internet telephone calls, multimedia distribution, and multimedia conferences. In general, SIP is lightweight, transport-independent, and text-based.
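Because SIP is text-based, a request is simply a start line followed by header lines, each separated by CRLF. The Python sketch below builds a minimal SIP INVITE without a session description body and parses its headers back. The header set follows the mandatory request headers of RFC 3261. The function names, addresses, and fixed `branch`/`tag` values are hypothetical, and a real user agent would generate unique values and normally include a session description.

```python
def build_invite(caller: str, callee: str, call_id: str, host: str) -> str:
    """Build a minimal SIP INVITE request (no message body) as plain text."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        # Fixed example branch; real implementations generate a unique value
        # per transaction, starting with the "z9hG4bK" magic cookie.
        f"Via: SIP/2.0/UDP {host};branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",  # fixed example tag
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_headers(message: str) -> dict[str, str]:
    """Split a SIP message into its header fields (ignores the start line and body)."""
    head, _, _body = message.partition("\r\n\r\n")
    _start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values such as URIs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers
```

For example, `parse_headers(build_invite("alice@example.com", "bob@example.com", "a84b4c76e66710@pc33.example.com", "pc33.example.com"))["CSeq"]` yields `"1 INVITE"`.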
As is well known, VXML is the World Wide Web Consortium (W3C) standard for specifying interactive voice dialogues between a human being and a computer. VXML allows voice applications to be developed and deployed in a way analogous to how hypertext markup language (HTML) is used for visual applications. Just as HTML documents are interpreted by a visual browser, VXML documents are interpreted by a voice browser. In a typical architecture, banks of voice browsers attached to a public switched telephone network (PSTN) are deployed to allow telephone users to interact with voice applications. Many commercial VXML applications deployed today process millions of telephone calls per day. Examples of VXML applications include order inquiry, package tracking, driving directions, emergency notification, wake-up calls, flight tracking, voice access to email, customer relationship management, prescription refilling, audio newsmagazines, voice dialing, real-estate information, and national directory assistance. VXML employs tags that instruct a voice browser to perform speech synthesis, automatic speech recognition, dialog management, and sound file playback.
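To make the HTML analogy concrete, the Python sketch below uses the standard-library `xml.etree.ElementTree` module to generate a minimal VXML document whose single `<prompt>` tag instructs a voice browser to speak a sentence. It assumes VoiceXML 2.1 and its namespace `http://www.w3.org/2001/vxml`. The function name `build_vxml_prompt` is hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.x namespace


def build_vxml_prompt(text: str) -> str:
    """Return a minimal VoiceXML document that speaks `text` once and ends."""
    # The xmlns attribute is written literally so the output is a plain
    # default-namespace document that a voice browser can interpret.
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")     # a dialog, roughly like an HTML page
    block = ET.SubElement(form, "block")   # executable content, no user input
    prompt = ET.SubElement(block, "prompt")  # text to be rendered as speech
    prompt.text = text  # ElementTree escapes any special characters
    return ET.tostring(vxml, encoding="unicode")
```

Calling `build_vxml_prompt("You have two new messages.")` produces `<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml"><form><block><prompt>You have two new messages.</prompt></block></form></vxml>`, which a voice browser would read aloud in much the same way a visual browser renders an HTML page.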
Typically, hypertext transfer protocol (HTTP) is used as the transport protocol for fetching VXML pages. Some applications use static VXML pages, while others generate VXML pages dynamically using an application server. Two related W3C standards are typically used with VXML: speech synthesis markup language (SSML) and speech recognition grammar specification (SRGS). SSML augments textual prompts with information on how best to render them as synthetic speech. For example, SSML can indicate which speech synthesizer voice to use and when to speak louder. SRGS tells a speech recognizer which sentences or speech patterns it should expect to receive. Call control extensible markup language (CCXML) is a complementary W3C standard. On some VXML platforms, a CCXML interpreter handles the initial call setup between a caller and a voice browser and provides telephony services, such as call transfer and disconnect, for the voice browser.

A typical SIP/VXML architecture separates functions between one or more application servers and one or more media servers. In SIP/VXML architectures, the application servers function as masters and the media servers function as slaves.

As noted above, messages in a messaging system may take various forms, e.g., an audio file such as a voicemail or a voice memo, or a text-based email that is played via a text-to-speech application. Irrespective of their form, the messages may be of various lengths. In a conventional messaging system, however, the length of a message has not usually been considered when the messaging system interacts with a user.
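The roles of dynamic page generation, SSML, and SRGS described above can be seen together in one generated page. The Python sketch below builds a VoiceXML 2.1 page per request, as an application server might. The page has a `<prompt>` that uses the SSML `<voice>` and `<prosody>` elements to choose a voice and speak louder, and an inline SRGS `<grammar>` that tells the recognizer to expect "yes" or "no". The function name `build_main_menu`, the form and field names, and the wording are hypothetical. Serving the string over HTTP (for instance with the `application/voicexml+xml` media type) is left out so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.x namespace


def build_main_menu(new_messages: int) -> str:
    """Dynamically generate a VoiceXML page for the current caller's mailbox."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    field = ET.SubElement(form, "field", {"name": "listen"})

    # SSML inside the prompt: pick a synthesizer voice, then speak louder.
    prompt = ET.SubElement(field, "prompt")
    voice = ET.SubElement(prompt, "voice", {"gender": "female"})
    voice.text = f"You have {new_messages} new messages."
    loud = ET.SubElement(prompt, "prosody", {"volume": "loud"})
    loud.text = "Say yes to listen, or no to exit."

    # Inline SRGS grammar: the recognizer should expect only "yes" or "no".
    grammar = ET.SubElement(
        field, "grammar", {"version": "1.0", "mode": "voice", "root": "yesno"}
    )
    rule = ET.SubElement(grammar, "rule", {"id": "yesno", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for word in ("yes", "no"):
        ET.SubElement(one_of, "item").text = word

    return ET.tostring(vxml, encoding="unicode")
```

Generating the page per request, rather than storing a static file, is what lets the prompt reflect live data such as the current message count.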
The use of the same reference symbols in different drawings indicates similar or identical items.