1. Technical Field
This invention relates to the field of speech enabled computing and more particularly to a system and method for real time transmission of speech audio asynchronously received from a text-to-speech engine in a computer communications network.
2. Description of the Related Art
Text-to-speech (TTS) engines are well-known in the art. Typically, a TTS engine can be used to convert computer recognizable text to audio which can be transmitted to an external audio device for ultimate audible presentation to a listener. Specifically, TTS technology permits users to audibly play back documents and provides applications with the ability to read information to the user. Whether running on a desktop computer, a telephony network, over the Internet, or in an automobile, the increased functionality of TTS-enabled applications can provide users with information access anytime, anywhere with almost any device.
In the telephony environment, TTS technology can convert text to speech, reducing the need for prerecorded interactive voice response (IVR) messages and providing users with the ability to access textual information over a telephone. The advent of Voice over IP (VoIP) technology has facilitated the development of TTS-enabled applications over networks. This network convergence has opened the door to novel TTS applications, for example voice browsing of Web sites over the Internet.
In order to transmit audio data over a computer communications network, a media transport protocol typically is employed. Presently, the Real Time Transport Protocol (RTP) is a preferred protocol for transporting real time media over a computer communications network. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. RTP is described in detail in Schulzrinne, Casner, Frederick and Jacobson, RFC 1889, "RTP: A Transport Protocol for Real-Time Applications," published by the Internet Engineering Task Force (IETF) in January 1996 and incorporated herein by reference.
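For concreteness, the 12-byte fixed RTP header defined in RFC 1889 can be packed and unpacked as follows. This is a minimal Python sketch, not part of the described system: it assumes no padding, no header extension and no contributing sources (CSRC count of zero), and the names build_rtp_header and parse_rtp_header are hypothetical.

```python
import struct

RTP_VERSION = 2
RTP_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes total

def build_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False) -> bytes:
    """Pack an RTP fixed header with P=0, X=0 and CC=0."""
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6                     # V=2, no padding/extension/CSRCs
    byte1 = (int(marker) << 7) | payload_type    # M bit, then 7-bit PT
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the first 12 bytes of an RTP packet into its fixed fields."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

Sequence numbers wrap at 16 bits and timestamps at 32 bits, which is why both are masked before packing; with no optional fields set, every such header begins with the byte 0x80.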
Notwithstanding, the output of a TTS engine is not ideal for real time transmission using RTP. For example, a VoIP telephony gateway can require speech audio to arrive in a synchronized fashion and in a specific format dictated by an underlying media protocol, whereas a TTS engine can output speech audio as chunks provided asynchronously at random time intervals. Moreover, the chunks of speech audio can vary in size. Finally, the format of data received from a TTS engine can vary from application to application. Accordingly, what is needed is a system and method for real time transmission of speech audio asynchronously received from a TTS engine in a computer communications network.
The present invention is a system and method for real time transmission of speech audio asynchronously received from a text-to-speech (TTS) engine in a computer communications network. A system for real time transmission of speech audio asynchronously received from a TTS engine in a computer communications network can include a TTS engine for producing speech audio for transmission in the computer communications network; and, a real time speech audio producer for receiving the speech audio and for producing formatted audio packets for transmission over the network according to a transmission interval.
Notably, the transmission interval can be fixed or variable and can be determined according to a packetization delay parameter. In addition, the real time speech audio producer can implement a thread for execution in a multi-threaded application. Finally, the system can further include a telephony gateway server communicatively linked to the real time speech audio producer. As such, the telephony gateway server can receive the produced formatted audio packets transmitted according to the transmission interval.
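As an illustration of how a transmission interval might be derived from a packetization delay parameter, the sketch below assumes narrowband telephony audio (8,000 samples per second) with one byte per sample after compression, as with G.711. The class name Packetization and its default values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packetization:
    """Derive interval and payload size from a packetization delay (sketch)."""
    ptime_ms: int              # packetization delay parameter, e.g. 20 ms
    sample_rate: int = 8000    # assumed narrowband telephony sampling rate
    bytes_per_sample: int = 1  # assumed 8-bit companded samples (e.g. G.711)

    @property
    def interval_s(self) -> float:
        """Transmission interval: one packet per packetization delay."""
        return self.ptime_ms / 1000.0

    @property
    def samples_per_packet(self) -> int:
        """Also the RTP timestamp increment per packet for audio."""
        return self.sample_rate * self.ptime_ms // 1000

    @property
    def payload_bytes(self) -> int:
        """Bytes of audio carried by each packet."""
        return self.samples_per_packet * self.bytes_per_sample
```

With a 20 ms packetization delay, each interval carries 160 samples (160 bytes under these assumptions), and the RTP timestamp, which counts sampling periods, advances by 160 per packet. A variable interval could be modeled by constructing a new Packetization whenever the delay parameter changes.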
In a representative embodiment of the present invention, the real time speech audio producer can include a TTS audio receiver for receiving the produced speech audio from the TTS engine; an audio data compressor for compressing the received speech audio into an audio buffer; a speech audio packet formatter for formatting speech audio in the audio buffer into formatted audio packets suitable for transmission over the network; and, a transmission queue for queuing the formatted audio packets for transmission over the network. The real time speech audio producer can also include a silence detector for detecting transmission intervals in which no speech audio data from the TTS engine is available for transmission across the network; and, a silence packet generator for producing formatted silence packets in lieu of the uniformly formatted audio packets responsive to detecting the intervals in which no speech audio data from the TTS engine is available for transmission across the network.
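The receiver, audio buffer, silence detector and silence packet generator can be sketched as a single thread-safe buffer that accepts TTS chunks of any size at any time and releases exactly one fixed-size payload per transmission interval. This is an illustrative Python sketch under stated assumptions: the class FrameBuffer is hypothetical, frame_bytes=160 assumes 20 ms of 8 kHz audio at one byte per sample, and 0xFF is assumed as the silence value because it is the G.711 μ-law code for a zero sample.

```python
import threading
from typing import Tuple

class FrameBuffer:
    """Turns asynchronous, variable-size TTS chunks into fixed-size payloads."""

    def __init__(self, frame_bytes: int = 160, silence_byte: int = 0xFF):
        self.frame_bytes = frame_bytes      # assumed 20 ms at 8 kHz, 1 byte/sample
        self.silence_byte = silence_byte    # assumed mu-law encoding of zero
        self._buf = bytearray()
        self._lock = threading.Lock()       # TTS callbacks may run on another thread

    def receive(self, chunk: bytes) -> None:
        """TTS audio receiver: accept a chunk of any size at any time."""
        with self._lock:
            self._buf.extend(chunk)

    def has_frame(self) -> bool:
        """Silence detector: False means no full frame is ready this interval."""
        with self._lock:
            return len(self._buf) >= self.frame_bytes

    def next_payload(self) -> Tuple[bytes, bool]:
        """Return (payload, is_silence) for one transmission interval."""
        with self._lock:
            if len(self._buf) >= self.frame_bytes:
                payload = bytes(self._buf[:self.frame_bytes])
                del self._buf[:self.frame_bytes]
                return payload, False
        # Silence packet generator: no complete frame to send this interval.
        return bytes([self.silence_byte]) * self.frame_bytes, True

    def finish(self) -> None:
        """At end of utterance, pad a trailing partial frame with silence."""
        with self._lock:
            remainder = len(self._buf) % self.frame_bytes
            if remainder:
                pad = self.frame_bytes - remainder
                self._buf.extend(bytes([self.silence_byte]) * pad)
```

Returning a silence frame whenever fewer than frame_bytes are buffered is only one possible policy; finish() keeps the tail of an utterance from being stranded in the buffer under that policy.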
A method for real time transmission of speech audio received from a TTS engine in a computer communications network can include receiving speech audio from the TTS engine; formatting the received speech audio into formatted audio packets suitable for transmission to an audio output device over the computer communications network; and, transmitting the formatted audio packets to the audio output device over the computer communications network according to a transmission interval. The method can further include detecting transmission intervals in which no speech audio data from the TTS engine is available for transmission across the network; and, formatting silence packets and transmitting the silence packets in lieu of the audio packets responsive to detecting the transmission intervals in which no speech audio data from the TTS engine is available for transmission across the network.
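The method can be simulated one transmission interval at a time to show how silence packets keep the RTP sequence number and timestamp advancing steadily while the TTS engine delivers irregular chunks. In this hypothetical sketch, arrivals[i] lists the chunks delivered during interval i, a partially filled frame is treated as unavailable (it is held over rather than sent), the payload type assumes G.711 μ-law, and the helper names rtp_packet and packetize_intervals are illustrative only.

```python
import struct

PT_PCMU = 0  # static RTP payload type for G.711 mu-law (assumed codec)

def rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes) -> bytes:
    """Prefix a payload with a minimal 12-byte RTP header (V=2, no CSRCs)."""
    header = struct.pack("!BBHII", 0x80, PT_PCMU, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def packetize_intervals(arrivals, frame_bytes=160, silence_byte=0xFF, ssrc=0x1234):
    """Emit exactly one RTP packet per transmission interval.

    arrivals[i] is the (possibly empty) list of chunks the TTS engine
    delivered during interval i.
    """
    buf = bytearray()
    packets = []
    seq = timestamp = 0
    for chunks in arrivals:
        for chunk in chunks:                  # receive speech audio
            buf.extend(chunk)
        if len(buf) >= frame_bytes:           # audio available: format it
            payload = bytes(buf[:frame_bytes])
            del buf[:frame_bytes]
        else:                                 # silence interval detected
            payload = bytes([silence_byte]) * frame_bytes
        packets.append(rtp_packet(seq, timestamp, ssrc, payload))
        seq += 1
        timestamp += frame_bytes  # 1 byte per sample assumed, so bytes == samples
    return packets
```

Because every interval yields exactly one packet, the receiver sees contiguous sequence numbers and timestamps spaced frame_bytes apart, even across gaps in TTS output.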
In a representative embodiment of the method of the invention, the method can also include compressing the speech audio into an audio buffer from which the audio packets can be formatted in the formatting step. In another representative embodiment, the method can further include queuing the formatted audio packets for transmission to the audio output device over the computer communications network according to a fixed transmission interval. In yet another representative embodiment, the method can further include queuing the formatted audio packets and the formatted silence packets for transmission to the audio output device over the computer communications network according to the transmission interval.
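The compressing step is not tied to a particular codec in the description above. Assuming, purely for illustration, that the TTS engine emits 16-bit signed little-endian PCM and that G.711 μ-law is the target format, compression into the audio buffer could look like the following. linear_to_ulaw follows the conventional biased μ-law encoding (bias 0x84, clip 32635); compress_into is a hypothetical helper.

```python
import struct

ULAW_BIAS = 0x84   # 132, added before the segment search in G.711 mu-law
ULAW_CLIP = 32635  # largest magnitude that still fits in 15 bits after the bias

def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed linear sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, ULAW_CLIP) + ULAW_BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):   # find the segment
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # codes are bit-inverted

def compress_into(buffer: bytearray, pcm16le: bytes) -> None:
    """Compress a chunk of 16-bit little-endian PCM into the audio buffer."""
    buffer.extend(linear_to_ulaw(s) for (s,) in struct.iter_unpack("<h", pcm16le))
```

Each 2-byte input sample becomes 1 byte in the buffer, halving the data rate. struct.iter_unpack raises struct.error if a chunk ends mid-sample, so a receiver fed arbitrary TTS chunks would need to carry an odd trailing byte over to the next chunk.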
Notably, the step of transmitting the formatted audio packets to the audio output device over the computer communications network according to a transmission interval can include transmitting the formatted audio packets to a telephony gateway server over the computer communications network according to a transmission interval. Moreover, the method can also include determining the transmission interval according to a packetization delay parameter.
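Transmission to the telephony gateway server according to the interval can be sketched as a paced UDP sender, since RTP is commonly carried over UDP. The gateway address, the 20 ms default interval and the function name transmit_paced are assumptions. Deadlines are computed from a fixed start time rather than by sleeping a constant amount after each send, so a late wake-up in one interval does not push every later packet back.

```python
import socket
import time

def transmit_paced(packets, gateway_addr, interval_s=0.020):
    """Send one packet per transmission interval to gateway_addr over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        start = time.monotonic()
        for i, pkt in enumerate(packets):
            # Absolute deadline for packet i; avoids cumulative drift.
            delay = start + i * interval_s - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            sock.sendto(pkt, gateway_addr)
    finally:
        sock.close()
```

For example, transmit_paced(packets, ("192.0.2.10", 5004)) would send to a hypothetical gateway at a documentation address and port.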
Advantageously, the method can be implemented in a multi-threaded application as a producer in a producer/consumer model for providing digitized speech audio over the network. In that instance, the method can include implementing the formatting and transmitting steps in a thread for execution in the multi-threaded application. Additionally, the method can include implementing, in a thread for execution in a multi-threaded application, the steps of formatting the audio packets, transmitting the audio packets, detecting the transmission intervals in which no speech audio is available, and formatting and transmitting the silence packets. Finally, the method can include implementing the compressing and queuing steps in the same thread.
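The producer/consumer split can be sketched with two Python threads joined by a queue.Queue: the producer thread wakes once per transmission interval and queues one packet, and a consumer thread drains the queue and hands each packet to the transport. ProducerThread, consumer and the make_packet callable are hypothetical names; in this sketch make_packet stands in for the formatting, silence detection and compression steps, which would run inside the producer thread.

```python
import queue
import threading
import time
from typing import Callable

class ProducerThread(threading.Thread):
    """Queues exactly one packet per transmission interval (illustrative)."""

    def __init__(self, make_packet: Callable[[], bytes],
                 tx_queue: "queue.Queue[bytes]", interval_s: float = 0.020):
        super().__init__(daemon=True)
        self.make_packet = make_packet    # formatting/silence logic lives here
        self.tx_queue = tx_queue
        self.interval_s = interval_s
        self.stop_event = threading.Event()

    def run(self) -> None:
        deadline = time.monotonic()
        while not self.stop_event.is_set():
            self.tx_queue.put(self.make_packet())
            deadline += self.interval_s
            # wait() returns early once stop() is called, so shutdown is prompt.
            self.stop_event.wait(max(0.0, deadline - time.monotonic()))

    def stop(self) -> None:
        self.stop_event.set()

def consumer(tx_queue: "queue.Queue[bytes]", send: Callable[[bytes], None],
             stop: threading.Event) -> None:
    """Consumer side: drain the queue and hand each packet to the transport."""
    while not stop.is_set():
        try:
            send(tx_queue.get(timeout=0.1))
        except queue.Empty:
            continue
```

Keeping the per-interval work in its own thread isolates the transmission cadence from the TTS engine's irregular timing and from any blocking in the network send.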