The present invention relates generally to messaging systems and in particular to a text-to-speech converter and to a messaging system incorporating the same. The present invention also relates to a system to buffer data transmitted from a source to a destination over a network.
Voice messaging systems are common in today's business community. Most business organizations or enterprises make use of a private branch exchange (PBX) to direct a caller's telephone call to the appropriate extension of the called party. If the called party is unable to answer the telephone call, the telephone call is forwarded to a voice messaging system, which allows the caller to leave a voice message in the mailbox assigned to the called party. Messages left for called parties within the business organization can be retrieved from memory by calling the voice messaging system using a telephone and entering appropriate commands via a touch-tone keypad. Retrieved messages can be played, forwarded or deleted. An example of a voice messaging system of this nature is the Series 6 sold by Mitel Corporation of Ottawa, Ontario, Canada.
In addition to voice messages, communications within business organizations are also stored in facsimile and text formats. In the past, separate messaging systems have been used to handle these different types of communications. Unfortunately, prior art messaging systems designed to handle one type of communication have not provided any means to interact with messaging systems handling other types of communications. This has required users to access each messaging system individually to retrieve messages and has required business organizations to maintain and manage multiple messaging systems separately. As a result, it has been necessary to establish separate accounts, address lists and message mailboxes in each messaging system for the various users in the business organizations.
More recently, attempts have been made to interconnect different messaging systems to provide access to different types of messages from a single point. For example, U.S. Pat. No. 5,349,636 to Irribarren discloses a system and method for voice mail systems and interactive voice response (IVR) systems. The Irribarren system includes a voice message system and a text message system integrated via a network, which coordinates the functions of each individual message system. A user may access messages stored in the voice message system and in the text message system via a single telephone call. Although this system allows access to different types of messages, the voice message and text message systems require separate management.
The current trend is to integrate these various messaging systems to allow users to access all types of communications once a connection is made to the messaging system. To that end, unified messaging systems have been developed to provide users access to virtually all of their communications. Messaging systems of this nature store all messages for entities within the enterprise at a common location. The entities may be individuals, groups, departments, or any appropriate logical organizations. Users accessing the messaging system via a telephone, desktop computer or other communication device have access to all of their messages regardless of message type and regardless of the type of communication device used to access the messaging system. Appropriate message translators, such as text-to-speech (TTS) converters and speech-to-text (STT) converters, are included to enable users to retrieve messages stored in formats not supported by the communication devices used to access the messaging system.
Two types of text-to-speech converters have been commonly used in messaging systems to date. The most common and affordable type of text-to-speech converter is central processing unit (CPU) based, and makes use of a system processor to perform the text-to-speech conversion. Digital signal processing (DSP) based text-to-speech converters are also used. Although DSP based text-to-speech converters are faster, they are significantly more expensive than their CPU based counterparts.
While CPU based text-to-speech converters are easier and more economical to implement, they suffer from a number of inherent limitations. For example, in CPU based text-to-speech converters, the text-to-speech conversion process is very CPU intensive. A large text message may consume a significant number of CPU cycles, leaving very little CPU capacity for other critical tasks. Also, the output of CPU based text-to-speech converters is stored in a file or memory. A typical electronic mail (e-mail) message converted into speech can easily consume 1 Mb of memory or more. In addition, latency in the text-to-speech conversion grows in proportion to the amount of text being converted. Thus, large text messages introduce noticeable latencies when converted and played by voice messaging systems. Furthermore, text messages often do not convert well, resulting in unintelligible voice messages. When this occurs, users typically skip the messages. Unfortunately, in this case significant CPU resources are wasted, because the entire text message is converted before the user begins to listen to the message.
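The memory figure above can be put in perspective with a rough calculation. The following Python sketch is illustrative only: the 8,000 bytes-per-second audio rate (8 kHz, 8-bit, mono telephony audio) and the speaking rate of 150 words per minute are assumptions made here, not values given in this specification, and the function name `speech_bytes` is hypothetical.

```python
def speech_bytes(words: int, words_per_minute: int = 150,
                 bytes_per_second: int = 8000) -> int:
    """Estimate the storage needed for synthesized speech.

    Assumed defaults (not from the specification): 8 kHz, 8-bit mono
    audio (8000 bytes per second) spoken at 150 words per minute.
    """
    # seconds of speech = words / (words per minute) * 60
    return words * 60 * bytes_per_second // words_per_minute


# A 300-word e-mail takes two minutes to speak under these assumptions.
print(speech_bytes(300))
```

Under these assumptions a message of a few hundred words already approaches a million bytes when the whole conversion result is stored before playback.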
Voice messaging systems incorporating CPU based text-to-speech converters typically perform the text-to-speech conversion in the background, thereby reducing the delay before the text messages are read. However, because the conversion is not throttled, it places a heavy load on the CPU. If enhancements are added to text-to-speech converters of this nature, CPU usage increases even further. As a result, such text-to-speech converters are only able to perform a small number of simultaneous operations. Accordingly, improvements to CPU based text-to-speech converters are desired.
Thus, there is a need for an improved text-to-speech converter and messaging system incorporating the same. There is also a need for a novel system to buffer data transmitted from a source to a destination over limited bandwidth networks.
According to one aspect of the present invention there is provided a text-to-speech converter that includes: a text-to-speech engine receiving source text and converting the source text into speech data; a read mechanism reading speech data from the text-to-speech engine and writing the speech data to a buffer; and a throttle mechanism reading speech data from the buffer and conveying the speech data to a playback operation, the throttle mechanism triggering the read mechanism to read data from the text-to-speech engine and write the speech data to the buffer so that unread speech data in the buffer remains ahead of speech data read by the throttle mechanism by at least a predetermined amount.
In a preferred embodiment, the predetermined amount is programmable. Preferably, the read mechanism pre-fills the buffer during initialization to include a predetermined amount of unread speech data before the throttle mechanism is permitted to read speech data from the buffer. It is also preferred that the buffer is unavailable to the throttle mechanism during writing of speech data to the buffer. Furthermore, it is preferred that after the throttle mechanism reads speech data from the buffer, the throttle mechanism examines the buffer to determine if unread speech data remaining in the buffer is at least equal to the predetermined amount. The throttle mechanism triggers the read mechanism if unread speech data in the buffer is less than the predetermined amount. It is also preferred that during reading of data from the buffer by the throttle mechanism, the buffer is unavailable to the read mechanism for writing.
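The buffering behaviour described above can be sketched in a few lines of Python. This is a minimal single-threaded illustration under stated assumptions, not the implementation of the invention: the class name `ThrottledSpeechBuffer`, the `threshold` and `block` parameters, the measurement of the predetermined amount in chunks, and the use of an iterator as a stand-in for the text-to-speech engine are all introduced here for illustration.

```python
from collections import deque
from typing import Deque, Iterator, Optional


class ThrottledSpeechBuffer:
    """Keeps roughly `threshold` unread chunks buffered ahead of playback."""

    def __init__(self, engine: Iterator[bytes], threshold: int, block: int = 1) -> None:
        if threshold < 1 or block < 1:
            raise ValueError("threshold and block must be at least 1")
        self._engine = engine        # hypothetical stand-in for the TTS engine
        self._threshold = threshold  # the programmable "predetermined amount"
        self._block = block          # chunks pulled from the engine per read
        self._unread: Deque[bytes] = deque()
        self._exhausted = False
        self._fill()                 # pre-fill before playback may read

    @property
    def unread(self) -> int:
        """Number of chunks written to the buffer but not yet played."""
        return len(self._unread)

    def _fill(self) -> None:
        """Read mechanism: pull blocks from the engine until the threshold is met."""
        while not self._exhausted and len(self._unread) < self._threshold:
            for _ in range(self._block):
                chunk = next(self._engine, None)
                if chunk is None:
                    self._exhausted = True  # engine has no more speech data
                    break
                self._unread.append(chunk)

    def read(self) -> Optional[bytes]:
        """Throttle mechanism: hand one chunk to playback, then top up if low."""
        if not self._unread:
            return None                     # conversion finished, buffer drained
        chunk = self._unread.popleft()
        if len(self._unread) < self._threshold:
            self._fill()                    # trigger the read mechanism
        return chunk


engine = iter([b"a", b"b", b"c", b"d", b"e"])
buf = ThrottledSpeechBuffer(engine, threshold=2, block=2)
while (chunk := buf.read()) is not None:
    print(chunk, "unread after read:", buf.unread)
```

Because reading and writing happen in the same thread in this sketch, the requirement that the buffer be unavailable to one mechanism while the other is using it is satisfied trivially; a threaded variant would need explicit locking.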
In one embodiment, the read mechanism is a background thread invoked by a central processing unit. The background thread is responsive to read events generated by the throttle mechanism when unread speech data in the buffer falls below the predetermined amount.
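A background-thread arrangement of this kind might look like the following Python sketch. It is an assumption-laden illustration rather than the embodiment itself: the function name `play_with_background_reader`, the use of `threading.Event` objects for the read events, the extra `filled`, `done` and `stop` events, and the list of byte chunks standing in for the text-to-speech engine are all hypothetical choices made here.

```python
import threading
from collections import deque
from typing import Deque, Iterable, List


def play_with_background_reader(chunks: Iterable[bytes], threshold: int) -> List[bytes]:
    """Play all chunks while a background thread keeps the buffer topped up."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    source = iter(chunks)             # hypothetical stand-in for the TTS engine
    buffer: Deque[bytes] = deque()
    lock = threading.Lock()           # only one mechanism touches the buffer at a time
    read_event = threading.Event()    # raised by the throttle side to request data
    filled = threading.Event()        # raised by the reader after each fill pass
    done = threading.Event()          # engine has produced all of its speech data
    stop = threading.Event()          # tells the reader thread to exit

    def reader() -> None:
        while True:
            read_event.wait()
            read_event.clear()
            if stop.is_set():
                return
            with lock:
                while not done.is_set() and len(buffer) < threshold:
                    chunk = next(source, None)
                    if chunk is None:
                        done.set()
                    else:
                        buffer.append(chunk)
            filled.set()

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    read_event.set()                  # pre-fill before playback starts
    filled.wait()

    played: List[bytes] = []
    while True:
        with lock:
            chunk = buffer.popleft() if buffer else None
            low = len(buffer) < threshold
            finished = done.is_set()
        if chunk is None:
            if finished:
                break                 # nothing buffered and nothing left to convert
            filled.clear()            # underrun: request data and wait for a fill pass
            read_event.set()
            filled.wait()
            continue
        played.append(chunk)          # stands in for the playback operation
        if low and not finished:
            read_event.set()          # unread data fell below the threshold

    stop.set()
    read_event.set()                  # wake the reader so it can observe `stop`
    thread.join()
    return played


print(play_with_background_reader([b"one", b"two", b"three", b"four"], threshold=2))
```

The lock models the preference that the buffer be unavailable to the read mechanism while the throttle mechanism is reading, and vice versa. In a real system the playback step would be paced by the audio device, which is what gives the background thread time to refill the buffer.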
According to another aspect of the present invention there is provided a text-to-speech converter that includes: a text-to-speech engine receiving source text and converting the source text into speech data; a read mechanism reading speech data from the text-to-speech engine and writing the speech data to a buffer; and a throttle mechanism reading speech data from the buffer and conveying the speech data to a playback operation, the throttle mechanism triggering the read mechanism to read data from the text-to-speech engine and write the speech data to the buffer so that at least a predetermined amount of unread speech data remains in the buffer.
According to yet another aspect of the present invention there is provided a system to buffer data transmitted from a source to a destination over a network interconnecting the source and destination, the system including: a read mechanism reading video and/or audio data from the source and writing the data to a buffer; and a throttle mechanism reading data from the buffer and conveying the data to the destination via the network, the throttle mechanism triggering the read mechanism to read data from the source and write the data to the buffer so that at least a predetermined amount of unread data remains in the buffer.
According to still yet another aspect of the present invention there is provided a method of converting a text message to a speech message that includes the operations of: receiving a text message and converting the text message into speech data; reading quantities of the speech data and writing the quantities of speech data to a buffer; and reading speech data from the buffer and conveying the speech data to a playback operation for voice playback of the text message, the speech data being written into the buffer so that at least a predetermined amount of unread speech data remains in the buffer while the speech data is being read from the buffer.
The present invention provides advantages in that, since intelligent buffering is used to manage the text-to-speech conversion, CPU usage is reduced and memory is used more efficiently than in conventional CPU based text-to-speech converters. Since CPU usage is reduced, a larger number of simultaneous text-to-speech conversion operations can be performed without requiring additional resources.
The present invention also provides advantages in that, since converted text is played back at a slow rate, the converted text being placed in the buffer need only stay ahead of the converted text being read from the buffer for voice playback by a nominal amount to avoid gaps in playback. As a result, the amount of up-front processing that is required before message playback commences is reduced. Furthermore, the slow rate of message playback allows for greater flexibility in terms of background processing. Features such as e-mail scrubbing to make text messages more readable can be added without causing any significant overhead or latency issues.
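One way to reason about how large the "nominal amount" needs to be is a simple rate calculation: the buffer must hold at least as much unread speech data as playback consumes while a refill is in progress. The Python sketch below illustrates this under assumptions made here, not taken from this specification: the function name `min_lead_bytes`, the 8,000 bytes-per-second playback rate and the 250 millisecond worst-case refill latency are all hypothetical.

```python
import math


def min_lead_bytes(playback_bytes_per_second: float, refill_latency_seconds: float) -> int:
    """Smallest amount of unread data that avoids a gap during one refill.

    Simple model (an assumption): playback drains the buffer at a constant
    rate, and a refill takes at most `refill_latency_seconds` to deliver data.
    """
    return math.ceil(playback_bytes_per_second * refill_latency_seconds)


# Assumed values: 8 kHz, 8-bit mono audio and a 0.25 s worst-case refill.
print(min_lead_bytes(8000, 0.25))
```

Under this model the required lead depends only on the playback rate and the refill latency, not on the length of the message, which is consistent with the reduced up-front processing described above.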