The present invention relates to the field of synthesizing speech and in particular to a method and to an apparatus in which the speech synthesis function is distributed between two components that communicate with one another over a transmission facility. The invention finds practical applications in digital cellular telephony where it is desirable to deliver messages to the subscriber as spoken announcements. The invention also extends to a digital cellular terminal featuring speech synthesis capability.
Currently, the short message service (SMS) feature is used to send short text-based messages to the digital cellular terminal to display various messages, including the number/name of the called party, message headers of pending messages, and various alert messages. In a situation where the user's eyes are busy, for example, while driving or while performing tasks requiring visual attention, the user cannot read these messages. Spoken announcements synthesized at the terminal would alleviate this problem.
One possibility to overcome this problem is to perform complete speech synthesis at the terminal. This could be achieved by transmitting the text of the message to the terminal, where a speech synthesis unit would generate audible speech. However, this possibility is not practical because the terminal would require a significant amount of computing power and memory, which are not available in terminals found today in the marketplace.
Against this background it clearly appears that there is a need to improve the existing functionality of remote terminals as it relates to their ability to synthesize speech. In particular, a need exists in the industry to develop remote terminals having the ability to deliver speech synthesis functions to the user without necessitating significant increases in computing and memory resources.
In one aspect, the present invention provides a communication terminal that is capable of establishing a communication session with a remote entity over a transmission facility implementing two different channels. The first channel, the so-called voice channel, is used for transporting a speech signal between the terminal and the remote entity. The second channel, the so-called data channel, is used for sending short messages to the terminal. The voice channel is characterized by a larger bandwidth capacity than the data channel.
The terminal includes a speech synthesizer engine that receives vocal tract information sent from the remote entity over the data channel and uses that information to create a spoken announcement of the message.
Under one specific example of implementation, the communication terminal is a digital cellular terminal that can be either mobile or fixed. Under this example, the remote entity is the base station of the network with which the terminal communicates. When a message is to be delivered to the terminal as a spoken announcement, the base station processes the text of the message to generate the vocal tract information and transmits the vocal tract information over the data channel. This approach transfers at least some of the processing required for speech synthesis to the base station, which has significant computing resources. The vocal tract information that is sent to the terminal over the data channel is a low bit rate signal that can be converted into a spoken announcement with minimal computation. This function can be implemented easily at a terminal.
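By way of illustration only, the division of labor described above can be sketched in Python. The frame layout used here (a pitch value, a gain value and ten quantized filter coefficients per frame) and the names Frame, pack_frames, unpack_frames and bit_rate are hypothetical and are not taken from this specification; the sketch only shows how parametric vocal tract data might be packed compactly at the base station and unpacked at the terminal, and how the resulting bit rate could be estimated.

```python
import struct
from dataclasses import dataclass
from typing import List

NUM_COEFFS = 10                      # assumed number of filter coefficients per frame
_FMT = f">HB{NUM_COEFFS}b"           # big-endian: uint16 pitch, uint8 gain, 10 x int8
FRAME_SIZE = struct.calcsize(_FMT)   # bytes per packed frame


@dataclass
class Frame:
    """One frame of hypothetical vocal tract information."""
    pitch_hz: int        # 0 is taken here to mean an unvoiced frame
    gain: int            # 0..255
    coeffs: List[int]    # quantized filter coefficients, each -128..127


def pack_frames(frames: List[Frame]) -> bytes:
    """Base station side: serialize frames for the data channel."""
    return b"".join(
        struct.pack(_FMT, f.pitch_hz, f.gain, *f.coeffs) for f in frames
    )


def unpack_frames(payload: bytes) -> List[Frame]:
    """Terminal side: recover the frames that drive the synthesizer engine."""
    frames = []
    for offset in range(0, len(payload), FRAME_SIZE):
        pitch, gain, *coeffs = struct.unpack_from(_FMT, payload, offset)
        frames.append(Frame(pitch, gain, list(coeffs)))
    return frames


def bit_rate(frames_per_second: int) -> int:
    """Bits per second needed to carry the packed frames."""
    return FRAME_SIZE * 8 * frames_per_second
```

Under these assumed values each frame occupies 13 bytes, so a rate of 50 frames per second would call for 5200 bits per second. The actual parameters and frame rate would depend on the speech synthesizer engine implemented at the terminal.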
In a different aspect, the invention also provides a base station for use in a digital cellular network. The base station can establish a communication session with a remote terminal over a transmission facility featuring a data channel and a voice channel. The base station has an input for receiving a signal conveying a textual message that is to be delivered as a spoken announcement at the remote terminal. The base station has a speech synthesis pre-processing unit that receives the signal conveying the textual message and generates the corresponding vocal tract information. The base station transmits the vocal tract information to the remote terminal over the data channel. The remote terminal can then use the vocal tract information to generate the audio representation of the textual message.
The invention also extends to a communication terminal that can notify the remote entity with which it establishes a communication session of the kind of resident speech synthesizer engine it is currently implementing. Under one possible form of implementation, the notification sent to the remote entity causes the remote entity to upload to the terminal an updated version or a different version of the speech synthesizer engine in the case where the resident speech synthesizer engine is not suitable for the operation or needs updating or changing. A different possibility is to provide at the base station a pool of speech synthesis pre-processing units and to use the notification issued by the remote terminal to select the particular speech synthesis pre-processing unit that matches the speech synthesizer engine implemented at the terminal.
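The handling of the notification can be illustrated with a short Python sketch. The engine identifiers, the pool contents and the function handle_notification are hypothetical. Combining the two possibilities into a single decision, with an upload used as the fallback when no pre-processing unit matches, is an assumption made for the sake of the example and not a requirement of the invention.

```python
from typing import Callable, Dict, Tuple

# A pre-processing unit maps message text to vocal tract information (bytes).
PreProcessor = Callable[[str], bytes]


def handle_notification(
    pool: Dict[str, PreProcessor],
    reported_engine: str,
    default_engine: str,
) -> Tuple[str, str]:
    """Decide how the base station reacts to a terminal's engine notification.

    Returns ("select_preprocessor", engine_id) when the pool holds a unit
    matching the engine reported by the terminal. Otherwise returns
    ("upload_engine", default_engine), meaning that an engine the base
    station supports would be uploaded to the terminal.
    """
    if reported_engine in pool:
        return ("select_preprocessor", reported_engine)
    return ("upload_engine", default_engine)


# Hypothetical pool with placeholder units, keyed by engine identifier.
# The placeholders only encode the text; they do not perform real
# speech synthesis pre-processing.
example_pool: Dict[str, PreProcessor] = {
    "engine-a": lambda text: text.encode("utf-8"),
    "engine-b": lambda text: text.upper().encode("utf-8"),
}
```

For instance, a terminal reporting "engine-b" would be served by the matching unit in the pool, whereas a terminal reporting an identifier absent from the pool would receive the engine designated as the default.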
The invention also extends to a method for performing speech synthesis at a communication terminal and to a method for performing speech synthesis pre-processing at a base station of a digital cellular network. The invention also provides a method for upgrading or altering a speech synthesizer engine in a communication terminal through interaction with a remote entity with which the terminal communicates. Finally, the invention relates to a method for selecting a speech synthesis pre-processing unit at a base station of the digital cellular network based on the type of the speech synthesizer engine residing at the remote terminal with which the base station communicates.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.