The invention relates to a method of generating speech from text and a distributed speech synthesis system for performing the method.
Interactive voice response systems generally comprise a speech recognition system and means for generating a prompt in the form of a speech signal. Speech synthesis systems (text-to-speech synthesis, TTS) are often used to generate these prompts. Such systems transform text into a speech signal: the text is phonetized, suitable segments (e.g., diphones) are selected from a speech database, and the speech signal is concatenated from these segments. If this is to be performed in an environment that allows data transmission, in particular if one or more remote end terminals such as mobile phones are to be used, special requirements arise with respect to the end terminal and the transmission capacity.
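The three TTS steps named above (phonetization, selection of diphone segments from a speech database, concatenation) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration: the toy lexicon `LEXICON`, the silence symbol `_`, and the placeholder database built by `build_toy_database` are assumptions, not part of the described method. A practical system would also smooth the joins between segments, which this sketch omits.

```python
# Hypothetical pronunciation lexicon: word -> phoneme sequence (illustrative only).
LEXICON = {
    "hello": ["h", "@", "l", "oU"],
    "world": ["w", "3", "l", "d"],
}
SILENCE = "_"  # assumed symbol marking silence at utterance boundaries


def phonetize(text: str) -> list[str]:
    """Step 1: convert text into a phoneme sequence framed by silence."""
    phonemes = [SILENCE]
    for word in text.lower().split():
        phonemes.extend(LEXICON[word])  # raises KeyError for unknown words
    phonemes.append(SILENCE)
    return phonemes


def to_diphones(phonemes: list[str]) -> list[str]:
    """Step 2a: pair each phoneme with its successor, e.g. ['_', 'h', '@'] -> ['_-h', 'h-@']."""
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


def build_toy_database(diphones: list[str], segment_len: int = 80) -> dict[str, list[float]]:
    """Hypothetical speech database: each diphone maps to a constant placeholder segment."""
    return {d: [i / 100.0] * segment_len for i, d in enumerate(sorted(set(diphones)))}


def synthesize(text: str, database: dict[str, list[float]]) -> list[float]:
    """Steps 2b and 3: look up each diphone segment and concatenate the samples."""
    samples: list[float] = []
    for diphone in to_diphones(phonetize(text)):
        samples.extend(database[diphone])
    return samples


db = build_toy_database(to_diphones(phonetize("hello world")))
speech = synthesize("hello world", db)
print(len(speech))  # 9 diphones x 80 samples = 720
```

The size of the speech database, and the work of searching and concatenating it, is what makes the placement of these steps (on a server or in the end terminal) a design question.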
Typically, the TTS is implemented centrally on a server in a network, and this server performs the task of translating text into acoustic signals. In telecommunications networks, the acoustic signals are coded and then transmitted to the end terminal. The disadvantage of this approach is the relatively high volume of data to be transmitted (e.g., more than 4.8 kbit/s).
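The difference in data volume between transmitting coded audio and transmitting only a text string can be made concrete with simple arithmetic. The sketch below uses the 4.8 kbit/s figure from the text as a lower bound for coded audio; the example prompt, its assumed spoken duration of 2 seconds, and the one-byte-per-character text encoding are illustrative assumptions only.

```python
CODED_AUDIO_BITRATE_BPS = 4_800  # lower bound for coded audio quoted in the text


def coded_audio_bits(duration_s: float, bitrate_bps: int = CODED_AUDIO_BITRATE_BPS) -> int:
    """Bits needed to transmit a prompt as coded audio at a constant bit rate."""
    return round(duration_s * bitrate_bps)


def text_bits(prompt: str) -> int:
    """Bits needed to transmit the prompt as a UTF-8 text string."""
    return len(prompt.encode("utf-8")) * 8


prompt = "Please say the name of the city."  # hypothetical prompt, 32 ASCII characters
audio = coded_audio_bits(2.0)  # assumed spoken duration: 2 s
text = text_bits(prompt)
print(audio, text, audio / text)  # 9600 256 37.5
```

Under these assumptions the coded audio requires roughly 37 times as many bits as the text string, and because 4.8 kbit/s is only a lower bound, the real ratio for server-side synthesis may be higher.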
In another approach, the TTS may be implemented in the end terminal. In this case, only a text string needs to be transmitted. However, this approach requires a large memory in the end terminal to ensure high speech quality. Furthermore, the TTS must be implemented in every terminal, each of which then requires high computing power.