The present invention relates to speech synthesis. In particular, the present invention relates to a multi-lingual speech synthesis system.
Text-to-speech systems have been developed to allow computerized systems to communicate with users through synthesized speech. Applications include spoken dialog systems, call center services, and voice-enabled web and e-mail services, to name a few. Although text-to-speech systems have improved over the past few years, some shortcomings still exist. For instance, many text-to-speech systems are designed for only a single language. However, many applications need a system that can synthesize speech for words from multiple languages, and in particular, one that can synthesize sentences containing words from two or more languages.
Systems that have been developed to synthesize utterances containing words from multiple languages use a separate text-to-speech engine for the words of each language in the utterance, with each engine generating the waveforms for its words. The waveforms are then concatenated or otherwise output in succession to synthesize the complete utterance. The main drawback of this approach is that the voices produced by the different engines usually sound different. Users are commonly annoyed by such utterances, because they sound as though two different speakers are talking. In addition, the overall intonation of the sentence is lost, which impairs comprehension.
Accordingly, a multi-lingual speech synthesis system that addresses at least some of the foregoing disadvantages would be beneficial.