1. Field of the Invention
The present invention relates generally to speech recognition and text-to-speech (TTS) synthesis technology in telecommunication systems. More particularly, the present invention relates to predicting tone pattern information for textual information used in telecommunication systems.
2. Description of the Related Art
This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the claims in this application and is not admitted to be prior art by inclusion in this section.
Voice can be used for both input and output on mobile communication terminals: speech recognition accepts spoken input, and text-to-speech (TTS) synthesis produces spoken output. Such technologies are particularly useful for persons with disabilities or when the user of a mobile terminal cannot easily use his or her hands. They can also provide spoken feedback, so that the user does not have to look at the device.
Tone is crucial for Chinese (e.g., Mandarin, Cantonese, and other dialects) and other tonal languages. Tone is characterized mainly by the shape of its fundamental frequency (F0) contour. For example, as illustrated in FIG. 1, Mandarin tones 1, 2, 3, and 4 can be described as high level, high-rising, low-dipping, and high-falling, respectively. The neutral tone (tone 0) does not have a specific F0 contour; it is highly dependent on the preceding tone and is usually perceived as temporally short.
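As a rough illustration of tone being characterized by contour shape, the following Python sketch classifies a sampled F0 contour by comparing it with templates for the four full Mandarin tones. The template values are the conventional Chao tone-letter descriptions (55, 35, 214, 51); the names `TONE_TEMPLATES`, `to_tone_scale`, and `classify_tone`, the three-point sampling, and the speaker pitch-range parameters are hypothetical. This nearest-template heuristic is an assumption made for illustration and is not the method of the invention.

```python
# Hypothetical templates: conventional Chao tone letters (1 = lowest pitch,
# 5 = highest) for the four full Mandarin tones, as (start, middle, end).
# The neutral tone (0) has no fixed contour, so it has no template here.
TONE_TEMPLATES = {
    1: (5, 5, 5),  # high level (55)
    2: (3, 4, 5),  # high-rising (35)
    3: (2, 1, 4),  # low-dipping (214)
    4: (5, 3, 1),  # high-falling (51)
}


def to_tone_scale(f0_hz, speaker_min_hz, speaker_max_hz):
    """Map an F0 value in Hz onto the 1..5 scale of the speaker's pitch range."""
    span = speaker_max_hz - speaker_min_hz
    level = 1 + 4 * (f0_hz - speaker_min_hz) / span
    return min(5.0, max(1.0, level))  # clamp values outside the assumed range


def classify_tone(f0_contour, speaker_min_hz, speaker_max_hz):
    """Return the tone (1-4) whose template is nearest to the contour's shape."""
    # Sample the contour at its start, middle, and end.
    points = (f0_contour[0], f0_contour[len(f0_contour) // 2], f0_contour[-1])
    levels = [to_tone_scale(p, speaker_min_hz, speaker_max_hz) for p in points]

    def distance(template):
        # Sum of squared differences between observed and template levels.
        return sum((lv - t) ** 2 for lv, t in zip(levels, template))

    return min(TONE_TEMPLATES, key=lambda tone: distance(TONE_TEMPLATES[tone]))


# Example: a falling contour (Hz) for a speaker whose range is assumed to be 100-200 Hz.
print(classify_tone([200, 180, 150, 120, 100], 100, 200))  # prints 4
```

Because the contour is normalized to the speaker's own pitch range before comparison, the same templates can be applied to higher- and lower-pitched voices; a real system would need far richer features than three sample points.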
Text-to-speech in tonal languages like Chinese is challenging because the textual representation usually contains no tonal information, even though that information is crucial for understanding. Tone combinations of neighboring syllables can form certain tone patterns, and tone can significantly affect speech perception. In English, an incorrect inflection can make a sentence difficult to understand; in Chinese, an incorrect tone on a single word can completely change its meaning. Tone information is therefore crucial to Chinese speech output.
In many cases, tone information for syllables is not available. For example, Chinese phone users may store names in the phone directory (“contact names”) in PINYIN format. PINYIN is a system for transliterating Chinese ideograms into the Roman alphabet, officially adopted by the People's Republic of China in 1979. The PINYIN used for a contact name may not include tonal information, in which case it can be impossible to obtain tone information directly from the contact name itself. Without tone, or with an incorrect tone, speech generated from the text is of poor quality, and the meaning of the text can change completely.
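To illustrate what is lost when tone marks are omitted, the following minimal Python sketch splits a tone-marked PINYIN syllable into its bare spelling and tone number by Unicode-decomposing it and removing the four tone diacritics. The names `TONE_MARKS` and `split_tone` are hypothetical, and the sketch assumes one syllable per call; it is not part of the invention. It works only in the direction of removing tone information, which is why tone must be predicted when a contact name is stored without it: the classic syllables mā, má, mǎ, and mà (commonly glossed as "mother", "hemp", "horse", and "scold") all reduce to the same bare spelling.

```python
import unicodedata

# Combining diacritics that PINYIN uses as tone marks (Unicode code points).
TONE_MARKS = {
    "\u0304": 1,  # macron, e.g. ā (first tone)
    "\u0301": 2,  # acute accent, e.g. á (second tone)
    "\u030c": 3,  # caron, e.g. ǎ (third tone)
    "\u0300": 4,  # grave accent, e.g. à (fourth tone)
}


def split_tone(syllable):
    """Split one tone-marked PINYIN syllable into (bare syllable, tone number).

    Returns tone 0 when no tone mark is present. Note that 0 is ambiguous:
    the syllable may carry the neutral tone, or the mark may simply be missing.
    """
    decomposed = unicodedata.normalize("NFD", syllable)  # e.g. ǎ -> a + U+030C
    tone = 0
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # keeps the diaeresis of ü, which is not a tone mark
    return unicodedata.normalize("NFC", "".join(kept)), tone


# Four syllables with different tones (and meanings) collapse to one spelling.
for s in ["mā", "má", "mǎ", "mà"]:
    print(s, "->", split_tone(s))
```

Only the four tone diacritics are removed, so the diaeresis in ü survives and, for example, lǜ reduces to lü rather than lu.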
U.S. patent application 2002/0152067, which is assigned to the same assignee as the present application, discloses a method where the pronunciation model for a name or a word can be obtained from a server residing in the network. However, this patent application describes only a solution involving pronunciation; the use of tonal information is neither included nor suggested. As indicated above, significant meaning can be lost without tonal information.
International patent application WO 3065349 discloses adding tonal information to text-to-speech generation to improve the intelligibility of the speech. The technique described in this patent application relies on an analysis of the context of the sentence: tone is identified based on the other words in the sentence in which the word is located. However, such context is not always available, particularly in communication systems such as mobile phones, nor does context always provide the clues needed to generate tonal information.
Thus, there is a need to predict tone patterns for a sequence of syllables without depending on the context. Further, there is a need to predict tone patterns to properly identify names used as contacts for a mobile device. Even further, there is a need to synthesize contact names in communication terminals when tone information is not available. Still further, there is a need to generate tonal information from text for languages like Chinese where tonal information is vital for communication and comprehension.