1. Field of the Invention
The Present Invention relates to a method and system for achieving Text to Speech. More particularly, the Present Invention is related to a method and system for achieving emotional Text to Speech.
2. Description of the Related Art
Text To Speech (TTS) refers to extracting corresponding speech units from an original corpus based on a result of rhythm modeling, adjusting and modifying a rhythm feature of the speech units by using specific speech synthesis technology, and finally synthesizing qualified speech. Currently, the synthesis level of several main speech synthesis tools have all come into practical stage.
It is well known that people can express a variety of emotion during reading, for example, during reading the sentence “Mr. Ding suffers severe paralysis since he is young, but he learns through self-study and finally wins the heart of Ms. Zhao with the help of network”, the former half of which can be read with sad emotion, while the latter half of which can be read with joy emotion. However, the traditional speech synthesis technology will not consider the emotional information accompanied in the text content, that is, when performing speech synthesis, the traditional speech synthesis technology will not consider whether the emotion expressed in the text to be processed is joy, sad or angry.
Emotional TTS has become the focus of TTS research in recent years, the problem that has to be solved in emotional TTS research is to determine emotion state and establish association relationship between emotion state and acoustical feature of speech. The existing emotional TTS technology allows an operator to specify emotion category of a sentence manually, such as manually specify that the emotion category of sentence “Mr. Ding suffers severe paralysis since he is young” is sad, and the emotion category of sentence “but he learns through self-study and finally wins the heart of Ms. Zhao with the help of network” is joy, and process the sentence with the specified emotion category during TTS.