Many documents report methods for building a prosodic model of a Chinese text-to-speech system. In the prior art, these methods can be divided into five groups: the first group creates the prosodic model directly via an existing prosodic model or a pattern recognition tool; the second group performs speech conversion between one language and its sub-dialects; the third group finds the correspondence between the tones and basic syllable types of two languages; the fourth group transposes a speaker with an average voice in an HMM-based speech synthesis system (HTS); and the fifth group provides a speaking-rate controlled prosodic-information generation device and a speaking-rate dependent hierarchical prosodic model.
The first group of methods lacks sufficient training data and has no systematic framework or model for establishing the various dialects of Chinese.
The second group of methods applies only to conversion between one language and its sub-dialects, and is not applicable to conversion among the seven dialects of Chinese.
The third group of methods does not refine the prosodic model across languages and across speakers. The prosody estimation in this group of methods therefore remains limited.
The fourth group of methods does not have a scheme with speaking-rate controlled prosodic parameters as proposed in the present invention.
The fifth group of methods can only be used for learning a single language from a single speaker.
Keeping the drawbacks of the prior art in mind, and through persistent experimentation and research, the applicant has finally conceived of a speaking-rate normalized prosodic parameter builder, a speaking-rate dependent prosodic model builder, a speaking-rate controlled prosodic-information generation device, and a prosodic-information generation method that are able to learn different languages and mimic the speaking styles of various speakers.