1. Field of the Invention
The present invention relates to a voice synthesizer, a voice synthesizing method, and a computer program. More particularly, the present invention relates to a voice synthesizer, a voice synthesizing method, and a computer program that use pre-recorded voices to generate a synthesized voice that reads out a text.
2. Description of the Related Art
Voice synthesizers are known that use a pre-recorded natural human voice to convert a text document that is input into a personal computer (PC) or the like into a voice that reads out the text document. This type of voice synthesizer synthesizes a voice based on a voice corpus including recorded natural voices that can be split into parts of speech.
In this voice synthesizer, first, for example, morphological analysis and modification analysis are performed on the input text in order to convert the text into phonemic symbols, accent symbols and the like. Next, the phonemic symbols, an accent symbol string, and part of speech information for the input text obtained from the modification analysis results are used to estimate prosody parameters such as phoneme duration (voice length), fundamental frequency (voice pitch), power of the vowel center (voice strength) and the like. Then, from the synthesis units (phonemes) stored in the waveform dictionary, dynamic programming is used to select the combination of units that are closest to the estimated prosody parameters and that have the smallest possible distortion when connected.
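The selection step described above can be illustrated with a minimal sketch. It is not the implementation of any particular synthesizer: the `Target` and `Unit` types, the cost weights, and the function names are all hypothetical, and the distortion is approximated here as the sum of a target cost (distance from the estimated prosody parameters) and a join cost (mismatch between adjacent units).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Estimated prosody parameters for one phoneme (hypothetical type)."""
    phoneme: str
    duration: float  # phoneme duration in seconds
    f0: float        # fundamental frequency in Hz
    power: float     # power of the vowel center


@dataclass(frozen=True)
class Unit:
    """One synthesis unit in the waveform dictionary (hypothetical type)."""
    phoneme: str
    duration: float
    f0: float
    power: float


def target_cost(unit, target):
    # Distance between a stored unit and the estimated prosody.
    # The weights are arbitrary assumptions chosen for illustration.
    return (abs(unit.duration - target.duration) * 10.0
            + abs(unit.f0 - target.f0) / 100.0
            + abs(unit.power - target.power))


def join_cost(prev, nxt):
    # Distortion assumed to arise when two units are connected.
    return abs(prev.f0 - nxt.f0) / 50.0 + abs(prev.power - nxt.power)


def select_units(targets, dictionary):
    """Return (units, total_cost) minimizing target cost plus join cost."""
    if not targets:
        return [], 0.0
    candidates = [[u for u in dictionary if u.phoneme == t.phoneme]
                  for t in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for at least one phoneme")

    # best[i][j] = (lowest cost of a path ending at candidates[i][j],
    #               index of the predecessor in candidates[i - 1])
    best = [[(target_cost(u, targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, targets[i]), back))
        best.append(row)

    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total
```

In this sketch, a unit that matches its own target exactly can still be rejected when a slightly worse match connects more smoothly to its neighbor, which is the reason a global search such as dynamic programming is used rather than choosing each unit independently.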
The prosody parameters are related to the intonation, accent, and the like of the synthesized voice when it reads out a text. With known voice synthesizers, since the voice is synthesized based on the prosody parameters estimated from the analysis results of the text as described above, it is difficult to generate a synthesized voice that has an intonation, accent, and the like that satisfy the user's expectations. To address this difficulty, a device has been proposed that synthesizes a voice based on prosody parameters that have been specified by the user using a graphical user interface (GUI).
For an example of such art, refer to “A Corpus-based Speech Synthesis System”, in The Institute of Electronics, Information and Communication Engineers, Technical Report, SP2005-18, pp. 37-42 (2005, 5).
However, with the above art, there are many occasions when it is difficult for a general user to understand which prosody parameters should be set to which values in order to generate a desired intonation. Thus, with a device like that above in which the prosody parameters are specified, it is difficult for a general user to generate a synthesized voice that has an intonation and the like that satisfy the user's expectations.