1. Field of the Invention
The present invention relates to a voice synthesizer that synthesizes a voice from voice segments according to a time sequence of input language information.
2. Description of Related Art
A voice synthesis method based on a large-volume voice database has been proposed that uses, as its measure, a statistical likelihood based on an HMM (Hidden Markov Model) of the kind used for voice recognition and the like, instead of a measure formed by combining physical parameters determined on the basis of prior knowledge. The method aims to implement a high-quality and homogeneous synthesized voice by combining two advantages: the rationality and homogeneity in voice quality that the probability measure of HMM-based synthesis provides, and the high quality that synthesis from a large-volume voice database provides (for example, refer to patent reference 1).
The method disclosed by patent reference 1 uses two models: an acoustic model showing the probability of outputting an acoustic parameter series (linear predictor coefficients, a cepstrum, etc.) for each state transition according to phoneme, and a rhythm model showing the probability of outputting a rhythm parameter series (a fundamental frequency etc.) for each state transition according to rhythm. A voice segment cost is calculated from two likelihoods: the acoustic likelihood of the acoustic parameter series for each state transition corresponding to each phoneme in the phoneme sequence for an input text, and the prosodic likelihood of the rhythm parameter series for each state transition corresponding to each rhythm in the rhythm sequence for the input text. Voice segments are then selected according to these voice segment costs.
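The likelihood-based selection described above can be illustrated with a minimal sketch. It is not the implementation of patent reference 1. It assumes a single diagonal-covariance Gaussian per model in place of per-state-transition HMM output distributions, and it assumes the cost is the negative sum of the acoustic and prosodic log-likelihoods with equal weight. The names gaussian_log_likelihood, segment_cost and select_segment, and all numeric values, are hypothetical.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def segment_cost(acoustic_frames, rhythm_frames, acoustic_model, rhythm_model):
    """Assumed cost: negative sum of acoustic and prosodic log-likelihoods.

    Each model is a (mean, var) pair. A real HMM would hold one output
    distribution per state transition; one Gaussian stands in for all of them.
    """
    acoustic_ll = sum(
        gaussian_log_likelihood(f, *acoustic_model) for f in acoustic_frames
    )
    prosodic_ll = sum(
        gaussian_log_likelihood(f, *rhythm_model) for f in rhythm_frames
    )
    return -(acoustic_ll + prosodic_ll)


def select_segment(candidates, acoustic_model, rhythm_model):
    """Return the name of the candidate voice segment with the lowest cost."""
    best = min(
        candidates,
        key=lambda c: segment_cost(
            c["acoustic"], c["rhythm"], acoustic_model, rhythm_model
        ),
    )
    return best["name"]


# Hypothetical models: 2-dimensional acoustic features, 1-dimensional F0 in Hz.
acoustic_model = ([1.0, 0.0], [0.5, 0.5])
rhythm_model = ([120.0], [100.0])

# Hypothetical candidate segments, two frames each.
candidates = [
    {"name": "seg_a", "acoustic": [[1.0, 0.1], [0.9, 0.0]], "rhythm": [[118.0], [121.0]]},
    {"name": "seg_b", "acoustic": [[2.5, 1.0], [2.0, 1.2]], "rhythm": [[150.0], [160.0]]},
]

best = select_segment(candidates, acoustic_model, rhythm_model)
```

In this sketch the candidate whose acoustic and rhythm parameters lie closer to the model means receives the lower cost and is selected. Both candidates have the same number of frames, so their costs are directly comparable; segments of different lengths would need some normalization, which the text above does not specify.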