Voice synthesis, which converts arbitrary input text into a voice and outputs that voice, is a known technique. Voice synthesis requires a voice model that represents the prosody and phonemes of the voice. Voice synthesis using a hidden Markov model is known as a technique for statistically creating such a voice model.
In voice synthesis using the hidden Markov model, a hidden Markov model is trained using two inputs. The first is a set of parameters extracted from voice waveforms of a target speaker, such as prosody parameters and voice spectrum parameters. The second is context representing linguistic attributes such as phonemes and grammar. This process can generate a synthesized voice in which both the vocal sounds and the voice characteristics of the target speaker are reproduced. Furthermore, because voice synthesis based on the hidden Markov model models the parameters relating to a voice, various types of processing can be performed more flexibly. For example, with a speaker adaptation technique, a voice model for the target voice of a speaker can be created from an existing voice model and a small amount of voice data representing that speaker's target voice.
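The speaker adaptation technique is not specified here. As one illustration only, the following minimal Python sketch shows maximum a posteriori (MAP) adaptation of a single Gaussian mean vector. This is a common form of speaker adaptation in which a mean from an existing model is pulled toward a small amount of target-speaker data. The sketch assumes that the target speaker's feature frames have already been aligned to one HMM state using hard assignment, with no occupancy probabilities. The function name `map_adapt_mean` and the prior weight `tau` are hypothetical names chosen for this example.

```python
from typing import Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   tau: float = 10.0) -> list[float]:
    """MAP-adapt one Gaussian mean vector toward a target speaker (sketch).

    prior_mean: mean vector taken from the existing voice model.
    frames: target-speaker feature vectors, assumed (hypothetically) to be
        hard-aligned to the HMM state that owns this Gaussian.
    tau: hypothetical prior weight; a larger tau keeps the result closer
        to prior_mean when only a little target data is available.
    """
    dim = len(prior_mean)
    if any(len(f) != dim for f in frames):
        raise ValueError("frame dimension does not match prior mean")
    n = len(frames)
    if tau + n == 0:
        raise ValueError("tau and frame count cannot both be zero")

    # Accumulate the per-dimension sum of the target-speaker frames.
    sums = [0.0] * dim
    for f in frames:
        for d in range(dim):
            sums[d] += f[d]

    # Interpolate between the prior mean (weight tau) and the data (weight n).
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    target_frames = [[1.0, 2.0], [3.0, 4.0]]
    print(map_adapt_mean(prior, target_frames, tau=2.0))  # [1.0, 1.5]
```

With few frames, the result stays near the existing model's mean. As more target data arrives, it approaches the target speaker's sample mean. This behavior reflects how adaptation can work from only a small amount of voice data. Practical systems typically adapt many Gaussians at once and may use other schemes, such as linear-transform-based adaptation.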