Conventionally, voice synthesis techniques that read out a given character string have been known. In these related-art techniques, the given character string must be read correctly. In recent years, however, voice synthesis has come into wide use. For example, it is used when personal characters, such as robot pets or game characters, utter words. There is also disclosed, for example, a technique in which a robot pet having emotions controls the output of a synthetic sound according to its emotional state.
In many cases, however, a voice read out by voice synthesis is perceived as unnatural compared with a human voice. Besides problems with sound quality and an emotionless accent, one reason for this unnaturalness is that the voice must be read correctly and without any pause.
To address the above-mentioned problems, the following techniques, for example, have been proposed. One is a voice synthesis device capable of easily generating a synthetic voice with a stammer. Another is a voice synthesis device that inserts a silent portion of appropriate length at a proper position between voice waveform data items so as to synthesize a natural voice without incongruity. Yet another is a voice synthesis device capable of replacing a word that is difficult to pronounce with a word that is easy to pronounce.
However, the known techniques described above still need further improvement in order to output a sound close to a human voice.
The invention has been made in view of the above-mentioned problems, and an object of the invention is to provide a speech processing device, a speech processing method, and a computer program product for speech processing.