Conventionally, there is a problem that when a text input to a voice synthesis device contains words which are acoustically similar to each other and are easily misheard, the intelligibility of the synthesized voice is reduced.
Patent reference 1 describes a technique for improving intelligibility in which, when words similar to each other in pronunciation exist in a text which is a target for voice synthesis, the synthesized voice of those words is generated by using voice segments having a high degree of clarity. However, because this technique raises only the degree of clarity, there remains a possibility that the user mishears the synthesized voice when, for example, the noise level is high.
On the other hand, patent reference 2 describes a technique of replacing a word in a text which is a target for voice synthesis with a different, plain expression.