In recent years, the mainstream speech synthesis method has been one in which speech segments in units of phonemes, diphones, or the like are registered in a segment dictionary. To produce synthetic speech, the segment dictionary is searched in accordance with input phonetic text, and the speech segments found are modified and concatenated to output synthetic speech corresponding to the phonetic text.
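The search, modify, and concatenate flow described above can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions, not an actual synthesizer: the dictionary contents, the names `SEGMENT_DICT`, `modify`, and `synthesize`, and the use of nearest-neighbour resampling with a gain factor as the "modification" step are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical segment dictionary: unit label -> waveform samples.
# A real dictionary would hold recorded phoneme or diphone segments.
SEGMENT_DICT: Dict[str, List[float]] = {
    "a": [0.1, 0.2, 0.1],
    "k": [0.0, 0.5],
    "i": [0.3, 0.3, 0.2, 0.1],
}


def modify(segment: List[float], gain: float, length: int) -> List[float]:
    """Stretch or shrink a segment to `length` samples and scale its amplitude.

    Nearest-neighbour resampling stands in for the duration and pitch
    modification a real system would apply (an assumption of this sketch).
    """
    if length <= 0 or not segment:
        return []
    n = len(segment)
    return [gain * segment[min(n - 1, (i * n) // length)] for i in range(length)]


def synthesize(
    units: List[str],
    dictionary: Dict[str, List[float]],
    gain: float = 1.0,
    length: int = 4,
) -> List[float]:
    """Search the dictionary for each unit, modify it, and concatenate."""
    output: List[float] = []
    for unit in units:
        segment = dictionary.get(unit)
        if segment is None:
            raise KeyError(f"no speech segment registered for unit {unit!r}")
        output.extend(modify(segment, gain, length))
    return output


if __name__ == "__main__":
    # Phonetic text already split into units, e.g. "ka" -> ["k", "a"].
    print(synthesize(["k", "a"], SEGMENT_DICT))
```

Because the output is built directly from the registered segments, any defect in a segment is copied into the synthetic speech regardless of how the modification step is implemented.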
In such a speech synthesis method, the quality of each speech segment registered in the segment dictionary is important. If the phonetic environments of the speech segments are not constant, or if the speech segments include noise, synthetic speech produced using those segments contains allophones or noise, no matter how precisely the speech synthesis itself is performed.