Recently, a speech synthesis apparatus has been developed so as to convert an arbitrary character string into a phonological series and convert the phonological series into synthesized speech in accordance with a predetermined speech synthesis by rule.
However, the synthesized speech outputted from the conventional speech synthesis apparatus sounds unnatural and mechanical in comparison with natural speech sounded by human being.
For example, in a phonological series “o, X, s, e, i” of a character series “onsei”, the accuracy of a rule for controlling the duration of generating each phoneme is considered as one of the factors of the awkward-sounding result. If the accuracy is low, as appropriate duration cannot be assigned to each phoneme, the synthesized speech becomes unnatural and mechanical.