Conventionally, speaker adaptation has been known as a technique for generating a synthesized voice. In this technique, voice synthesis is performed so that the synthesized voice sounds like the voice of a target speaker, which differs from the reference voice of the system (e.g., Patent Literatures 1 and 2). Speaking-style adaptation has been known as another such technique. In this technique, when an input text is transformed into a voice signal, a synthesized voice having a designated speaking style is generated (e.g., Patent Literatures 3 and 4).
In such speaker adaptation and speaking-style adaptation, reproducing the pitch of a voice, namely its fundamental frequency (F0), is important for reproducing the impression of the voice. The following methods for reproducing the fundamental frequency have been known conventionally: a simple method in which the fundamental frequency is linearly transformed (see, for example, Non-patent Literature 1); a variation of this simple method (see, for example, Non-patent Literature 2); and a method in which joint feature vectors of spectrum and frequency are modeled by a Gaussian mixture model (GMM) (see, for example, Non-patent Literature 3).
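
For illustration only, the simple linear transformation of the fundamental frequency mentioned above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the cited literature: it assumes that the linear map matches the mean and standard deviation of the voiced F0 values of the source speaker to those of the target speaker, and that unvoiced frames are marked by an F0 value of 0. The function names `fit_linear_f0_map` and `apply_linear_f0_map` are hypothetical.

```python
from statistics import fmean, pstdev


def voiced(f0):
    """Keep only voiced frames (assumption: unvoiced frames have F0 == 0)."""
    return [v for v in f0 if v > 0.0]


def fit_linear_f0_map(source_f0, target_f0):
    """Estimate (a, b) so that a * f0 + b maps the source F0 statistics
    (mean, standard deviation) onto the target F0 statistics."""
    src, tgt = voiced(source_f0), voiced(target_f0)
    mu_s, mu_t = fmean(src), fmean(tgt)
    sd_s, sd_t = pstdev(src), pstdev(tgt)
    if sd_s == 0.0:
        raise ValueError("source F0 has zero variance; slope is undefined")
    a = sd_t / sd_s          # slope: ratio of standard deviations
    b = mu_t - a * mu_s      # offset: aligns the means
    return a, b


def apply_linear_f0_map(f0, a, b):
    """Transform voiced frames linearly; leave unvoiced frames at 0."""
    return [a * v + b if v > 0.0 else 0.0 for v in f0]


# Toy F0 contours in Hz (made-up values, for illustration only).
source = [100.0, 0.0, 120.0, 140.0]
target = [200.0, 240.0, 280.0, 0.0]

a, b = fit_linear_f0_map(source, target)
converted = apply_linear_f0_map(source, a, b)
```

Because such a map has only two parameters, it shifts and scales the whole F0 contour uniformly and cannot alter its local shape. In practice the same map is often applied to the logarithm of F0 rather than to F0 itself; that variant is likewise an assumption here and not a statement about the cited methods.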