1. Technical Field of the Invention
The present invention relates to a technology for interconnecting a plurality of phoneme pieces to synthesize a voice, such as a speech voice or a singing voice.
2. Description of the Related Art
Voice synthesis technology of the phoneme piece connection type has been proposed, in which a plurality of pieces of phoneme piece data, each representing a phoneme piece, are interconnected to synthesize a desired voice. A voice having a desired pitch (height of sound) is preferably synthesized using phoneme piece data of a phoneme piece pronounced at that pitch; in practice, however, it is difficult to prepare phoneme piece data for every pitch. For this reason, Japanese Patent Application Publication No. 2010-169889 discloses a construction in which phoneme piece data are prepared for several representative pitches, and the phoneme piece data whose pitch is nearest a target pitch are adjusted to the target pitch to synthesize a voice. For example, assuming that phoneme piece data are prepared for a pitch E3 and a pitch G3 as shown in FIG. 12, phoneme piece data of a pitch F3 are created by raising the pitch of the phoneme piece data of the pitch E3, and phoneme piece data of a pitch F#3 are created by lowering the pitch of the phoneme piece data of the pitch G3.
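The nearest-pitch selection described above may be illustrated by the following minimal sketch. It is not taken from the cited publication: the function names, the MIDI-style note numbering (C4 = 60), and the equal-tempered frequency ratio are assumptions introduced only for illustration, and the pitch adjustment of the audio itself is not shown.

```python
# Illustrative sketch only. All names here are hypothetical, and the
# MIDI-style numbering (C4 = 60) and equal temperament are assumptions.
NOTE_OFFSETS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def note_to_number(name):
    """Convert a note name such as 'F#3' to a semitone number (C4 = 60)."""
    pitch_class, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_OFFSETS[pitch_class]


def nearest_source(target, prepared):
    """Pick the prepared pitch nearest the target (first one wins on a tie)."""
    target_number = note_to_number(target)
    return min(prepared, key=lambda p: abs(note_to_number(p) - target_number))


def shift_plan(target, prepared):
    """Return (source pitch, shift in semitones, frequency ratio)."""
    source = nearest_source(target, prepared)
    semitones = note_to_number(target) - note_to_number(source)
    ratio = 2.0 ** (semitones / 12.0)  # equal-tempered ratio (assumed)
    return source, semitones, ratio


prepared_pitches = ["E3", "G3"]
for target_pitch in ("F3", "F#3"):
    source, semitones, ratio = shift_plan(target_pitch, prepared_pitches)
    print(f"{target_pitch}: from {source}, {semitones:+d} semitone(s), x{ratio:.4f}")
```

With pitches E3 and G3 prepared, this sketch selects E3 for the target F3 (raised by one semitone) and G3 for the target F#3 (lowered by one semitone), matching the example of FIG. 12.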
In a construction in which original phoneme piece data are adjusted to create new phoneme piece data of the target pitch, as described in Japanese Patent Application Publication No. 2010-169889, however, the tones of synthesized sounds having adjacent pitches may be dissimilar from each other, so that the synthesized sounds are unnatural. For example, the pitch F3 and the pitch F#3 are adjacent to each other, and the tones of the corresponding synthesized sounds would naturally be expected to be similar. However, the original phoneme piece data (pitch E3) on which the pitch F3 is based and the original phoneme piece data (pitch G3) on which the pitch F#3 is based are pronounced and recorded separately, with the result that the tone of the synthesized sound of the pitch F3 and the tone of the synthesized sound of the pitch F#3 may be unnaturally dissimilar. Particularly in a case in which the synthesized sound of the pitch F3 and the synthesized sound of the pitch F#3 are created in succession, a listener perceives an abrupt change of tone at the transition between them (a point of time t0 in FIG. 12).
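The transition at which the underlying recording changes can be located with a short sketch such as the following. The function name and the use of semitone numbers (E3 = 52, F3 = 53, F#3 = 54, G3 = 55, under an assumed numbering with C4 = 60) are hypothetical and serve only to illustrate where a tone discontinuity of the kind described above could arise.

```python
# Illustrative sketch only. The function name and the semitone numbering
# (E3 = 52, F3 = 53, F#3 = 54, G3 = 55) are assumptions.
def source_switch_points(targets, prepared):
    """Return indices i at which note i is derived from a different
    prepared recording than note i - 1."""
    # Nearest prepared pitch for each target (first one wins on a tie).
    sources = [min(prepared, key=lambda p: abs(p - t)) for t in targets]
    return [i for i in range(1, len(sources)) if sources[i] != sources[i - 1]]


melody = [52, 53, 54, 55]   # E3, F3, F#3, G3 sung in succession
prepared_pitches = [52, 55]  # recordings exist only for E3 and G3
print(source_switch_points(melody, prepared_pitches))
```

For this ascending sequence the only switch is reported at index 2, that is, between F3 (derived from E3) and F#3 (derived from G3), which corresponds to the point of time t0 in FIG. 12.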
Although the above description concerns adjustment of the pitch of the phoneme piece data, the same problem may arise in a case in which another sound characteristic, such as a sound volume, is adjusted. The present invention has been made in view of the above problems, and it is an object of the present invention to create, using existing phoneme piece data, a synthesized sound having a sound characteristic, such as a pitch, that differs from that of the existing phoneme piece data, such that the synthesized sound has a natural tone.