The present invention relates to a technology for synthesizing a sound (an uttered sound or a singing sound) by using sound segments.
Segment-connection sound synthesis has conventionally been proposed, in which a sound specified as an object of sound synthesis (hereinafter referred to as “sound to be synthesized”) is generated by connecting a plurality of previously extracted sound waveforms. For example, according to the technology of JP-A-2007-240564, previously extracted sound waveforms (segment data) are stored in a storage device for each sound segment, and the sound waveforms corresponding to the uttered letters (for example, lyrics) of the sound to be synthesized are successively selected from the storage device and connected together to generate a sound signal of the sound to be synthesized.
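By way of illustration only, the selection and connection of stored sound waveforms described above may be sketched as follows. The segment names, the sample values, and the names SEGMENT_STORE and synthesize are hypothetical and are not taken from JP-A-2007-240564; plain concatenation without any smoothing at the joins is assumed for simplicity.

```python
from typing import Dict, List, Sequence

# Hypothetical segment store: segment name -> previously extracted waveform samples.
# Names and sample values are illustrative assumptions only.
SEGMENT_STORE: Dict[str, List[float]] = {
    "s-a": [0.0, 0.2, 0.4],
    "a": [0.5, 0.5],
    "a-k": [0.3, 0.1, 0.0],
}


def synthesize(segment_names: Sequence[str], store: Dict[str, List[float]]) -> List[float]:
    """Select the stored waveform for each requested segment and join them in order."""
    signal: List[float] = []
    for name in segment_names:
        # Plain concatenation; a practical system would likely smooth the joins.
        signal.extend(store[name])
    return signal


# Segments corresponding to the uttered letters of the sound to be synthesized.
signal = synthesize(["s-a", "a", "a-k"], SEGMENT_STORE)
```

In this sketch the length of the generated signal is fixed by the stored waveforms, which is why a longer specified duration calls for some form of extension.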
According to the technology of JP-A-2007-240564, when a time length longer than the sound waveform stored in the storage device is specified as the duration of the sound to be synthesized, the sound waveform is repeated (looped) to generate the sound signal. Consequently, a feature of the sound (for example, the amplitude or the period) changes regularly, with the time length of the stored sound waveform as one cycle, and this degrades the sound quality perceived by the listener. Although this problem can be avoided by making each stored sound waveform long enough that it never needs to be repeated, storing such long sound waveforms requires an enormous storage capacity.
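By way of illustration only, the repetition (looping) of a stored sound waveform and the resulting regularity may be sketched as follows. The function name loop_to_length and the sample values are hypothetical; the sketch merely shows that the extended signal repeats exactly at intervals equal to the length of the stored waveform, which is the regular change of feature noted above.

```python
from typing import List


def loop_to_length(waveform: List[float], target_len: int) -> List[float]:
    """Repeat the stored waveform until target_len samples have been produced."""
    if not waveform:
        raise ValueError("waveform must not be empty")
    return [waveform[i % len(waveform)] for i in range(target_len)]


# Hypothetical stored waveform (5 samples) and a longer specified duration (13 samples).
stored = [0.0, 0.3, 0.9, 0.4, -0.2]
extended = loop_to_length(stored, 13)

# Every sample equals the sample one stored-waveform length later, so any
# feature of the waveform recurs with that length as one cycle.
cycle = len(stored)
is_periodic = all(
    extended[i] == extended[i + cycle] for i in range(len(extended) - cycle)
)
```

Lengthening the stored waveform until no repetition is needed removes this regularity, but at the cost of the storage capacity discussed above.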