The present invention relates to techniques for synthesizing audio sounds, such as tones or voices.
As known in the art, an aurally natural tone can be generated by imparting, to a tone to be synthesized, a pitch variation characteristic corresponding to the pitch variation of an actually uttered human voice (hereinafter referred to as “reference tone”). For example, the non-patent literature “A trainable singing voice synthesis system capable of representing personal characteristics and singing styles”, by Shinji Sako, Keijiro Saino, Yoshihiko Nankaku, Keiichi Tokuda and Tadashi Kitamura, in study report of Information Processing Society of Japan, “Music Information Science”, 2008, vol. 12, pp. 39-44, February 2008, discloses a technique for creating a probability model, representative of a time series of pitches of a reference tone, for each of various attributes (or contexts), such as pitches and lyrics, and then using the created probability models to generate a synthesized tone. When a designated tone is synthesized, the pitch of the synthesized tone is controlled to follow a pitch trajectory identified from the probability model corresponding to the designated tone. Note that, in this specification, the term “tone” is used to refer collectively to any signal in the audible frequency range, such as a voice, a sound or a tone.
In practice, however, it is difficult to prepare probability models for all kinds of attributes of a designated tone. Where there is no probability model accurately matching an attribute of a designated tone, a pitch trajectory (pitch curve) can be created using an alternative probability model whose attribute is close to that of the designated tone. However, with the technique disclosed in the above-identified non-patent literature, the probability models are created by learning numerical values of pitches of a reference tone, and no learning is actually executed for the pitch of a designated tone for which such an alternative probability model is substituted. It is therefore very likely that an aurally unnatural synthesized tone would be generated.
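By way of illustration only, the difficulty described above can be sketched in Python. The sketch is a deliberately simplified stand-in, not the method of the above-identified non-patent literature: each "model" is merely the per-frame mean of absolute pitch values in Hz, the attribute is reduced to a MIDI note number, and the data and the names `reference_curves`, `train`, `nearest_context`, `pitch_trajectory` and `note_to_hz` are all hypothetical.

```python
from statistics import mean

# Hypothetical training data: attribute (MIDI note number) -> pitch curves in Hz,
# one list per utterance of the reference tone.
reference_curves = {
    60: [[259.0, 261.0, 263.0], [260.0, 262.0, 264.0]],  # sung around C4
    64: [[327.0, 329.0, 331.0], [328.0, 330.0, 332.0]],  # sung around E4
}

def train(curves_by_context):
    """Learn one mean trajectory of absolute pitch values per attribute.

    Assumption: a per-frame mean stands in for a full probability model.
    """
    return {
        ctx: [mean(frame) for frame in zip(*curves)]
        for ctx, curves in curves_by_context.items()
    }

def nearest_context(models, note):
    """Pick the attribute closest to the designated note (the fallback)."""
    return min(models, key=lambda ctx: abs(ctx - note))

def pitch_trajectory(models, note):
    """Return the learned trajectory of the closest available model."""
    return models[nearest_context(models, note)]

def note_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

models = train(reference_curves)
designated = 61  # C#4: no model was trained for this attribute
curve = pitch_trajectory(models, designated)
offsets = [hz - note_to_hz(designated) for hz in curve]
print(nearest_context(models, designated), curve, offsets)
```

Under these assumptions, the designated note 61 falls back to the model learned for note 60, so every frame of the resulting trajectory lies well below the designated pitch. The learned numerical values describe the reference tone rather than the designated tone, which is the mismatch identified above.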
Whereas the foregoing has described the case where a pitch trajectory is created using a probability model, an aurally unnatural synthesized tone may likewise be generated in a case where numerical values of pitches of a reference tone are stored and subsequently used for creation of a pitch trajectory at the time of tone synthesis.