A) Field of the Invention
This invention relates to a singing voice synthesizing apparatus, a singing voice synthesizing method and a singing voice synthesizing program for synthesizing a human singing voice.
B) Description of the Related Art
In a conventional singing voice synthesizing apparatus, data obtained from an actual human singing voice is stored in a database, and data that matches the contents of input performance data (musical notes, lyrics, expressions and the like) is selected from the database. Then, a singing voice that is close to a real human singing voice is synthesized by converting the performance data based on the selected data.
The principle of the singing voice synthesis described in Japanese Patent Application No. 2001-67258, which was filed by the applicant of the present invention, is explained below with reference to FIGS. 7 and 8.
The principle of the singing voice synthesizing apparatus described in Japanese Patent Application No. 2001-67258 is shown in FIG. 7. This singing voice synthesizing apparatus is equipped with a timbre template database 51, in which data (timbre templates) for the characteristic parameters of a phoneme at a single point in time is stored; a constant part (stationary) template database 53, in which data (stationary templates) for the slight changes of the characteristic parameters within a long sound is stored; and a phonemic chain (articulation) template database 52, in which data (articulation templates) representing the change of the characteristic parameters in a transition part from one phoneme to another is stored.
The characteristic parameter is generated by applying these templates as follows.
Synthesis of the long sound part is executed by adding the changing component included in the stationary template to the characteristic parameter obtained from the timbre template.
Synthesis of the transition part, on the other hand, is likewise executed by adding the changing component included in the articulation template to a characteristic parameter obtained from the timbre template, but the characteristic parameter to which it is added differs from case to case. For example, in a case where the front and rear phonemes of the transition part are both voiced sounds, the changing component included in the articulation template is added to the result of linear interpolation between the characteristic parameter of the front phoneme and the characteristic parameter of the rear phoneme. In a case where the front phoneme is a voiced sound and the rear phoneme is a silence, the changing component included in the articulation template is added to the characteristic parameter of the front phoneme. In a case where the front phoneme is a silence and the rear phoneme is a voiced sound, the changing component included in the articulation template is added to the characteristic parameter of the rear phoneme. As described above, in the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, the characteristic parameter generated from the timbre template serves as the standard, and singing voice synthesis is executed by changing the characteristic parameter of the articulation part so that it agrees with the characteristic parameter of the timbre part.
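The generation rules described above can be illustrated with a minimal sketch. It is not taken from Japanese Patent Application No. 2001-67258; the function names, the representation of a characteristic parameter as a flat list of numbers, the use of `None` to denote a silence, and the `ratio` argument (the assumed relative position within the transition part, from 0 to 1) are all assumptions made for illustration only. Cases that the description does not cover are rejected rather than guessed.

```python
from typing import List, Optional

Params = List[float]  # assumed representation of a characteristic parameter


def long_sound_params(timbre: Params, stationary_delta: Params) -> Params:
    """Long sound part: timbre template value plus stationary changing component."""
    return [t + d for t, d in zip(timbre, stationary_delta)]


def transition_base(front: Optional[Params],
                    rear: Optional[Params],
                    ratio: float) -> Params:
    """Choose the parameter to which the articulation component is added.

    front / rear are the timbre-template parameters of the front and rear
    phonemes; None stands for a silence (an assumption of this sketch).
    """
    if front is not None and rear is not None:
        # Both voiced: linear interpolation between front and rear.
        return [(1.0 - ratio) * f + ratio * r for f, r in zip(front, rear)]
    if front is not None:
        # Voiced followed by silence: use the front phoneme.
        return list(front)
    if rear is not None:
        # Silence followed by voiced: use the rear phoneme.
        return list(rear)
    raise ValueError("case not covered by the description")


def transition_params(front: Optional[Params],
                      rear: Optional[Params],
                      articulation_delta: Params,
                      ratio: float) -> Params:
    """Transition part: base parameter plus articulation changing component."""
    base = transition_base(front, rear, ratio)
    return [b + d for b, d in zip(base, articulation_delta)]
```

In this sketch every output is anchored to the timbre template values, which is the property that the following paragraphs identify as the source of unnatural results.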
In the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, there were cases in which the synthesized singing voice was unnatural. The causes are as follows:
the change in the characteristic parameter of the transition part differs from the change in the original transition part, because the change recorded in the articulation template is itself altered; and
the long sound part always sounds the same regardless of the kind of phoneme that precedes it, because the characteristic parameter of the long sound part is also calculated by adding the changing component of the stationary template to the characteristic parameter generated from the timbre template.
That is, in the singing voice synthesizing apparatus disclosed in Japanese Patent Application No. 2001-67258, there were cases in which the synthesized singing voice was unnatural because the parameters of both the long sound part and the transition part were obtained by addition to the characteristic parameter of the timbre template, which represents only one part of the whole song.
For example, when the conventional singing voice synthesizing apparatus is made to sing “saita”, the phonemes do not transition naturally from one to the next, and the synthesized singing voice sounds unnatural. In some cases, it cannot even be judged what the synthesized singing voice is singing.
That is, in a singing voice, for example when singing “saita”, the phonemes (“sa”, “i” and “ta”) are not pronounced as separate, partitioned units. Normally, a long sound part and a transition part are inserted between the phonemes, as in “[#s], sa, (a), [ai], i, (i), [it], ta, (a)” (“#” represents a silence). In this example of “saita”, [#s], [ai] and [it] are the transition parts, and (a), (i) and (a) are the long sound parts. Therefore, in a case where a singing voice is synthesized from performance data such as MIDI information, how realistically the transition parts and the long sound parts are generated is significant.
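The decomposition of “saita” given above can be reproduced with a short sketch. The function name `expand_lyrics` and the input format (each syllable given as a list of phoneme symbols) are hypothetical, and the sketch assumes that the last phoneme of every syllable is the one sustained as the long sound, which holds for this example but is not stated as a general rule in the description.

```python
from typing import List


def expand_lyrics(syllables: List[List[str]]) -> List[str]:
    """Insert transition parts [xy] and long sound parts (v) around each syllable.

    "#" denotes a silence, following the notation in the text.
    """
    units: List[str] = []
    previous = "#"  # singing is assumed to start from silence
    for phonemes in syllables:
        units.append(f"[{previous}{phonemes[0]}]")  # transition part
        units.append("".join(phonemes))             # the syllable itself
        units.append(f"({phonemes[-1]})")           # long sound part
        previous = phonemes[-1]
    return units


# "saita" split into "sa", "i" and "ta"
print(", ".join(expand_lyrics([["s", "a"], ["i"], ["t", "a"]])))
# prints: [#s], sa, (a), [ai], i, (i), [it], ta, (a)
```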