1. Field of the Invention
One or more embodiments of the present invention relate to a technology for controlling, for example, a temporal fluctuation (hereinafter referred to as “pitch transition”) of a pitch of a voice to be synthesized.
2. Description of the Related Art
Hitherto, there has been proposed a voice synthesis technology for synthesizing a singing voice having an arbitrary pitch specified in time series by a user. For example, in Japanese Patent Application Laid-open No. 2014-098802, there is described a configuration for synthesizing a singing voice by setting a pitch transition (pitch curve) corresponding to a time series of a plurality of notes specified as a target to be synthesized, adjusting a pitch of a phonetic piece corresponding to a sound generation detail along the pitch transition, and then concatenating phonetic pieces with each other.
Technologies for generating a pitch transition also exist, for example, a configuration using a Fujisaki model, which is disclosed in Fujisaki, “Dynamic Characteristics of Voice Fundamental Frequency in Speech and Singing,” In: MacNeilage, P. F. (Ed.), The Production of Speech, Springer-Verlag, New York, USA, pp. 39-55, and a configuration using a hidden Markov model (HMM) generated by machine learning on a large number of voices, which is disclosed in Keiichi Tokuda, “Basics of Voice Synthesis based on HMM,” The Institute of Electronics, Information and Communication Engineers, Technical Research Report, Vol. 100, No. 392, SP2000-74, pp. 43-50 (2000). Further, a configuration for executing machine learning of an HMM by decomposing a pitch transition into five tiers, namely, a sentence, a phrase, a word, a mora, and a phoneme, is disclosed in Suni, A. S., Aalto, D., Raitio, T., Alku, P., Vainio, M., et al., “Wavelets for Intonation Modeling in HMM Speech Synthesis,” In 8th ISCA Workshop on Speech Synthesis, Proceedings, Barcelona, Aug. 31-Sep. 2, 2013.