At present, generating singing voice first requires either that “a human sings” or that “a singing synthesis technique is used to artificially generate singing voice (by adjustment of singing synthesis parameters)”, as described in Non-Patent Document 1. Further, it may sometimes be necessary to cut and paste temporal signals of the singing voice that serves as the basis for singing generation, or to apply a signal processing technique for time stretching and conversion. The final singing or vocal is thus obtained by “editing”. In this sense, those who have good singing skills, are good at adjusting singing synthesis parameters, or are skilled in editing singing or vocals can be considered “experts at singing generation”. As described above, singing generation requires high singing skills, advanced expertise in the art, and time-consuming effort. For those who lack these skills, it has so far been impossible to freely generate high-quality singing or vocals.
In recent years, commercially available software for singing synthesis has attracted increasing public attention in the art of singing voice generation, which conventionally uses human singing voice. Accordingly, an increasing number of listeners enjoy such singing synthesis (refer to Non-Patent Document 2). Text-to-singing (lyrics-to-singing) techniques are dominant in singing synthesis. In these techniques, “lyrics” and “musical notes (a sequence of notes)” are used as inputs to synthesize singing voice. Commercially available software for singing synthesis employs concatenative synthesis techniques because of their high quality (refer to Non-Patent Documents 3 and 4). HMM (Hidden Markov Model) synthesis techniques have recently come into use (refer to Non-Patent Documents 5 and 6). Further, another study has proposed a system capable of automatically composing music and synthesizing singing voice at the same time, using “lyrics” as the sole input (refer to Non-Patent Document 7). A further study has proposed a technique to expand singing synthesis by voice quality conversion (refer to Non-Patent Document 8). Some studies have proposed speech-to-singing techniques, which convert a speaking voice reading the lyrics of a target song into singing voice while maintaining the voice quality (refer to Non-Patent Documents 9 and 10). A further study has proposed a singing-to-singing technique, which synthesizes singing voice by using a guide vocal as an input and mimicking vocal expressions such as the pitch and power of the guide vocal (refer to Non-Patent Document 11).
Time stretching and pitch correction accompanied by cut-and-paste and signal processing can be performed on the singing voices obtained as described above, using a DAW (Digital Audio Workstation) or the like. In addition, voice quality conversion (refer to Non-Patent Documents 12 and 13), pitch and voice quality morphing (refer to Non-Patent Documents 14 and 15), and high-quality real-time pitch correction (refer to Non-Patent Document 16) have been studied. Further, for a user who has difficulty inputting a musical performance in real time when generating MIDI sequence data of instruments, a study has proposed inputting pitch information and performance information separately and then integrating the two kinds of information. This study has demonstrated the effectiveness of the approach.