1. Field of the Invention
This invention relates to a singing voice-synthesizing method and apparatus for synthesizing singing voices based on performance data being input in real time, and a storage medium storing a program for executing the method.
2. Prior Art
Conventionally, a singing voice-synthesizing method of the above-mentioned kind has been proposed which makes the rise time of a phoneme to be sounded first (first phoneme) in accordance with a note-on signal based on performance data shorter than the rise time of the same phoneme when it is sounded in succession to another phoneme during the note-on period (see e.g. Japanese Laid-Open Patent Publication (Kokai) No. 10-49169).
FIG. 40A shows consonant singing-starting timing and vowel singing-starting timing of human singing, and this example shows a case in which words of a song, “sa”-“i”-“ta”, are sung at the respective pitches of “C3(do)”, “D3(re)”, and “E3(mi)”. In FIG. 40A, phonetic units each formed by a combination of a consonant and a vowel, such as “sa” and “ta”, are produced such that the consonant starts to be sounded earlier than the vowel.
On the other hand, FIG. 40B shows singing-starting timing of singing voices synthesized by the above-described conventional singing voice-synthesizing method. In this example, the same words of the lyric as in FIG. 40A are sung. Actual singing-starting time points T1 to T3 indicate respective starting time points at which singing voices start to be generated in response to respective note-on signals. According to the conventional method, when the singing voice of “sa” is generated, the singing-starting time point of the consonant “s” is set equal to or coincident with the actual singing-starting time point T1, and the amplitude level of the consonant “s” is rapidly increased from the time point T1 so as to avoid giving an impression of the singing voice being delayed compared with instrument sound (accompaniment sound).
The conventional singing voice-synthesizing method suffers from the following problems:
(1) The vowel singing-starting time points of the human singing shown in FIG. 40A approximately correspond to the actual singing-starting time points (note-on time points) in the singing voice synthesis shown in FIG. 40B. However, in the case of FIG. 40B, the consonant singing-starting time points are set equal to the respective note-on time points, and at the same time the rise time of each consonant (first phoneme) is shortened, so that compared with the FIG. 40A case, the singing-starting timing and singing duration time become unnatural.
(2) Information of a phonetic unit is transmitted immediately before a note-on time point of the phonetic unit, and the singing voice corresponding to the information of the phonetic unit starts to be generated at the note-on time point. Therefore, it is impossible to start generation of the singing voice earlier than the note-on time point.
(3) The singing voice is not controlled in respect of state transitions, such as an attack (rise) portion and a release (fall) portion. This makes it impossible to synthesize more natural singing voices.
(4) The singing voice is not controlled in respect of effects, such as vibrato. This makes it impossible to synthesize more natural singing voices.
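The timing difference underlying problems (1) and (2) can be illustrated by the following minimal sketch. It is not taken from the cited publication. The names Unit, human_like_onsets, conventional_onsets and vowel_delay, as well as the note-on times and consonant durations, are hypothetical values chosen only for illustration. The sketch further assumes that the vowel begins at the moment the consonant ends, and it does not model the shortened rise time of the consonant.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One phonetic unit of the lyric (hypothetical structure)."""
    text: str
    note_on: float        # note-on time point in seconds
    consonant_len: float  # consonant duration in seconds (0.0 for a vowel only)


def human_like_onsets(unit):
    """FIG. 40A style: the vowel starts at note-on, the consonant starts earlier."""
    return unit.note_on - unit.consonant_len, unit.note_on


def conventional_onsets(unit):
    """FIG. 40B style: the consonant starts at note-on, so the vowel starts later."""
    return unit.note_on, unit.note_on + unit.consonant_len


def vowel_delay(unit):
    """How much later the vowel starts in the conventional scheme."""
    return conventional_onsets(unit)[1] - human_like_onsets(unit)[1]


# "sa"-"i"-"ta" with illustrative (assumed) timings for T1 to T3.
units = [
    Unit("sa", 1.0, 0.125),
    Unit("i", 1.5, 0.0),
    Unit("ta", 2.0, 0.0625),
]

for u in units:
    print(u.text, human_like_onsets(u), conventional_onsets(u), vowel_delay(u))
```

Under these assumptions, the human-like consonant onset of "sa" falls before the note-on time point T1. A synthesizer that receives the information of the phonetic unit only immediately before T1 cannot reproduce that onset, which is the limitation stated in problem (2).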