1. Field of the Invention
This invention relates to a singing sound-synthesizing apparatus and method for synthesizing human vocal sounds by sounding phonemes of a lyric of a song based on lyric data to thereby generate singing sounds of the song.
2. Prior Art
Conventionally, there have been proposed various techniques of synthesizing vocal sounds, including a vocal sound synthesizer based on a formant synthesization method proposed e.g. by Japanese Laid-Open Patent Publication (Kokai) No. 3-200300 and Japanese Laid-Open Patent Publication (Kokai) No. 4-251297.
A vocal sound synthesizer based on the formant synthesization method disclosed by Japanese Laid-Open Patent Publication (Kokai) No. 4-251297 comprises memory means storing, in a plurality of steps, data of parameters related to formants which change in time sequence; reading means for reading the parameter data from the memory means step by step in time sequence to generate a vocal sound; and formant-synthesizing means which is supplied with the read parameter data, for synthesizing a musical sound having formant characteristics determined by the parameter data. This synthesizer thus changes the formants of a vocal sound signal in time sequence.
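The step-wise reading of formant parameters described above can be illustrated by the following minimal sketch. It is not the implementation of the cited publication: the sample rate, the parameter table, and the function name `synthesize` are all hypothetical, and each formant is crudely approximated by a single sinusoid rather than by a true formant filter or generator.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed value for illustration only

# Hypothetical stand-in for the "memory means": one entry per step, each
# holding (frequency_hz, amplitude) pairs for two formants.
PARAMETER_STEPS = [
    [(700.0, 1.0), (1200.0, 0.5)],
    [(500.0, 1.0), (1500.0, 0.5)],
    [(300.0, 1.0), (2300.0, 0.5)],
]

def synthesize(steps, step_seconds):
    """Read the parameter steps in time sequence and return output samples."""
    samples_per_step = round(SAMPLE_RATE * step_seconds)
    phases = [0.0] * len(steps[0])  # running phase per formant, kept across steps
    out = []
    for frame in steps:  # "reading means": one step after another
        for _ in range(samples_per_step):
            value = 0.0
            for k, (freq, amp) in enumerate(frame):
                phases[k] += 2.0 * math.pi * freq / SAMPLE_RATE
                value += amp * math.sin(phases[k])
            out.append(value)
    return out

signal = synthesize(PARAMETER_STEPS, 0.01)
```

Keeping a running phase per formant avoids a discontinuity in the waveform when the parameters change from one step to the next.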
When a singing sound is synthesized by the prior art technique based on the formant synthesization method, if an English lyric "hit" is sounded in a manner corresponding to one quarter note, sounding time periods T(h), T(i) and T(t) in terms of absolute time periods are assigned to respective phonemes "h", "i", and "t" of the lyric, and parameters are set such that the sum of the sounding time periods T(h)+T(i)+T(t) becomes equal to a sounding time period over which the quarter note is sounded stored in the memory means (referred to hereinafter as "the first conventional method"). Alternatively, the sum of the sounding time periods T(h)+T(i)+T(t) is set to a shorter time period than the sounding time period over which the quarter note is sounded, and the sounding of the lyric is stopped when the sounding time period assigned to the last phoneme "t" has elapsed, or the sounding of the last phoneme "t" is continued until the sounding time period over which the quarter note is sounded elapses (referred to hereinafter as "the second conventional method").
According to the first conventional method, however, the singing sound can be generated only at a predetermined tempo. One way to overcome this inconvenience may be to determine the sounding time periods of the phonemes in terms of relative time periods. This method, however, has the disadvantage that if the sounding time periods of the phonemes, particularly those of unvoiced sounds (consonants) such as "h" and "t", are changed according to the tempo, the resulting singing sound is unnatural.
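The tempo dependence of the first conventional method, and the side effect of switching to relative time periods, can be illustrated with a short sketch. The duration values, the tempos, and the function names are hypothetical and chosen only to make the arithmetic visible.

```python
# Hypothetical absolute sounding time periods (seconds) for the lyric "hit",
# chosen so that their sum fills one quarter note at 120 beats per minute.
ABS_DURATIONS = {"h": 0.08, "i": 0.34, "t": 0.08}

def quarter_note_seconds(tempo_bpm):
    """Length of one quarter note, taking the beat to be a quarter note."""
    return 60.0 / tempo_bpm

def relative_durations(abs_durations, tempo_bpm):
    """Scale every phoneme by the same factor so that the sum fills the note."""
    note = quarter_note_seconds(tempo_bpm)
    total = sum(abs_durations.values())
    return {p: d * note / total for p, d in abs_durations.items()}

# First conventional method: the absolute values fit only the tempo they were set for.
fits_120 = abs(sum(ABS_DURATIONS.values()) - quarter_note_seconds(120)) < 1e-9
fits_60 = abs(sum(ABS_DURATIONS.values()) - quarter_note_seconds(60)) < 1e-9

# Relative time periods: at 60 beats per minute every phoneme is stretched,
# including the consonants "h" and "t".
slow = relative_durations(ABS_DURATIONS, 60)
```

Under these assumed values, halving the tempo doubles the length of the consonants as well as the vowel, which is the source of the unnaturalness noted above.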
On the other hand, according to the second conventional method, both stopping the sounding of the lyric when the sounding time period assigned to the phoneme "t" has elapsed and continuing the sounding of the phoneme "t" until the sounding time period of the quarter note elapses result in an unnatural and odd sound.
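The two variants of the second conventional method can be sketched as a simple scheduling routine. The duration values, the note length, and the function name `second_method` are hypothetical; the sketch shows only the timing, not the sound generation.

```python
# Hypothetical absolute sounding time periods (seconds), deliberately shorter
# in total than the quarter note they are sung on.
DURATIONS = [("h", 0.08), ("i", 0.20), ("t", 0.08)]
NOTE_SECONDS = 0.5  # assumed length of the quarter note

def second_method(durations, note_seconds, hold_last):
    """Return (phoneme, start, end) tuples for one note.

    hold_last=False: sounding stops when the last phoneme's period has elapsed.
    hold_last=True:  the last phoneme is continued until the note ends.
    """
    schedule = []
    t = 0.0
    for idx, (phoneme, dur) in enumerate(durations):
        end = t + dur
        if hold_last and idx == len(durations) - 1:
            end = max(end, note_seconds)
        schedule.append((phoneme, t, end))
        t = end
    return schedule

stopped = second_method(DURATIONS, NOTE_SECONDS, hold_last=False)
held = second_method(DURATIONS, NOTE_SECONDS, hold_last=True)
```

With these assumed values, the first variant leaves a silent gap before the note ends, and the second variant sustains the consonant "t" far beyond its natural length.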
A so-called "Synthesis-by-rule" method is another method of synthesizing vocal sounds of desired words. According to this method, vocal sound waves are analyzed in units of vocal sounds having short lengths, such as phonemes, and the resulting parameters are stored as vocal sound data, and control signals required for driving a vocal sound synthesizer are formed according to a predetermined rule based on the stored vocal sound data.
The "Synthesis-by-rule" method is often applied to synthesization of vocal sounds using PCM waveforms. In general, a major problem to be solved in synthesizing natural vocal sounds is coarticulation between phonemes. A vocal sound synthesizer using PCM waveforms can achieve proper coarticulation by using phoneme fractions edited by a waveform-superposing method or the like, and by preparing a large number of waveforms in advance.
On the other hand, a singing sound synthesizer has been proposed by the present assignee in Japanese Patent Application No. 7-218241 which applies the "Synthesis-by-rule" method to synthesization of music sounds, to synthesize a natural singing sound based on lyric data.
When a singing sound synthesizer employs the "Synthesis-by-rule" method applied to synthesization of singing sounds using PCM waveforms, there arise inconveniences in that a large volume of data is required, and it is difficult to convert voice characteristics to other ones as well as to follow a large change in pitch.
When a singing sound synthesizer employs the formant synthesization method, this synthesizer is advantageous over the synthesizer based on the "Synthesis-by-rule" method applied to the PCM waveform synthesization in that smooth coarticulation can be effected, only a small amount of data is required, the pitch can be changed over a wide range, etc. However, as far as the level of recognition of a sound, i.e. the naturalness of the synthesized sound, is concerned, the formant synthesization method is inferior to the PCM waveform synthesization. In particular, it is difficult for the formant synthesization method to generate natural-sounding consonants.