1. Field of the Invention
This invention relates to a musical sound synthesizer for synthesizing a musical sound having desired formants and a storage medium storing a program for synthesizing such a musical sound.
2. Prior Art
It is generally known that a sound generated by a natural musical instrument has formants peculiar to its own structure, such as the configuration of the sound-board in the case of a piano. A human voice also has peculiar formants determined by the shapes of related organs of the human body, such as the vocal cords, the vocal tract, and the oral cavity, and these formants characterize a timbre peculiar to the human voice.
To simulate the timbre of a natural musical instrument or a human vocal sound (singing sound) by an electronic musical instrument, a musical sound must be synthesized in accordance with formants peculiar to the timbre. An apparatus for synthesizing a sound having desired formants has been proposed e.g. by Japanese Laid-Open Patent Publication (Kokai) No. 3-200300 and Japanese Laid-Open Patent Publication (Kokai) No. 4-251297.
FIG. 1 shows an example of the arrangement of a musical sound synthesizer for synthesizing a vocal sound having such desired formants. In the synthesizer, performance information 1311 and lyrics information 1312 are input to a CPU 1301, e.g. as messages in MIDI (Musical Instrument Digital Interface) format. The performance information 1311 includes a note-on message and a note-off message, each including pitch information. The lyrics information 1312 is a message designating an element of lyrics (phoneme data) of a song which is to be sounded according to a musical note designated by the performance information 1311. The lyrics information 1312 is provided as a system exclusive message in MIDI format. For instance, when elements of lyrics forming a Japanese word meaning "bloomed", which can be expressed by the phonemes "saita", are synthesized at pitches of C3, E3, and G3, the performance information 1311 and the lyrics information 1312 are input to the CPU 1301 of the apparatus, e.g. in the following sequence (1):

    s<20>a<0>
    note-on C3
    note-off C3
    i<0>
    note-on E3
    note-off E3
    t<02>a<00>
    note-on G3
    note-off G3        (1)
It should be noted that according to this method, data of an element of lyrics to be sounded is sent to the CPU 1301 prior to the note-on message according to which the element of lyrics is sounded. In the above sequence of messages, "s", "a", "i", and "t" represent phonemes, and the numerical value within < > following each of the phonemes represents the duration of the phoneme. <0>, however, designates that the sounding of the phoneme should be maintained until a note-on message for the following phoneme is received.
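The notation used in sequence (1) can be illustrated with a short parser. The following is only a minimal sketch in Python, not part of the described apparatus: the names `Phoneme`, `parse_lyrics_element`, and `is_held` are hypothetical, and the unit of the duration value is not stated in the text, so the value is kept as a plain integer.

```python
import re
from typing import List, NamedTuple


class Phoneme(NamedTuple):
    symbol: str    # phoneme name, e.g. "s" or "a"
    duration: int  # value written inside < >; unit is not specified in the text


# One token: a run of letters followed by a decimal number in angle brackets.
_TOKEN = re.compile(r"([A-Za-z]+)<(\d+)>")


def parse_lyrics_element(text: str) -> List[Phoneme]:
    """Split a lyrics element such as "s<20>a<0>" into its phonemes."""
    phonemes: List[Phoneme] = []
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"malformed lyrics element at offset {pos}: {text!r}")
        phonemes.append(Phoneme(match.group(1), int(match.group(2))))
        pos = match.end()
    return phonemes


def is_held(phoneme: Phoneme) -> bool:
    """A duration of 0 means: keep sounding until the next note-on arrives."""
    return phoneme.duration == 0
```

Under these assumptions, `parse_lyrics_element("s<20>a<0>")` yields the phoneme "s" with duration 20 followed by the phoneme "a" with duration 0, and `is_held` is true only for the latter.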
When the CPU 1301 receives the above sequence (1) of MIDI messages, it operates in the following manner. First, when the data of a lyrics element to be sounded, "s<20>a<0>", is received, the data is stored in a lyrics information buffer 1305. Then, when the message "note-on C3" is received, the CPU 1301 obtains the lyrics element "s<20>a<0>" from the lyrics information buffer 1305, calculates formant parameters for generating a sound of the lyrics element at the designated pitch C3, and supplies them to a (voiced sound/unvoiced sound) formant-synthesizing tone generator 1302. The CPU 1301 subsequently receives the message "note-off C3", but in the present case "a<0>" has already been designated, and therefore the CPU 1301 ignores the received message "note-off C3" and maintains the sounding of the phoneme "a" until the following note-on message is received. It should be noted, however, that when the phonemes "sa" and the phoneme "i" are to be sounded separately, the CPU 1301 delivers the data "note-off C3" to the formant-synthesizing tone generator 1302 to stop the sounding of the phonemes "sa" at the pitch C3. Then, when the data of a lyrics element "i<0>" to be sounded is received, the data (lyrics data) is stored in the lyrics information buffer 1305, and when the message "note-on E3" is received, the CPU 1301 obtains the lyrics element "i<0>" from the lyrics information buffer 1305, calculates formant parameters for generating a vocal sound of the lyrics element at the designated pitch E3, and sends the calculated formant parameters to the formant-synthesizing tone generator 1302. Thereafter, musical sounds of the phonemes "ta" are generated in the same manner.
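The message handling described above can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the actual control program: the class `LyricsSequencer`, its `separate` flag (standing in for the case where phonemes are to be sounded separately), and the tuples recorded in `sent` are all hypothetical, and the formant calculation itself is replaced by a placeholder entry. The text does not say whether the lyrics information buffer is cleared after a note-on, so the sketch leaves its contents in place.

```python
import re
from typing import List, Optional, Tuple

# Finds the duration of the last phoneme in an element, e.g. the 0 in "s<20>a<0>".
_LAST_DURATION = re.compile(r"<(\d+)>\s*$")


class LyricsSequencer:
    """Hypothetical stand-in for the message handling of the CPU."""

    def __init__(self, separate: bool = False) -> None:
        self.separate = separate                 # assumption: sound elements separately
        self.lyrics_buffer: Optional[str] = None  # plays the role of the lyrics buffer
        self.hold = False                        # True while a <0> phoneme is sounding
        self.sent: List[Tuple[str, ...]] = []    # what would go to the tone generator

    def on_lyrics(self, element: str) -> None:
        # Lyrics data arrives before its note-on and is only stored.
        self.lyrics_buffer = element

    def on_note_on(self, pitch: str) -> None:
        element = self.lyrics_buffer
        if element is None:
            return  # assumption: a note-on without buffered lyrics is ignored
        match = _LAST_DURATION.search(element)
        self.hold = match is not None and int(match.group(1)) == 0
        # Placeholder for "calculate formant parameters and supply them".
        self.sent.append(("formants", element, pitch))

    def on_note_off(self, pitch: str) -> None:
        if self.hold and not self.separate:
            return  # held phoneme: the note-off is ignored
        self.sent.append(("note-off", pitch))
```

Feeding sequence (1) into a default `LyricsSequencer` records only the three placeholder formant entries, because every element ends in a phoneme of duration 0 and each note-off is ignored. With `separate=True`, a note-off entry follows each of them.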
The formant parameters are time sequence data and are transferred from the CPU 1301 to the formant-synthesizing tone generator 1302 at predetermined time intervals. The predetermined time intervals are generally set to a low rate, on the order of several milliseconds, which is sufficient to generate tones having features of a human voice. By successively changing the formants at the predetermined time intervals, musical sounds having features of a human vocal sound are generated. The formant parameters include a parameter for differentiating between a voiced sound and an unvoiced sound, a formant center frequency, a formant level, a formant bandwidth, etc. In FIG. 1, reference numeral 1303 designates a program memory storing control programs executed by the CPU 1301, and 1304 designates a working memory for temporarily storing various kinds of working data.
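The idea of formant parameters as time sequence data can be illustrated as follows. This is a minimal Python sketch only: the names `FormantFrame` and `frames_between`, the 5 ms default interval, and the use of linear interpolation between two parameter sets are assumptions made for illustration, since the text does not specify how successive parameter values are derived.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FormantFrame:
    time_ms: float       # time at which the frame is sent to the tone generator
    voiced: bool         # voiced/unvoiced differentiation
    center_hz: float     # formant center frequency
    level: float         # formant level
    bandwidth_hz: float  # formant bandwidth


def frames_between(start: FormantFrame, end: FormantFrame,
                   interval_ms: float = 5.0) -> List[FormantFrame]:
    """Generate frames at a fixed interval, linearly interpolated (an assumption)."""
    span = end.time_ms - start.time_ms
    if interval_ms <= 0 or span <= 0:
        raise ValueError("interval and time span must be positive")
    steps = int(span // interval_ms)
    frames: List[FormantFrame] = []
    for i in range(steps + 1):
        t = (i * interval_ms) / span  # fraction of the way from start to end
        frames.append(FormantFrame(
            time_ms=start.time_ms + i * interval_ms,
            voiced=start.voiced if t < 1.0 else end.voiced,
            center_hz=start.center_hz + t * (end.center_hz - start.center_hz),
            level=start.level + t * (end.level - start.level),
            bandwidth_hz=start.bandwidth_hz + t * (end.bandwidth_hz - start.bandwidth_hz),
        ))
    return frames
```

For example, moving a single formant from an illustrative 700 Hz to 300 Hz over 20 ms at a 5 ms interval produces five frames, one for each transfer to the tone generator.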
To generate performance data for a musical piece provided with lyrics to be played by the musical sound synthesizer constructed as above, it is necessary to set the timing for starting each instrument sound or singing sound, the duration of the same, etc. according to each musical note.
However, in general, a human vocal sound is slow to rise in level compared with an instrument sound, and therefore there is a discrepancy in timing between the start of generation of a human vocal sound designated by performance data and the start of generation actually perceived by the listener. For instance, even if an instrument sound and a singing sound are generated simultaneously in response to a note-on signal for the instrument sound, the singing sound is perceived as if it started with a slight delay with respect to the instrument sound.
As a specific example, assume that the data of a musical piece comprises melody data, input by keyboard performance (i.e. by playing the piano) with a timbre that rises relatively quickly, such as a piano timbre, and accompaniment part data prepared to correspond to the melody data. Assume further that automatic performance is carried out with lyrics assigned to the melody data, so that a synthesized human voice sounds the melody as a singing part instead of the piano while the accompaniment part data is sounded. In that case, the listener will most probably feel that the singing part (human voice sound), which is slow to rise, and the accompaniment part are conspicuously out of time with each other.
This problem can be overcome by adjusting the timing of the performance data for the entire musical piece or for each performance part, but doing so is very troublesome.
Further, when the conventional musical sound synthesizer generates a human vocal sound or the like, there is a problem that consecutive phonemes are not sounded in a properly coarticulated manner (particularly in transition from a voiced sound to an unvoiced sound), which results in an unnatural sound.