1. Field of the Invention
The present invention relates to data synthesis apparatus and programs, and more particularly to such apparatus and programs that synthesize voice and musical sound data.
2. Description of the Related Art
Conventionally, vocoders are known which convert the pitch of a human voice to that of a sound produced by a keyboard instrument. The vocoder divides voice waveform data input to it into a plurality of frequency components, analyzes musical-sound waveform data output from the keyboard instrument, and then synthesizes the voice and musical-sound waveform data. As a result, the tone of the human voice can be produced at the pitch of a musical sound produced by the instrument.
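The channel-vocoder operation described above (splitting the voice into frequency bands and imposing each band's energy on the corresponding band of the instrument sound) can be sketched as follows. This is an illustrative frame-based approximation, not the circuit of any of the cited publications; the frame length and band count are arbitrary assumptions.

```python
import numpy as np

def vocode(voice, carrier, n_bands=16, frame=1024):
    """Impose the per-band energy envelope of `voice` onto `carrier`,
    frame by frame, using the FFT to split each frame into bands."""
    n = min(len(voice), len(carrier))
    out = np.zeros(len(carrier))
    for start in range(0, n - frame + 1, frame):
        V = np.fft.rfft(voice[start:start + frame])
        C = np.fft.rfft(carrier[start:start + frame])
        edges = np.linspace(0, len(V), n_bands + 1, dtype=int)
        for lo, hi in zip(edges[:-1], edges[1:]):
            v_rms = np.sqrt(np.mean(np.abs(V[lo:hi]) ** 2))
            c_rms = np.sqrt(np.mean(np.abs(C[lo:hi]) ** 2)) + 1e-12
            # Scale the carrier band so its energy matches the voice band.
            C[lo:hi] *= v_rms / c_rms
        out[start:start + frame] = np.fft.irfft(C, n=frame)
    return out
```

The output keeps the pitch of the carrier (the instrument sound) while taking on the spectral envelope, and hence the tone, of the voice.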
Japanese Patent No. 2800465 discloses an electronic musical instrument that performs, as musical sounds, a song sung by a human being, using such data synthesis. The electronic instrument of this patent comprises a keyboard that generates pitch specifying information; a ROM that stores a plurality of items of time-series formant information characterizing the voices uttered by a like number of human beings; and a formant forming sound source, responsive to generation of the pitch specifying information by the keyboard, for reading out the plurality of items of time-series formant information sequentially from the ROM and for forming a voice from the pitch specifying information and the sequentially read items of formant information.
A formant represents the spectral distribution of a human voice and characterizes it. Frequency analysis of human voices shows that different pronunciations have different spectra. On the other hand, when different persons utter the same sound, their spectra are alike. For example, when several persons individually utter “” (phonetic sign), we hear the same sound “” irrespective of the character of their voices, because the utterances of “” share the same spectral distribution.
The formant information storage means, composed of ROM 15 in FIG. 1 of the patent, comprises a syllable data sequence table, which in turn comprises a frequency sequencer and a level sequencer and stores the four main time-series formant frequencies F1-F4 and levels (or amplitudes) L1-L4 that characterize the respective syllables (including the Japanese syllabary, the voiced consonants, and the p-sounds of the kana syllabary) of a human voice. Thus, a human voice having a pitch specified by the keyboard can be synthesized. Simultaneous utterance of the same voice at different pitches, i.e., a chorus, is also possible.
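The syllable data sequence table can be pictured as a mapping from each syllable to a time series of four (frequency, level) formant pairs. The sketch below is hypothetical: the syllable key, the numeric frame values, and the lookup helper are invented for illustration, while the patent itself stores F1-F4 in a frequency sequencer and L1-L4 in a level sequencer.

```python
# Hypothetical in-memory picture of the syllable data sequence table:
# each syllable maps to a time series of frames, each frame holding the
# four (formant frequency in Hz, level) pairs F1-F4 / L1-L4.
syllable_table = {
    "a": [
        [(800, 1.0), (1200, 0.7), (2500, 0.4), (3500, 0.2)],  # frame 0
        [(790, 1.0), (1210, 0.7), (2500, 0.4), (3500, 0.2)],  # frame 1
    ],
}

def formant_frame(syllable, t):
    """Return the (F, L) frame for time index t, holding the last
    frame once the stored sequence has been exhausted."""
    frames = syllable_table[syllable]
    return frames[min(t, len(frames) - 1)]
```

A formant forming sound source would step through such frames in sequence, shaping its output spectrum to the stored frequencies and levels while the keyboard supplies the pitch.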
In this case, a formant synthesis apparatus disclosed in another patent publication (TOKKAIHEI No. 2-262698) is used as the formant forming sound source. The formant synthesis apparatus disclosed in FIG. 1 of this publication comprises a pulse generator 1, a carrier generator 2, a modulated waveform generator 3, adders 4 and 5, a logarithm/antilogarithm conversion table 6, and a D/A converter 7. A formant sound is synthesized based on a formant central frequency information value Ff, a formant basic frequency information value Fo, formant form parameters (including band width values ka and kb, and shift values na and nb), and envelope waveform data indicative of the formant sound, all received externally. A phase accumulator 11 of the pulse generator 1 accumulates the formant basic frequency information value Fo in synchronization with clock pulses φ having a predetermined period. In the carrier generator 2, a phase accumulator 21 sequentially accumulates the formant central frequency information value Ff in synchronization with the clock pulses φ and sequentially outputs the resulting values as read address signals for a sinusoidal memory 22.
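The phase accumulators 11 and 21 operate on the principle now commonly called direct digital synthesis: on every clock pulse φ a frequency information value is added to an accumulator, and the accumulator's upper bits serve as the read address of a sinusoidal memory. A minimal sketch follows; the accumulator width, table size, and sample rate are assumptions for illustration, not values taken from the publication.

```python
import math

TABLE_BITS = 10
SINE = [math.sin(2 * math.pi * i / (1 << TABLE_BITS))
        for i in range(1 << TABLE_BITS)]  # the "sinusoidal memory"

def dds(freq_hz, sample_rate, n_samples, acc_bits=24):
    """Phase-accumulator oscillator: add a fixed frequency increment
    each clock; the top accumulator bits address the sine table."""
    increment = round(freq_hz * (1 << acc_bits) / sample_rate)
    mask = (1 << acc_bits) - 1
    acc, out = 0, []
    for _ in range(n_samples):
        acc = (acc + increment) & mask          # accumulate, wrap at 2^acc_bits
        out.append(SINE[acc >> (acc_bits - TABLE_BITS)])
    return out
```

Because the accumulator wraps modulo its word length, the output frequency is increment × sample_rate / 2^acc_bits, so frequency resolution is set purely by the accumulator width.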
Thus, it is easy to synthesize the voice waveform data read from the ROM with the musical-sound waveform data obtained from the keyboard. However, when voice data is received from a microphone, or read from a memory that stores voice data previously received from a microphone, the periods of the voice waveform data are not clear. A phase discrepancy would therefore occur, and normal data synthesis could not be achieved. In addition, overtone data contained in the voice data may be detected erroneously as representing the keynote and subjected to data synthesis, so that the output voice would be distorted.
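One common way to estimate the unclear period of arbitrary microphone input is autocorrelation. The sketch below is illustrative only, not a method disclosed in the cited publications; it also exhibits the failure mode described above, since a strong overtone can pull the correlation peak to a shorter lag, so that the overtone is mistaken for the keynote.

```python
import numpy as np

def estimate_pitch(x, sr, fmin=60.0, fmax=500.0):
    """Rough autocorrelation pitch estimate: the lag of the strongest
    correlation peak within the search range gives the period."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..len(x)-1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))  # overtone-rich input can bias this lag
    return sr / lag
```

A practical synthesizer would need additional safeguards (e.g., comparing peaks at integer multiples of the candidate lag) before trusting such an estimate as the keynote.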