This invention relates, in general, to speech synthesis systems, and more particularly, to a speech synthesis system which can be easily controlled by a microprocessor.
Voiced speech is physically generated by creating, with the vocal cords, an impulse repeated at the pitch frequency, and filtering that signal in the mouth and nose cavities. In the frequency domain, the vocal cords generate bursts of energy at harmonics of the pitch frequency. The filtering of the mouth and nose cavities attenuates some of these harmonics, and the harmonics that remain accentuated characterize the particular vocal sound.
In the past, three main techniques have been used to synthesize human speech: formant synthesis, linear predictive coding (LPC), and waveform digitization with compression. With these techniques, vocal utterances or phonemes have been linked by linguistic rules to generate words.
Formant synthesis is a technique for modeling the natural resonances of the vocal tract. With this technique, voiced sounds are generated from an impulse source that is modulated in amplitude to control intensity. The resulting signal is passed through two levels of filtering wherein the first is a time variant filter to provide the source spectrum and mouth radiation characteristics of the speech waveform. Unvoiced sounds, generated as white noise, are passed through a variable pole-zero filter.
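By way of illustration only, the source-and-filter description of voiced speech given above can be sketched in software. The following minimal Python sketch generates an impulse repeated at the pitch frequency and passes it through a single two-pole resonator standing in for one vocal tract resonance. The function names, the 8000 Hz sample rate, the 100 Hz pitch, and the 500 Hz resonance with 60 Hz bandwidth are assumptions chosen for the example and are not taken from any prior art system.

```python
import math


def impulse_train(pitch_hz, sample_rate, n_samples, amplitude=1.0):
    """Return a list with one impulse per pitch period and zeros elsewhere."""
    period = round(sample_rate / pitch_hz)  # samples per pitch period
    return [amplitude if n % period == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: y[n] = x[n] + b1*y[n-1] + b2*y[n-2].

    The pole radius sets the bandwidth and the pole angle sets the
    centre frequency of the resonance.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / sample_rate)
    b2 = -r * r
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = x + b1 * y1 + b2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Illustrative values only (assumed, not from the source).
SAMPLE_RATE = 8000
source = impulse_train(pitch_hz=100, sample_rate=SAMPLE_RATE, n_samples=800)
voiced = resonator(source, centre_hz=500, bandwidth_hz=60, sample_rate=SAMPLE_RATE)
```

Each impulse excites a decaying oscillation near the resonance frequency, so the harmonics of the pitch frequency lying near 500 Hz are accentuated relative to the others. A fuller formant synthesizer would use several such resonators with time-varying parameters.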
Linear predictive coding is somewhat similar to formant synthesis, since both are based in the frequency domain and use similar hardware. The basic difference is that LPC uses previous conditions to determine the present filter coefficients, and the quality of the synthesis improves as the number of coefficients is increased. Waveform digitization is the oldest approach to speech synthesis and relies on nothing more than sampling the waveform in the time domain at a rate of at least twice the highest frequency of interest. Normally, data compression is used with this technique to avoid prohibitive memory requirements. These prior art systems tended to use large amounts of hardware and required considerable software.
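The memory burden of uncompressed waveform digitization can be illustrated with simple arithmetic. The sketch below computes the minimum sampling rate and the resulting storage for a given duration. The 4000 Hz highest frequency of interest and the 8 bits per sample are assumed example values, not figures from any particular prior art system, and the function names are hypothetical.

```python
def min_sample_rate(highest_hz):
    """Minimum sampling rate: twice the highest frequency of interest."""
    return 2 * highest_hz


def storage_bytes(highest_hz, bits_per_sample, seconds):
    """Uncompressed storage, in bytes, for the given duration of speech."""
    samples = min_sample_rate(highest_hz) * seconds
    return samples * bits_per_sample // 8


# Assumed example values: 4000 Hz bandwidth, 8 bits per sample.
rate = min_sample_rate(4000)
one_second = storage_bytes(4000, 8, 1)
one_minute = storage_bytes(4000, 8, 60)
```

Under these assumptions, one second of speech occupies 8000 bytes and one minute occupies 480000 bytes, which indicates why data compression is normally applied.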
Accordingly, it is an object of the present invention to provide an improved speech synthesis system and method for generating intelligible, human-sounding speech, in which the voiced sounds of speech are generated using a waveform which alternates between the primary formants of the sound at the pitch frequency.
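One possible reading of a waveform which alternates between primary formants at the pitch frequency is sketched below, purely for illustration. This passage does not specify how the alternation is timed or what wave shape is used, so the equal split of each pitch period, the use of sine segments, the function name, and the example frequencies of 700 Hz and 1200 Hz are all assumptions.

```python
import math


def alternating_formant_wave(f1_hz, f2_hz, pitch_hz, sample_rate, n_periods):
    """Build a waveform that switches between two formant frequencies.

    Assumption: the first half of each pitch period oscillates at f1_hz
    and the second half at f2_hz, each segment starting at zero phase.
    """
    period = round(sample_rate / pitch_hz)  # samples per pitch period
    half = period // 2
    out = []
    for _ in range(n_periods):
        for n in range(period):
            if n < half:
                out.append(math.sin(2.0 * math.pi * f1_hz * n / sample_rate))
            else:
                out.append(math.sin(2.0 * math.pi * f2_hz * (n - half) / sample_rate))
    return out


# Assumed example values: formants at 700 Hz and 1200 Hz, 100 Hz pitch.
wave = alternating_formant_wave(700, 1200, pitch_hz=100, sample_rate=8000, n_periods=3)
```

Because the pattern repeats once per pitch period, the resulting waveform is periodic at the pitch frequency while its energy is concentrated near the two chosen formant frequencies.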
Another object of the present invention is to generate actual formant frequencies by a microprocessor (MPU) based system in a more efficient manner.