This invention relates to digital speech synthesis circuits capable of being implemented in an integrated circuit device. More specifically, this invention relates to methods and apparatus for speech synthesis filter excitation.
Several techniques are known in the prior art for digitizing human speech. For example, pulse code modulation, differential pulse code modulation, adaptive predictive coding, delta modulation, channel vocoders, cepstrum vocoders, formant vocoders, voice excited vocoders, and linear predictive coding techniques of speech digitization are known. The techniques are briefly explained in "Voice Signals; Bit by Bit" on pages 28-34 of the October, 1973 issue of IEEE Spectrum.
In certain applications, and particularly those in which digitized speech is to be stored in a memory, most researchers tend to use the linear predictive coding technique because it produces very high quality speech at rather low data rates. An excellent example of a linear predictive coding system implementable with integrated circuit techniques may be seen in U.S. patent application Ser. No. 901,393, filed Apr. 28, 1978, now U.S. Pat. No. 4,209,836 issued June 24, 1980. The speech synthesis system described in the aforementioned U.S. Pat. No. 4,209,836 utilizes frames of data which are comprised of digital representations of pitch, energy and certain linear predictive coefficients for controlling a digital filter. That system is capable of producing high quality synthetic human speech at a bit rate as low as 1200 bits per second, utilizing a fixed rate of data frame entry.
Linear predictive coding systems utilize a linear predictive filter which is excited by voiced and unvoiced excitation signals. Typically, a voiced excitation signal is generated by a periodic source, such as a chirp function. In other linear predictive coding synthesis systems, the unvoiced excitation is typically characterized as white noise, or a pseudorandom digital signal. Such systems, when implemented in integrated circuitry, typically generate a fixed digital value as an unvoiced excitation signal, and simulate a white noise input by pseudorandom generation of a sign bit to be utilized with the fixed digital value. Such systems adequately generate a signal in digital circuitry which is equivalent to a white noise excitation; however, variances in amplitude between the fixed digital value chosen for unvoiced excitation and the varying periodic excitation utilized in voiced excitation result in unbalanced voiced/unvoiced excitation signals.
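The conventional unvoiced excitation described above, a fixed digital value combined with a pseudorandomly generated sign bit, may be sketched as follows. This is a minimal illustration only and is not taken from any referenced patent: the 16-bit shift register, its feedback taps, the seed, and the value of `FIXED_MAGNITUDE` are all assumptions chosen for the example.

```python
import math

# Hypothetical fixed digital value used for every unvoiced sample.
FIXED_MAGNITUDE = 64


def lfsr_bits(seed=0xACE1):
    """Yield pseudorandom bits from a 16-bit linear feedback shift register.

    The register width and tap positions are illustrative assumptions.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        # Feedback bit is the XOR of four tapped register positions.
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield bit


def unvoiced_fixed(n, magnitude=FIXED_MAGNITUDE, seed=0xACE1):
    """Return n samples of +magnitude or -magnitude with a pseudorandom sign."""
    bits = lfsr_bits(seed)
    return [magnitude if next(bits) else -magnitude for _ in range(n)]


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Because every sample has the same magnitude, `rms(unvoiced_fixed(n))` always equals the fixed value regardless of the amplitude of the voiced excitation, which is the source of the imbalance noted above.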
It is therefore one object of this invention to improve speech synthesis technology.
It is another object of this invention to provide a speech synthesis system within which the unvoiced excitation signal may be accurately scaled to balance with the voiced excitation signal.
It is still another object of this invention to provide a speech synthesis system within which the operator may choose an accurate level of unvoiced excitation signals.
The foregoing objects are achieved as is now described. A speech synthesis system utilizing a linear predictive filter with voiced and unvoiced excitation signals as inputs thereto produces digital signals representative of human speech. The voiced excitation signal is provided by a repeating chirp function stored in memory. The unvoiced excitation signal consists of two excitation signals of opposite sign, stored in programmable memory and randomly addressed. The programmable storage of unvoiced excitation signals allows gain scaling between voiced and unvoiced excitation to be easily accomplished.
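The excitation arrangement summarized above may be illustrated by the following sketch. It is offered under stated assumptions and is not the disclosed circuit: the chirp values produced by `make_chirp`, the zero padding between chirp repetitions, the use of Python's `random.Random` in place of a hardware pseudorandom address generator, and the choice of equal root-mean-square amplitude as the balancing criterion in `balanced_level` are all hypothetical.

```python
import math
import random


def make_chirp(length=32, peak=100):
    """Build a hypothetical chirp table: a decaying swept cosine."""
    return [
        round(peak * math.exp(-k / 8) * math.cos(math.pi * k * k / length))
        for k in range(length)
    ]


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voiced_excitation(chirp, pitch_period, n):
    """Repeat the stored chirp once per pitch period (assumed zero padded)."""
    out = []
    for i in range(n):
        k = i % pitch_period
        out.append(chirp[k] if k < len(chirp) else 0)
    return out


def program_unvoiced_memory(level):
    """Model the programmable memory: two stored values of opposite sign."""
    return [level, -level]


def unvoiced_excitation(memory, n, rng):
    """Read the two-entry memory at a random one-bit address for each sample."""
    return [memory[rng.getrandbits(1)] for _ in range(n)]


def balanced_level(chirp, pitch_period):
    """Choose an unvoiced level matching the voiced RMS (assumed criterion)."""
    one_period = voiced_excitation(chirp, pitch_period, pitch_period)
    return round(rms(one_period))
```

In use, the level returned by `balanced_level(make_chirp(), 40)` would be passed to `program_unvoiced_memory`, and the resulting two-entry table read by `unvoiced_excitation`. Changing the programmed level rescales the unvoiced excitation without altering the addressing logic.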