As is well known in the art of speech synthesis, the most significant requirement of an intelligible speech synthesizer is the generation of the appropriate formant frequencies for the phonemes to be reproduced.
Conventional synthesizers generate the formant frequencies in the following way. Depending on the phoneme of interest, either voiced or unvoiced excitation is produced by electronic means. The voiced excitation is characterized by a power spectrum having a low frequency cutoff at the pitch frequency and a power that decreases with increasing frequency above the pitch frequency. Unvoiced excitation is characterized by a broadband white noise spectrum. One or the other of these waveforms is then passed through a series of filters or other electronic circuitry that causes certain selected frequencies (the formant frequencies of interest) to be amplified. The resulting waveform, when played through a speaker, produces the audible representation of the phoneme of interest. Such devices are generally called vocoders, and LPC (linear predictive coding) and PARCOR (partial autocorrelation) are typical techniques used in those vocoders.
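By way of illustration only, the excitation-and-filter arrangement described above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the circuitry of any particular vocoder: the function names (`excitation`, `resonator`, `synthesize`, `band_power`), the 8 kHz sample rate, the 100 Hz pitch, the one-pole spectral tilt, and the two-pole digital resonator are all assumptions introduced here for the example. The voiced source is approximated by an impulse train, which also carries a DC component, so it only roughly models the spectrum described above.

```python
import math
import random


def excitation(voiced, n_samples, sample_rate=8000, pitch_hz=100.0, seed=0):
    """Return a voiced or unvoiced source waveform (illustrative assumption)."""
    if voiced:
        # Impulse train at the pitch period: energy at harmonics of the pitch.
        period = max(1, round(sample_rate / pitch_hz))
        pulses = [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]
        # One-pole low-pass so that power falls off with increasing frequency.
        out, prev = [], 0.0
        for x in pulses:
            prev = x + 0.9 * prev
            out.append(prev)
        return out
    # Unvoiced source: broadband white noise.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate=8000):
    """Two-pole filter that amplifies frequencies near freq_hz (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    c = -r * r
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize(voiced, formants, n_samples=800, sample_rate=8000, pitch_hz=100.0):
    """Pass the chosen excitation through one resonator per formant."""
    signal = excitation(voiced, n_samples, sample_rate, pitch_hz)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


def band_power(signal, freq_hz, sample_rate=8000):
    """Power of the signal at a single frequency (one-bin discrete Fourier sum)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return (re * re + im * im) / len(signal)


if __name__ == "__main__":
    # Hypothetical single formant at 700 Hz with a 60 Hz bandwidth.
    voiced = synthesize(True, [(700.0, 60.0)])
    print(band_power(voiced, 700.0) > band_power(voiced, 2000.0))
```

In this sketch the formant values passed to `synthesize` stand in for the formant frequency information that a conventional vocoder must store and supply for each phoneme; the values used are hypothetical and chosen only to show that the selected frequency is emphasized relative to others.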
In such devices, the formant frequency information required to generate a string of phonemes in order to produce connected speech is generally stored in a full-sized computer that also controls the volume, the duration, the voiced/unvoiced distinction, etc. Thus, while existing vocoders are able to generate very large vocabularies, they require a full-sized computer and are not capable of being miniaturized.
To avoid these prior art problems, a speech synthesizer relying upon a new concept has recently been developed without using the vocoder techniques. That is, applicants' newly developed compression technique and a well-known compression technique are combined to compress the information to a considerable extent with minimum degradation of speech intelligibility.
The well-known technique is described in Japanese Patent Pre-publications Nos. 59207/1976 and 122004/1977 by F. S. Mozer, whereby both quantized signals and compression instruction signals are stored in a memory of a solid state speech synthesizer, and selected portions of complex sound waveforms are also stored within the synthesizer. The quantized signals, the selected portions and the compression instruction signals are combined for re-synthesis purposes.