This invention relates to apparatus for forming and synthesizing natural sounding speech.
To synthesize speech signals from stored information, it is generally advantageous to code the stored speech elements into a convenient and efficient code. Most speech synthesis apparatus use coded speech signals based on the formant information contained in the phonemes of the speech signals. In one sense, this is the natural approach to speech coding because it reflects the process by which speech is vocally generated in the human throat. One such speech synthesis system is disclosed by me in U.S. Pat. No. 3,838,132 issued Aug. 6, 1974.
However, other schemes for coding analog signals exist. One such scheme, for example, involves the use of vocoder techniques to encode analog signals, and particularly speech signals. This has been disclosed in U.S. Pat. No. 3,360,610 issued to me on Dec. 26, 1967. Therein a communication arrangement is described in which speech signals to be transmitted are encoded into a plurality of narrow band components which occupy a combined bandwidth narrower than that of the unencoded speech. Briefly summarized, phase vocoder encoding is performed by computing, at each of a set of predetermined frequencies, ω_i, which span the frequency range of an incoming speech signal, a pair of signals respectively representative of the real and the imaginary parts of the short-time Fourier transform of the original speech signal. From each pair of such signals there is developed a pair of narrow band signals; one signal, |S_i|, representing the magnitude of the short-time Fourier transform, and the other signal, φ_i, representing the time derivative of the phase angle of the short-time Fourier transform. In accordance with the above communication arrangement, these narrow band signals are transmitted to a receiver wherein a replica of the original signal is reproduced by generating a plurality of cosine signals having the same predetermined frequencies at which the short-time Fourier transforms were evaluated. Each cosine signal is then modulated in amplitude and phase angle by the pairs of narrow band signals, and the modulated signals are summed to produce the desired replica signal.
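By way of illustration only, the encoding and decoding summarized above may be sketched in discrete time as follows. The sketch is not taken from the cited patent: the function names `hann`, `analyze_channel` and `synthesize`, the Hann analysis window, the sample-by-sample evaluation, and the convention of carrying the initial phase in the first phase-derivative sample are all assumptions made for the example. The phase derivative is approximated by the wrapped first difference of the phase angle.

```python
import cmath
import math


def hann(length):
    """Symmetric Hann window normalized to unit sum (assumed; length should be odd)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]
    total = sum(w)
    return [v / total for v in w]


def analyze_channel(x, omega, window):
    """Return (magnitude, phase-derivative) tracks at one analysis frequency omega.

    omega is in radians per sample. The short-time Fourier transform is
    evaluated at every sample with the window centered on that sample.
    """
    half = len(window) // 2
    mags, phases = [], []
    for n in range(len(x)):
        acc = 0j
        for k, w in enumerate(window):
            m = n + k - half
            if 0 <= m < len(x):
                # Real and imaginary parts of the short-time transform.
                acc += x[m] * w * cmath.exp(-1j * omega * m)
        mags.append(abs(acc))
        phases.append(cmath.phase(acc))
    # Phase derivative as a wrapped first difference. The first entry holds
    # the initial phase, i.e. the phase is taken as zero before the signal.
    dphi = [phases[0]]
    for n in range(1, len(phases)):
        d = phases[n] - phases[n - 1]
        dphi.append((d + math.pi) % (2 * math.pi) - math.pi)
    return mags, dphi


def synthesize(channels, length):
    """Sum cosines at the analysis frequencies, modulated in amplitude and phase.

    channels is a list of (omega, mags, dphi) tuples. The factor of 2 restores
    the half of a real signal's energy that lies at negative frequencies.
    """
    y = [0.0] * length
    for omega, mags, dphi in channels:
        phi = 0.0
        for n in range(length):
            phi += dphi[n]  # integrate the phase derivative
            y[n] += 2.0 * mags[n] * math.cos(omega * n + phi)
    return y
```

For a sinusoid lying slightly off an analysis frequency, the phase-derivative track in this sketch settles at the offset between the two frequencies, so that the modulated cosine is restored to the original frequency on synthesis.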
The phase vocoder art has been extended by J. P. Carlson, in a paper entitled "Digitalized Phase Vocoder," published in the Proceedings of the 1967 Conference on Speech Communication and Processing, pages 292-296, wherein Carlson describes the digitizing of the narrow band signals |S_i| and φ_i before transmission, and indicates that at a 9600 bit/second transmission rate, for example, the degradation due to digitization of the parameters is unnoticeable in the reconstructed speech signal.
In another article, entitled "Phase Vocoder," by J. L. Flanagan et al., Bell System Technical Journal, Volume 45, No. 9, November 1966, page 1493, it is shown that if the analyzing bandwidth of the phase vocoder is narrow compared to the total speech bandwidth, then the phase derivative signal is representative of the pitch of the speech signal, and the magnitude of the short-time spectrum signal is representative of the strength of the speech signal at particular frequency bands. Utilizing this characteristic, in a copending application, Ser. No. 476,577, filed June 5, 1974 (Case 31), a system is disclosed which synthesizes speech from stored signals of vocabulary words encoded by a phase vocoder having narrow analyzing bands as compared to the bandwidth of the encoded signal. In accordance with the invention disclosed in this copending application, natural sounding speech is formed and synthesized by withdrawing from memory stored signals corresponding to the desired words, by concatenating the withdrawn signals, and by independently modifying the duration and pitch of the concatenated signals. Duration control is achieved by inserting a predetermined number of interpolated signals between successively withdrawn, different signals. This causes an effective slowdown of the speech with no frequency distortion. Control of pitch is achieved by multiplying the phase derivative signals by a chosen factor. Speech synthesis is completed by converting the modified signals from digital to analog format and by decoding the signals in accordance with known phase vocoder techniques.
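The duration and pitch modifications described above may be sketched, by way of example only, as operations on a sequence of stored frames, each frame holding one value per vocoder channel. The names `stretch_duration` and `scale_pitch` are hypothetical, and linear interpolation is assumed here because the form of interpolation is not specified above.

```python
def stretch_duration(frames, n_insert):
    """Insert n_insert interpolated frames between each pair of successive frames.

    frames is a list of equal-length lists (one value per channel). Linear
    interpolation is an assumption of this sketch.
    """
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(list(a))
        for j in range(1, n_insert + 1):
            t = j / (n_insert + 1)
            out.append([(1 - t) * u + t * v for u, v in zip(a, b)])
    out.append(list(frames[-1]))
    return out


def scale_pitch(dphi_frames, factor):
    """Multiply every phase-derivative value by the chosen pitch factor."""
    return [[factor * d for d in frame] for frame in dphi_frames]
```

In this sketch, N stored frames with k interpolated frames per gap yield N + (N - 1)k frames. Played out at the original frame rate, the longer sequence slows the speech, while the channel frequencies, and hence the spectral content, are left unchanged. A pitch factor greater than one raises the pitch and a factor less than one lowers it.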
To the best of applicant's knowledge, the prior art does not disclose a system which directly controls the emphasis characteristic of synthesized speech. Accordingly, one objective of this invention is to provide a system for synthesizing natural sounding speech wherein the emphasis characteristic of speech is effectively controlled.
Another objective of this invention is to synthesize speech from stored signals of vocabulary words encoded in accordance with phase vocoder techniques.