This invention relates to apparatus for forming and synthesizing natural sounding speech.
The use of phase vocoder techniques in speech transmission and frequency bandwidth reduction is disclosed in U.S. Pat. No. 3,360,610, issued to me on Dec. 26, 1967. That patent describes a communication arrangement in which speech signals to be transmitted are encoded into a plurality of narrow band components whose combined bandwidth is narrower than that of the unencoded speech. Briefly summarized, phase vocoder encoding is performed by computing, at each of a set of predetermined frequencies $\omega_i$ that span the frequency range of an incoming speech signal, a pair of signals respectively representative of the real and imaginary parts of the short-time Fourier transform of the original speech signal. From each such pair there is developed a pair of narrow band signals: one signal, $|S_i|$, representing the magnitude of the short-time Fourier transform, and the other signal, $\phi_i$, representing the time derivative of its phase angle. In accordance with this arrangement, the narrow band signals are transmitted to a receiver, where a replica of the original signal is reproduced by generating a plurality of cosine signals at the same predetermined frequencies at which the short-time Fourier transform was evaluated. Each cosine signal is then modulated in amplitude and in phase angle by its corresponding pair of narrow band signals, and the modulated signals are summed to produce the desired replica signal.
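The analysis and synthesis steps summarized above can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the circuitry of the referenced patent: the Hann analysis window scaled to unit sum, its 64-sample length, channel frequencies given in radians per sample, and the helper names `hann`, `analyze`, and `synthesize` are all hypothetical. The short-time transform is taken as $S(\omega_i, t) = \sum_k h(k)\,x(t-k)\,e^{-j\omega_i (t-k)}$, the phase derivative is approximated by the wrapped phase difference between successive samples, and the receiver integrates that difference starting from zero phase.

```python
import cmath
import math


def hann(length):
    """Hann window scaled so its samples sum to 1 (assumed unit DC gain)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]
    total = sum(w)
    return [v / total for v in w]


def analyze(x, omegas, window):
    """Evaluate the short-time Fourier transform at each channel frequency.

    Returns, per channel, the magnitude |S_i| and the phase derivative,
    approximated as the phase increment per sample (rad/sample).
    """
    mags, dphis = [], []
    for w in omegas:
        prev = 0j
        mag_i, dphi_i = [], []
        for t in range(len(x)):
            # S(w, t) = sum_k h(k) x(t - k) exp(-j w (t - k)), causal window
            s = 0j
            for k, h in enumerate(window):
                if t - k < 0:
                    break
                s += h * x[t - k] * cmath.exp(-1j * w * (t - k))
            mag_i.append(abs(s))
            if s == 0 or prev == 0:
                dphi_i.append(0.0)  # phase is undefined at a zero sample
            else:
                # Wrapped phase difference between successive samples
                dphi_i.append(cmath.phase(s * prev.conjugate()))
            prev = s
        mags.append(mag_i)
        dphis.append(dphi_i)
    return mags, dphis


def synthesize(mags, dphis, omegas):
    """Sum cosine oscillators at the channel frequencies, each modulated in
    amplitude by |S_i| and in phase by the running sum of the phase derivative."""
    n = len(mags[0])
    y = [0.0] * n
    for w, mag_i, dphi_i in zip(omegas, mags, dphis):
        theta = 0.0  # receiver starts each channel's phase at zero
        for t in range(n):
            theta += dphi_i[t]
            # Factor 2 restores the amplitude halved by keeping only the
            # positive-frequency side of a real input
            y[t] += 2.0 * mag_i[t] * math.cos(w * t + theta)
    return y
```

Because the receiver starts each channel at zero phase, absolute phase is not generally preserved in this sketch. For an input component slightly off a channel frequency, that channel's phase derivative settles near the offset, which is how the narrow band pair carries frequency detail between the $\omega_i$.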
J. P. Carlson, in a paper entitled "Digitalized Phase Vocoder," published in the Proceedings of the 1967 Conference on Speech Communication and Processing, pages 292-296, describes digitizing the narrow band signals $|S_i|$ and $\phi_i$ before transmission, and indicates that at a transmission rate of 9600 bits/second, for example, the degradation due to digitization of the parameters is unnoticeable in the reconstructed speech signal.
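As a rough illustration of what digitizing the parameters involves, the sketch below counts the bits per second for one hypothetical allocation and applies a plain uniform quantizer to a single parameter value. The channel count (16), frame rate (60 frames/second), 5-bit word lengths, the value ranges, and the helper names `bit_rate` and `uniform_quantize` are assumptions chosen only so the arithmetic reaches 9600 bits/second; they are not taken from Carlson's paper.

```python
import math


def bit_rate(channels, frames_per_second, bits_per_magnitude, bits_per_phase_rate):
    """Bits per second needed to send one |S_i| word and one phase-derivative
    word per channel per frame (hypothetical framing)."""
    return channels * frames_per_second * (bits_per_magnitude + bits_per_phase_rate)


def uniform_quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer code, then back to the centre of
    its quantization cell, as a receiver would reconstruct it."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    clipped = min(max(value, lo), hi)
    code = min(int((clipped - lo) / step), levels - 1)
    return code, lo + (code + 0.5) * step


# Hypothetical allocation: 16 channels, 60 frames/s, 5 + 5 bits per channel
rate = bit_rate(16, 60, 5, 5)  # 16 * 60 * 10 = 9600 bits/second
# A phase increment in [-pi, pi] rad/sample, quantized with 5 bits
code, value = uniform_quantize(0.3, -math.pi, math.pi, 5)
```

A uniform quantizer keeps each reconstructed value within half a step of the original; a practical coder might instead use logarithmic steps for the magnitude, but that choice is outside what the text above states.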
In a separate field of art, many attempts have been made to synthesize natural sounding speech from stored signals obtained by formant coding of phonemes (or words). One such apparatus is disclosed in my U.S. Pat. No. 3,828,132, issued Aug. 6, 1974. These systems are generally satisfactory; however, when pitch and duration must be controlled, as they must be when the synthesized speech is subject to strong contextual constraints, these systems become complex and require lengthy computations.
Accordingly, it is an object of this invention to provide a system for synthesizing natural sounding speech.
It is a further object of this invention to provide means for synthesizing speech wherein speech pitch and duration are effectively controlled.
It is a still further object of this invention to synthesize speech from stored signals of vocabulary words encoded in accordance with phase vocoder techniques.