The present invention relates generally to speech analysis and synthesis systems and, more particularly, to systems designed to operate in real time for combining a plurality of telephone channels into a single telephone channel for further transmission, subsequently separating the channels, and synthesizing the speech waves from abbreviated speech data.
In the past, numerous techniques have been devised and proposed for analyzing the human voice and deriving abbreviated speech data for later use with a synthesizing device that generates human voice sounds from the speech data. A description of such techniques, both historical and modern, may be found in Flanagan, "SPEECH ANALYSIS, SYNTHESIS AND PERCEPTION" (2nd edition, New York, 1972, Springer-Verlag).
Typically, modern analysis techniques divide digital speech signals into time segments called "frames" of speech data, and the speech is analyzed frame by frame. The time segments are too short to be perceptible to the human auditory system. Each frame is analyzed to produce either a pitch period parameter representative of the vibratory rate of the vocal cords or a parameter which indicates no vibration (the voiced/unvoiced decision parameter). A power parameter is also generated indicating the relative intensity of the speech signal. Finally, a plurality of coefficient parameters are generated which are generally representative of the filter coefficients of an electrical analog of the human vocal tract.
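The frame-by-frame analysis described above can be illustrated with a minimal sketch. It is not the method of any particular prior system: it assumes an autocorrelation pitch detector, a mean-square power measure, and coefficients obtained by the Levinson-Durbin recursion, which is one common way to derive such parameters. The function names, the lag limits (chosen on the assumption of 8 kHz sampling), the filter order, and the voicing threshold are all hypothetical.

```python
import math

def split_frames(samples, frame_len):
    """Divide a sample sequence into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] of A(z) = 1 + sum a_j z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # signal already predicted perfectly; stop refining
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def analyze_frame(frame, min_lag=20, max_lag=160, order=10, threshold=0.3):
    """Return the pitch period (0 means unvoiced), power, and coefficients.

    Assumed defaults: lags 20..160 cover roughly 50 to 400 Hz at 8 kHz.
    The frame must be longer than max_lag samples.
    """
    r = autocorrelation(frame, max_lag)
    if r[0] == 0.0:  # silent frame
        return {"pitch_period": 0, "power": 0.0, "coefficients": [0.0] * order}
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    voiced = r[best_lag] / r[0] > threshold
    return {
        "pitch_period": best_lag if voiced else 0,
        "power": r[0] / len(frame),
        "coefficients": levinson_durbin(r, order),
    }
```

For instance, a 240-sample frame holding a sinusoid with an 80-sample period should be reported as voiced with a pitch period at or very near 80, whereas a frame containing a single isolated impulse has no periodic structure and is reported as unvoiced.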
These control parameters are used in a subsequent speech synthesizer, which is also an electrical analog of the human vocal cords and tract that produced the original speech sounds. The electrical output of the synthesizer is applied to a suitable transducer to produce the audible speech sounds.
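A synthesizer of this general kind can be sketched as an excitation source driving an all-pole filter. The sketch below is an illustration under stated assumptions, not a description of any specific prior device: voiced frames are assumed to be excited by a unit impulse train at the pitch period, unvoiced frames by uniform random noise, and the filter is assumed to follow the convention A(z) = 1 + sum a_j z^-j. The function name `synthesize_frame` and its parameters are hypothetical.

```python
import random

def synthesize_frame(pitch_period, gain, coefficients, frame_len,
                     history=None, rng=None):
    """Generate one frame of output samples from the control parameters.

    pitch_period > 0 selects an impulse-train excitation (voiced);
    pitch_period == 0 selects a noise excitation (unvoiced).
    Each sample is y[n] = gain * e[n] - sum_j a_j * y[n - j].
    """
    rng = rng or random.Random(0)
    order = len(coefficients)
    # Previous output samples, most recent last; zeros if none are supplied.
    past = list(history) if history else [0.0] * order
    out = []
    for n in range(frame_len):
        if pitch_period > 0:
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            excitation = rng.uniform(-1.0, 1.0)
        feedback = sum(coefficients[j] * past[-1 - j] for j in range(order))
        y = gain * excitation - feedback
        out.append(y)
        past.append(y)
    return out
```

As a usage example, `synthesize_frame(4, 1.0, [-0.5], 6)` excites a single-pole filter with a pulse every four samples, so each pulse is followed by a decaying tail. The sketch restarts the pulse train at the beginning of every frame; a practical synthesizer would carry the pulse phase and the filter history across frame boundaries.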
Generally, known analysis and synthesis techniques produce intelligible imitations of the human voice, but normally the artificiality is noticeable. Thus, speech analysis and synthesis techniques have not been used in telephone systems, for example, where it is desired that the speakers not be aware of the analysis-synthesis process taking place. Furthermore, the speech signals were normally produced by relatively high-fidelity microphones and the like, which permitted the speech analysis to take place on speech signals having the full range of appropriate audio frequencies. Speech signals derived from telephone channels with relatively narrow audio passbands could not be successfully analyzed due to the lack of basic speech frequencies needed for successful analysis. In addition, the computational time required to analyze speech signals was such that it was difficult to perform the analysis process in "real time" for even a single voice channel. Thus, analysis-synthesis was practically usable only for special transmission media with relatively wide bandwidths. Utilization of the analysis-synthesis technique for a single-channel telephone line offered no particular advantages except for the fact that the transmitted speech data was difficult to decode without knowing the analysis and synthesis process itself.
Thus, in the field of speech analysis and synthesis there has long been a need for processing techniques which would permit practical utilization of the analysis process in systems such as telephone circuits to take advantage of the abbreviated speech data to combine a number of telephone voice channels into a single comparable voice channel. The present invention satisfies that need.