The present invention relates generally to speech analysis and synthesis systems, and, more particularly, to techniques for estimating system noise in a speech analysis system. As will become apparent, one technique for determining whether a speech sound is "voiced" or "unvoiced" is, as described and claimed in the aforementioned parent application, to compare the energy level of the speech sound with a background noise level. Unvoiced sounds involve no vibration of the vocal cords, and are much lower in energy than voiced sounds.
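By way of illustration only, the energy comparison mentioned above can be sketched as follows. This is a minimal sketch and not the method claimed in the parent application: the function names, the use of mean squared amplitude as the energy measure, and the `margin` factor are all assumptions introduced for the example.

```python
def frame_energy(samples):
    """Mean squared amplitude of a block of samples (assumed energy measure)."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(s * s for s in samples) / len(samples)


def is_voiced(samples, noise_level, margin=4.0):
    """Classify a speech sound as voiced when its energy clearly exceeds the noise.

    `noise_level` is an estimate of the background noise energy, and `margin`
    is a hypothetical safety factor; neither value is taken from the source.
    """
    return frame_energy(samples) > margin * noise_level
```

With an assumed noise level of 50.0, a block such as `[100, -100, 100, -100]` has an energy of 10000 and is classified as voiced, while `[5, -5, 5, -5]` has an energy of 25 and is not. The usefulness of such a comparison depends on how well the noise level is estimated, which is the subject of the present invention.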
An object of the present invention is to provide a noise estimation system for use in conjunction with the speech analysis system described and claimed in the aforementioned parent application, U.S. Pat. No. 4,058,676. So that the present invention may be clearly understood, the entire speech analysis system will be described by way of background. The analysis system in which the invention operates is of a type designed to function in real time, combining a plurality of telephone channels into a single telephone channel for transmission, with subsequent separation of the channels and synthesis of the speech from the transmitted data.
In the past, numerous techniques have been devised and proposed for analyzing the human voice and deriving abbreviated speech data for later use with a synthesizing device for generating human voice sounds from the speech data. A description of such techniques, both historical and modern, may be found in Flanagan, "SPEECH ANALYSIS, SYNTHESIS AND PERCEPTION" (2nd edition, New York, 1972, Springer-Verlag).
Typically, modern analysis techniques divide digital speech signals into time segments called "frames" of speech data, and the speech is analyzed frame by frame. The time segments are too short to be perceptible to the human auditory system and are analyzed to produce either a pitch period parameter representative of the vibratory rate of the vocal cords or a parameter which indicates no vibration (the voiced/unvoiced decision parameter). A power parameter is also generated indicating the relative intensity of the speech signal. Finally, a plurality of coefficient parameters are generated which are generally representative of the filter coefficients of an electrical analog of the human vocal tract.
These control parameters are used in a subsequent speech synthesizer which also is an electrical analog of the human vocal cords and tract which produced the original speech sounds. The electrical output of the synthesizer is applied to a suitable transducer to produce the audible speech sounds.
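The frame-by-frame organization described above may be pictured with the following sketch. It is illustrative only: the names `split_frames` and `FrameParameters`, the non-overlapping framing, and the use of `None` to signal an unvoiced frame are assumptions made for the example and are not drawn from the parent application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def split_frames(signal, frame_len):
    """Divide a sampled signal into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped
    (an assumption made to keep the sketch simple).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    last_start = len(signal) - frame_len
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


@dataclass
class FrameParameters:
    """Hypothetical record of the control parameters produced for one frame."""
    pitch_period: Optional[int]  # in samples; None stands for "no vibration"
    power: float                 # relative intensity of the frame
    coefficients: List[float] = field(default_factory=list)  # vocal tract filter

    @property
    def voiced(self):
        """A frame is treated as voiced when a pitch period is present."""
        return self.pitch_period is not None
```

Under these assumptions, `split_frames(list(range(10)), 4)` yields two frames of four samples each, and one `FrameParameters` record per frame would be passed to the synthesizer.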
Generally, known analysis and synthesis techniques produce intelligible imitations of the human voice, but normally the artificiality is noticeable. Thus, speech analysis and synthesis techniques have not been used in telephone systems, for example, where it is desired that the speakers not be aware of the analysis-synthesis process taking place. Furthermore, the speech signals were normally produced by relatively high-fidelity microphones and the like, which permitted the speech analysis to take place on speech signals having the full range of appropriate audio frequencies. Speech signals derived from telephone channels with relatively narrow audio pass bands could not be successfully analyzed due to the lack of basic speech frequencies needed for successful analysis. In addition, the computational time required to analyze speech signals was such that it was difficult to perform the analysis process in "real time" for even a single voice channel. Thus, analysis-synthesis was practically usable only for special transmission media with relatively wide bandwidths. Utilization of the analysis-synthesis technique for a single-channel telephone line offered no particular advantages except for the fact that the transmitted speech data was difficult to decode without knowing the analysis and synthesis process itself.
Thus, prior to the invention claimed in the aforementioned parent application, there was a need for processing techniques that would permit practical utilization of the analysis process in telephone systems. The present invention contributes to the satisfaction of this need by providing a noise estimation technique for use in analysis systems of the type described and claimed in the parent application.