This invention relates to a digital speech interpolation system having a centralized echo canceler and an improved speech detector.
Digital speech interpolation (hereinafter referred to as DSI) is used in, for example, satellite communication systems as a means of compressing voice signals to conserve channel resources. The transmitting apparatus in a DSI system includes a speech detector that detects the presence or absence of speech in the transmit signal, thus enabling the transmit signal to be compressed by the removal of silent portions. Typical voice signals have an activity ratio of less than 50%; that is, they consist of less than 50% speech and more than 50% silence. DSI accordingly makes it possible for the transmitting apparatus to have, for example, twice as many input channels as output channels. The input channels are assigned to output channels only when speech is detected. Assignment information is added to the DSI output signal so that the distant receiving apparatus can decompress the received signal and assign it to the appropriate receiving channels.
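The paragraph above describes assigning input channels to output channels only while speech is detected. A minimal Python sketch of that bookkeeping follows. The class name `DsiAssigner`, the lowest-index-first assignment policy, and the handling of inputs when every output is busy are illustrative assumptions, not details taken from this description. Four inputs sharing two outputs mirror the roughly 2:1 ratio mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class DsiAssigner:
    """Maps speech-active input channels onto a smaller pool of output channels (hypothetical)."""
    num_outputs: int
    # input channel -> output channel currently carrying it
    assignments: dict = field(default_factory=dict)

    def update(self, speech_flags):
        """speech_flags[i] is True when the speech detector reports speech on input i.

        Returns the input-to-output map for this frame, i.e. the assignment
        information a receiver would need to route each output channel back
        to its receiving channel.
        """
        # Release the output channels held by inputs that have fallen silent.
        for inp in list(self.assignments):
            if not speech_flags[inp]:
                del self.assignments[inp]
        busy = set(self.assignments.values())
        free = [o for o in range(self.num_outputs) if o not in busy]
        # Give free output channels to newly active inputs, lowest index first.
        for inp, active in enumerate(speech_flags):
            if active and inp not in self.assignments:
                if not free:
                    break  # every output is busy: remaining active inputs are not carried this frame
                self.assignments[inp] = free.pop(0)
        return dict(self.assignments)


# Usage: four input channels share two output channels.
dsi = DsiAssigner(num_outputs=2)
print(dsi.update([True, False, True, False]))   # inputs 0 and 2 are carried
print(dsi.update([False, True, True, False]))   # input 1 takes the output freed by input 0
```

The returned map stands in for the assignment information added to the DSI output signal; a real system would also have to encode and transmit it.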
To ensure that channel activity ratios remain low enough for efficient compression, a DSI system must cope with the problem of echo, which occurs due to diversion of the receive signal into the transmit signal at the hybrid interface to the telephone or other terminal served by a channel. If not removed, echo will be mistakenly detected as speech by the speech detector. A simple method of removing echo is to attenuate the transmit signal when the receive level exceeds the transmit level; a device for this purpose is termed an echo suppressor. Such echo suppressors are unsatisfactory, however, in that they cut off one party's speech when both parties talk at once. A superior method of removing echo is to predict, from the level of the receive signal, the echo that will be diverted into the transmit signal, generate a simulated echo signal, and subtract the simulated echo signal from the transmit signal, thereby substantially canceling the echo; a device for this purpose is termed an echo canceler. The characteristics of an echo canceler can be improved if the echo canceler can also, when necessary, perform center clipping, a process that eliminates low-level signal components in order to remove the remaining slight, uncanceled echo. Since echo must be removed from each channel individually, prior-art DSI systems have provided a separate echo canceler (or suppressor) for each terminal.
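The three echo-removal techniques named above (suppression, cancellation, and center clipping) can be sketched in Python as follows. The frame length, attenuation factor, tap count, step size, and clipping threshold are illustrative assumptions. The normalized LMS (NLMS) adaptive filter used here to generate the simulated echo is one common technique swapped in for illustration; this description does not specify how the echo canceler forms its prediction.

```python
import random


def echo_suppress(tx, rx, frame=80, attenuation=0.01):
    """Echo suppressor: scale a transmit frame by `attenuation` whenever the
    receive level (mean absolute value) exceeds the transmit level.
    Assumes tx and rx have equal length."""
    out = []
    for start in range(0, len(tx), frame):
        t = tx[start:start + frame]
        r = rx[start:start + frame]
        tx_level = sum(abs(s) for s in t) / len(t)
        rx_level = sum(abs(s) for s in r) / len(r)
        gain = attenuation if rx_level > tx_level else 1.0
        out.extend(s * gain for s in t)
    return out


def cancel_echo(tx, rx, taps=16, mu=0.5, eps=1e-6):
    """Echo canceler: predict the echo from recent receive samples with an
    adaptive FIR filter (NLMS update, an assumed choice) and subtract the
    simulated echo from the transmit signal."""
    w = [0.0] * taps            # filter coefficients (echo-path estimate)
    history = [0.0] * taps      # history[k] holds rx[n - k]
    out = []
    for t, r in zip(tx, rx):
        history = [r] + history[:-1]
        simulated = sum(wk * xk for wk, xk in zip(w, history))
        err = t - simulated     # transmit signal with simulated echo removed
        norm = eps + sum(xk * xk for xk in history)
        w = [wk + mu * err * xk / norm for wk, xk in zip(w, history)]
        out.append(err)
    return out


def center_clip(signal, threshold):
    """Center clipper: zero any sample whose magnitude is below `threshold`,
    removing slight uncanceled echo."""
    return [0.0 if abs(s) < threshold else s for s in signal]


# Usage: a noise-like far-end (receive) signal returns through a hypothetical
# hybrid as an echo delayed by 3 samples and attenuated by half.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [0.0] * 3 + [0.5 * s for s in far_end[:-3]]
residual = center_clip(cancel_echo(echo, far_end), threshold=0.01)
```

The suppressor applies one gain to the whole transmit frame, which is why it also attenuates the near-end party's speech when both parties talk at once; the canceler, in principle, subtracts only its estimate of the echo and leaves other components of the transmit signal in place.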
To maintain good speech quality, a DSI system must also cope with the problem of speech dropouts caused by overcompression, especially with the loss of low-level segments occurring at the ends of many words, and in the interior of some words. One way to prevent dropout is to provide the speech detector with a hangover controller that prolongs the detected speech interval by a certain amount. A prior-art scheme of hangover control employs a long and a short hangover time which are selected according to the length of the preceding speech interval. This hangover control system has been described in the article Tekio-shikiichi-gata Onsei Kenshutsuki ("An Adaptive-Threshold Speech Detector") by Kato, Nishiya, and Shimoyama, published by the Communication Systems Study Group of the Institute of Electronics and Communication Engineers of Japan in Report CS84-187.
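A hangover controller of the prior-art kind described above can be sketched as a small per-frame state machine. The frame-based formulation, the parameter values, and the rule that a speech interval of at least `long_burst` frames earns the long hangover are illustrative assumptions; the cited report may select between the two hangover times differently.

```python
from dataclasses import dataclass


@dataclass
class HangoverController:
    """Extends detected speech by a hangover chosen from the length of the
    preceding speech interval (hypothetical parameters, in frames)."""
    short_hangover: int = 2
    long_hangover: int = 8
    long_burst: int = 10        # speech runs at least this long earn the long hangover
    run: int = 0                # length of the current speech run
    remaining: int = 0          # hangover frames still to be added

    def step(self, raw_speech: bool) -> bool:
        """Take one raw detector decision; return the decision after hangover."""
        if raw_speech:
            self.run += 1
            self.remaining = 0
            return True
        if self.run:            # a speech interval has just ended: pick its hangover
            self.remaining = (self.long_hangover if self.run >= self.long_burst
                              else self.short_hangover)
            self.run = 0
        if self.remaining:      # still inside the hangover: report speech
            self.remaining -= 1
            return True
        return False


# Usage: a short burst gets the short hangover, a long burst the long one.
ctrl = HangoverController()
raw = [True] * 3 + [False] * 5 + [True] * 12 + [False] * 10
print([int(ctrl.step(x)) for x in raw])
```

Because the hangover lengths here are fixed parameters, the sketch also illustrates the limitation discussed below: nothing in it responds to the background noise level.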
Prior-art DSI systems as described above suffer from several problems. One problem is that providing a separate echo canceler at each terminal makes the system large, expensive, and inconvenient to operate and maintain. A second problem is that the separation of the echo cancelers from the speech detector prevents these devices from acting cooperatively. Center clipping, for example, strongly affects the behavior of the speech detector: if the speech detector does not make a special allowance for center clipping, it may alter its speech detection threshold in an inappropriate manner, causing background noise to be mistakenly detected as speech. The speech detector is also liable to operate incorrectly if the transmit signal leaks into the returning echo, a phenomenon referred to herein as double-talk. A further problem is that the hangover time added by the hangover controller does not respond to the background noise level. As a result, either the hangover time is inadequate at high background noise levels, so that speech dropouts persist and degrade communication quality, or it is unnecessarily long at low background noise levels, so that the apparent activity ratio rises and speech compression efficiency suffers.