1. Field of the Invention
The present invention is directed to an echo suppressor for a human/machine dialogue system, the system being of a type wherein a machine-produced input request produces an echo signal which corrupts a subsequently received input speech signal.
2. Description of the Prior Art
In contrast to echo suppression methods used in person-to-person transmission systems, wherein echos that occur on the transmission link in telephone calls are suppressed in the central exchanges and wherein acoustic echos that occur in hands-free talking are suppressed, the purpose of an echo suppressor utilized in a human/machine dialogue system is to allow the speaker a more user-friendly interaction with the machine. For example, the speaker can make an input without having to wait for the machine to initiate a system input request (prompt).
Such echo suppressors are known, for example, from the Proceedings of Acoustical Society of Japan Spring Meeting, March 1992, pages 1-21; Y. Nagata et al., "Cancelling of synthetic speech for a real-time speech dialogue system" and from the Proceedings of the European Conference on Speech Communications and Technology, Madrid, September 1995, pages 149 through 152; R. Pacifici et al., "Echo cancelling in Speech Recognition Systems". Adaptive filters are utilized to suppress the echo produced by the input request of the speech input system, which would otherwise disturb the speech recognition: the filter generates a replica of the echo signal and subtracts it from the incoming speech signal. The residual error signal is then employed to determine the coefficients of the echo suppression vector by minimizing the mean square of the output error. This adaptation occurs in time segments in which it is known that no voice signal of the speaker is present.
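The adaptive filtering described above can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update, which is only one common way of minimizing the mean square of the output error and is not necessarily the method of the cited references. The function name, the step size, the optional `adapt` mask (marking segments known to contain no speaker voice) and the four-tap echo path in the demonstration are all hypothetical choices made for illustration.

```python
import random


def nlms_echo_canceller(prompt, received, num_taps, step=0.5, eps=1e-8, adapt=None):
    """Subtract an adaptively estimated echo of `prompt` from `received`.

    Returns (residual, weights). `adapt` is an optional list of booleans;
    where it is False the coefficients are frozen (speaker may be talking).
    """
    weights = [0.0] * num_taps   # echo suppression vector (filter coefficients)
    history = [0.0] * num_taps   # most recent prompt samples, newest first
    residual = []
    for n, (x, d) in enumerate(zip(prompt, received)):
        history = [x] + history[:-1]
        # Replica of the echo: FIR filtering of the prompt signal.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate    # residual error after subtraction
        residual.append(e)
        if adapt is None or adapt[n]:
            # NLMS update: step scaled by the power of the prompt history.
            power = sum(h * h for h in history) + eps
            gain = step * e / power
            weights = [w + gain * h for w, h in zip(weights, history)]
    return residual, weights


def demo():
    """Identify a hypothetical 4-tap echo path from a noise-like prompt."""
    rng = random.Random(0)
    prompt = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
    echo_path = [0.0, 0.6, 0.0, -0.3]  # assumed echo path, illustration only
    received = [
        sum(echo_path[k] * prompt[n - k] for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(prompt))
    ]
    residual, weights = nlms_echo_canceller(prompt, received, num_taps=4)
    return residual, weights, echo_path
```

In this noise-free demonstration the coefficients approach the assumed echo path and the residual error decays toward zero; with a real speaker signal present, adaptation would be restricted to the segments flagged in `adapt`.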
Such echo suppressors are effective for suppressing local-area echos that are generated by the hybrid circuit of the telephone. These known echo suppressors have disadvantages, however, in generating replicas of echos that arise on long-distance connections in human/machine dialogue systems. In particular, it can be difficult to make the high computing capacity needed to suppress echos with delays of several tens of milliseconds available fast enough. When, for example, the echo path amounts to 50 ms and the system input request and the received speech signal are sampled at a rate of 8 kHz, the echo suppressor must implement a filter with 400 taps. Likewise, the adaptation algorithm must update the coefficients of the echo suppression vector, which has 400 entries, at the sampling rate; in this example, that is 8000 updates per second. The high computational complexity of the method and its slow convergence are disadvantageous with respect to the required real-time conditions.
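The figures in the example above follow from simple arithmetic, sketched below. The function names are hypothetical, and the operation count assumes an LMS-type algorithm that needs roughly one multiply-accumulate per tap for filtering and one per tap for the coefficient update; this is a rough estimate rather than a figure taken from the cited references.

```python
def fir_taps(echo_path_ms, sample_rate_hz):
    """Number of filter taps needed to span the echo path."""
    return echo_path_ms * sample_rate_hz // 1000


def lms_macs_per_second(num_taps, sample_rate_hz):
    """Rough multiply-accumulate count per second for an LMS-type filter.

    Assumes num_taps operations for filtering plus num_taps for the
    coefficient update, both carried out once per sample.
    """
    return 2 * num_taps * sample_rate_hz


taps = fir_taps(echo_path_ms=50, sample_rate_hz=8000)
macs = lms_macs_per_second(taps, sample_rate_hz=8000)
print(taps, macs)
```

Under these assumptions, the 400-tap filter at 8 kHz requires on the order of 6.4 million multiply-accumulate operations per second.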
W. Kellermann, "Kompensation akustischer Echos in Frequenzteilbändern", Frequenz, Vol. 39, 1985, Nos. 7-8, pages 209-215, discloses adaptive sub-band filtering wherein a speech signal is subdivided into a number of under-sampled frequency bands and is subsequently supplied to an adaptive filter based on the NLMS algorithm. This sub-band approach, however, is employed only to suppress echos that are produced in a room by a loudspeaker/microphone system, in order to enable better hands-free talking and teleconferencing.
An echo suppressor for human/machine dialogue systems must satisfy certain demands. Its processing of the speech signal should not produce any significant additional computational outlay for speech recognition. Likewise, a degradation of the recognition performance of the speech recognition system should be avoided.