1. Field of the Invention
The present invention relates to a speech canceler used for speech recognition on a telephone line, and, more particularly, to a technique for canceling both the ambient noise introduced with a caller's voice and the automated voice that is generated by the unit and crosses over to the receiving side.
2. Description of the Related Art
When speech recognition is performed over a telephone line, it is often in the context of an interactive system, usually providing guidance consisting of an automated voice, which may be a computer-generated voice, a tape recording, or similar means. An example of such interaction is as follows:
<Guidance> This is XX Trading. Please state your inquiry.
<Caller> Inquiry on inventory.
<Guidance> Is it inquiry on inventory? Please answer with yes or no.
<Caller> Yes.
In this example of interaction, once the caller has become somewhat familiar with the system, he or she may speak before the guidance is complete. Possibilities include a case in which the caller says "Inquiry on inventory" immediately after the first guidance of "This is XX Trading," or a case in which the caller says "Yes" immediately after the second guidance of "Is it inquiry on inventory?" Conventionally, in a speech recognition system, when the automated voice output by the system and the caller's voice overlap, the input to the recognition system is the two voices mixed by a two-to-four-wire converter (hybrid). Since correct recognition cannot be attained when the two voices overlap, such conditions are unacceptable. Accordingly, even a user who is somewhat familiar with the system must always wait until the automated voice completes an instruction, which makes the system tedious and difficult to use, to the displeasure of users.
Speech recognition devices have been designed to overcome such problems and to recognize, with high accuracy, speech that the user utters without waiting for the automated voice prompt to finish (so-called "barge-in"), by extracting only the caller's voice when the automated voice and the caller's voice overlap. As an example, a speech canceler shown in "Echo Canceler Technology," edited by Tsujii, Nihon Kogyo Gijutsu Center, p. 4, December 1986 (hereinafter called "Reference 1") is described with reference to FIG. 13. The speech canceler shown in FIG. 13 is an example in which echo canceler technology is applied to cancellation of a crossed-over signal in the hybrid within a repeating switch for a long distance telephone line, wherein speech at the transmission side is equalized with that at the reception side by using an adaptive filter on a time waveform, and subtraction is performed on the time waveform. Here, a description is given of a speech recognition device in which the speech canceler described in Reference 1 is applied to a subscriber's telephone set. That is, when an automated voice is output, it crosses over at the hybrid 131 to the receiver, which is the four-wire section. A pseudo echo generator section 133 corrects for the transmission characteristics of the crossed-over speech so that the residual echo is minimized, and generates a pseudo echo. A subtracter 132 subtracts the pseudo echo output by the pseudo echo generator section 133 from the crossed-over automated voice to cancel only the automated voice. In a state in which the caller's voice transmitted over the telephone line overlaps the automated voice, the pseudo echo generator section 133 generates the pseudo echo by using the most recently estimated transmission characteristics, and the subtracter 132 cancels only the echo relating to the automated voice.
The speech recognition section 134 performs speech recognition by using speech after the automated voice is removed.
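The time-waveform arrangement described above (pseudo echo generation, subtraction, and reuse of the most recent estimate while the caller is speaking) can be illustrated with a minimal sketch. It assumes a normalized LMS (NLMS) update, which Reference 1 is not stated to use; the function name `nlms_echo_cancel`, the `adapt` flag list, the three-tap `echo_path`, and the synthetic signals are all hypothetical and serve only as an illustration.

```python
import math
import random

def nlms_echo_cancel(reference, received, taps=8, mu=0.5, eps=1e-8, adapt=None):
    """Subtract a pseudo echo of `reference` from `received`, sample by sample.

    reference: automated voice samples sent toward the hybrid
    received:  samples on the receiving side (echo, plus caller speech if any)
    adapt:     optional list of booleans; False freezes the filter so that the
               most recently estimated characteristics are reused
    """
    w = [0.0] * taps          # estimated crossover (echo path) response
    history = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for n, (x, d) in enumerate(zip(reference, received)):
        history = [x] + history[:-1]
        pseudo_echo = sum(wi * hi for wi, hi in zip(w, history))
        e = d - pseudo_echo   # output of the subtracter
        residual.append(e)
        if adapt is None or adapt[n]:
            # NLMS update (an assumed choice of adaptive algorithm)
            power = sum(h * h for h in history) + eps
            step = mu * e / power
            w = [wi + step * hi for wi, hi in zip(w, history)]
    return residual, w

# Synthetic demonstration with assumed values.
random.seed(0)
echo_path = [0.5, -0.3, 0.1]                       # hypothetical crossover response
prompt = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [sum(echo_path[k] * prompt[n - k] for k in range(len(echo_path)) if n - k >= 0)
        for n in range(len(prompt))]
# The caller barges in at sample 3000; before that only the echo is received.
caller = [0.0] * 3000 + [0.3 * math.sin(0.2 * n) for n in range(1000)]
received = [e + c for e, c in zip(echo, caller)]
adapt = [n < 3000 for n in range(4000)]            # freeze once the caller speaks

residual, w = nlms_echo_cancel(prompt, received, adapt=adapt)
```

In this sketch the filter adapts only while the prompt is heard alone, so the residual after sample 3000 approximates the caller's waveform. How the overlap is detected in practice is outside the scope of the sketch; here the `adapt` flags are simply given.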
Although the example shown in FIG. 13 uses the adaptive filter on the time waveform, the cancellation may instead be implemented by subtraction on the power spectrum (so-called dual-input spectral subtraction). For example, when the dual-input spectral subtraction shown in "Word Speech Recognition System with Dual inputs under Noise," Ariyoshi, Matsushita, and Fujimoto, Acoustic Society of Japan, Proceedings, Fall, 1-8-5. pp. 9-10, September 1990 (hereinafter called "Reference 2") is used, a configuration similar to FIG. 13 may be implemented by an adaptive filter and subtraction on the power spectrum. Reference 2 shows a case where the technique is applied to speech recognition in a car. In that case the subject of cancellation is the ambient noise in the car, but in principle the technique is the same as the speech cancellation technique for canceling the automated voice on the telephone line. Because this system performs the cancellation on the power spectrum, where phase information has been discarded, it can significantly reduce the amount of processing compared with the speech canceler described in Reference 1, so that a less expensive system can be configured.
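The power-spectrum variant can be sketched in the same spirit. The following is only an illustration of the general idea of dual-input spectral subtraction and is not taken from Reference 2: the per-bin gain estimate by averaging, the zero floor, the naive DFT, the frame length of 64, and the scalar crossover gain of 0.5 are all assumptions chosen to keep the example small.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power |X(k)|^2 for k = 0..N/2 using a naive DFT; phase is discarded."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def estimate_gain(ref_frames, rec_frames, eps=1e-12):
    """Per-bin power gain of the crossover path, from frames without caller speech."""
    bins = len(ref_frames[0])
    num = [sum(f[k] for f in rec_frames) for k in range(bins)]
    den = [sum(f[k] for f in ref_frames) + eps for k in range(bins)]
    return [a / b for a, b in zip(num, den)]

def spectral_subtract(rec_power, ref_power, gain, floor=0.0):
    """Subtract the scaled reference power from the received power, bin by bin."""
    return [max(d - g * x, floor) for d, x, g in zip(rec_power, ref_power, gain)]

# Synthetic demonstration with assumed values.
random.seed(1)
N = 64
path_gain = 0.5   # hypothetical crossover path, simplified to a scalar gain

def new_prompt_frame():
    return [random.uniform(-1.0, 1.0) for _ in range(N)]

# Training: frames in which only the crossed-over automated voice is received.
train = [new_prompt_frame() for _ in range(20)]
ref_frames = [power_spectrum(f) for f in train]
rec_frames = [power_spectrum([path_gain * s for s in f]) for f in train]
gain = estimate_gain(ref_frames, rec_frames)

# Overlap: the caller (a tone centered on bin 4) speaks over the prompt.
prompt = new_prompt_frame()
caller = [0.8 * math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
received = [path_gain * p + c for p, c in zip(prompt, caller)]
cleaned = spectral_subtract(power_spectrum(received), power_spectrum(prompt), gain)
```

Because only power values are subtracted, a cross term between the prompt and the caller remains in the bins that the caller occupies, so the result is an approximation of the caller's power spectrum rather than an exact recovery.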
However, the conventional speech canceler shown in Reference 1 cancels only the automated voice generated by the unit and provides no measure against noise around the caller, so that speech recognition performance deteriorates when such noise is at a high level.
Also, even if the speech canceler shown in Reference 2 is applied to speech cancellation on the telephone line, it cancels only the automated voice, so that the problem mentioned above still remains.
Furthermore, when the noise level around the caller is high, the noise makes it impossible to correctly estimate the transmission characteristics with which the transmitted signal of a telephone set crosses over to the receiver, leading to deterioration of the performance in canceling the automated voice.
The present invention is designed to solve the above problems, and is intended to provide a high performance speech and noise canceler that cancels not only the automated voice generated by the unit but also noise around the caller, as well as a speech recognition device.
Another object is to cancel the automated voice generated by the unit with high accuracy by canceling the influence of noise around the caller.