1. Field of the Invention
The present invention relates to echo suppression and, more particularly, to an echo suppressor, a method, and a computer-readable storage medium for suppressing an echo caused by a sound generated by a sound output device.
2. Description of the Related Art
Speech recognition techniques have been developed but have not yet reached perfection. For example, pressing a push-to-talk switch to mute a car navigation system allows the system to accurately recognize the user's spoken instruction. However, it is desirable that no redundant operation, such as pressing the push-to-talk switch, be required before speaking to the system. Eliminating this operation requires echo cancellation to suppress the echo caused by sound that is emitted from a loudspeaker of the system and picked up by a microphone. More particularly, a sound from any one of a plurality of loudspeakers in a multi-channel car audio system acts as noise that interferes with the user's voice, because the sound is received by the microphone intended for receiving the user's spoken instruction. Therefore, an improved echo cancellation method is demanded for cancelling a sound emitted from a car audio system and received by a microphone provided for a speech recognition system.
FIG. 12 shows one proposed echo cancelling system, in which a conventional echo suppressing method (echo cancelling method) is applied to a multi-channel audio system. The echo suppressing method in this system is based on a method for a monophonic-channel audio system, which performs echo cancelling for one channel, as shown in FIG. 12. Plural sound signals sent from a multi-channel audio system 2000 are fed to corresponding loudspeakers 2001-1, . . . , and 2001-n, each of which emits sound according to its signal. The echo suppressor 1000 operates to remove an echo signal from the sound signal generated from the sound received by the microphone 2002, where the echo signal is a summation of the sound signals derived from the plurality of channels.
The echo is removed by suppressing the echo in an observation sound signal y(t), which is generated based on the received sound, using reference sound signals x1(t), . . . , and xn(t), which are generated based on the output sounds of plural (n) channels. The suppression is performed by plural suppressing mechanisms (echo cancellers) 1001-1, . . . , and 1001-n corresponding to the reference sound signals x1(t), . . . , and xn(t).
Besides the structure of FIG. 12, another echo suppressor has been proposed that applies an echo suppressing method for a monophonic-channel audio system to a multi-channel audio system. FIG. 13 is a schematic diagram showing this second conventional echo suppressor. The second echo suppressor 1000 adds the reference sound signals x1(t), . . . , and xn(t), which are generated based on the sounds of plural channels, with an adding mechanism 1002 to generate an added reference sound signal x(t), and suppresses an echo of an observation sound signal y(t) based on the added reference sound signal x(t).
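The adding step of FIG. 13 can be illustrated with a minimal sketch. It assumes that each reference sound signal is available as an equal-length list of samples and that the adding mechanism 1002 forms a plain sample-wise sum; the function name add_reference_signals is hypothetical and does not appear in the cited publication.

```python
def add_reference_signals(channels):
    """Sample-wise sum of the reference signals x1(t), ..., xn(t).

    channels -- list of n equal-length sample lists, one per channel
    Returns the added reference sound signal x(t) as a list.
    (Assumption: unweighted sum; the source does not specify any scaling.)
    """
    return [sum(samples) for samples in zip(*channels)]


# Two short example channels (made-up sample values).
x1 = [0.5, 0.25, -0.5]
x2 = [0.25, 0.25, 0.5]
x = add_reference_signals([x1, x2])
```

The single signal x(t) is then the only reference passed to the one suppressing mechanism, which is what makes this arrangement smaller than the per-channel structure of FIG. 12.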
FIG. 14 is a functional block diagram showing the functional configuration of the suppressing mechanisms 1001 of the conventional echo suppressor. Each suppressing mechanism 1001 includes a detecting unit 10010 for detecting a double-talk state, in which a speaker is speaking, and a single-talk state, in which a speaker is not speaking (during the utterance of the car audio system); a filter factor updating unit 10011 for updating the filter coefficients necessary for estimating an echo level through processing based on the adaptive normalized least mean square (NLMS) algorithm; a linear finite impulse response (FIR) filter 10012 for estimating an echo signal x′(t) based on the reference sound signal x(t) through an inner product computation of several hundred orders; and a subtracting unit 10013 for removing the echo signal x′(t) from the observation sound signal y(t) to obtain and output a suppression result r(t) with a reduced echo. The detecting unit 10010 detects the single-talk state and the double-talk state based on the intensity change in the suppression result r(t). When the double-talk state is detected, the detecting unit 10010 instructs the filter factor updating unit 10011 to stop updating the filter coefficients. The filter factor updating unit 10011 calculates the filter factors (coefficients) based on the suppression result r(t).
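The processing loop of FIG. 14 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited publication: it uses the standard NLMS update, a short filter of order 8 instead of several hundred, and a hypothetical freeze argument that stands in for the decision of the detecting unit 10010 (the detection rule itself is not reproduced). The function name and the parameters mu and eps are likewise assumptions.

```python
import random


def nlms_echo_suppress(x, y, order=8, mu=0.5, eps=1e-8, freeze=None):
    """Return (r, w): suppression result r(t) and final FIR coefficients.

    x      -- reference sound signal x(t)
    y      -- observation sound signal y(t)
    freeze -- optional per-sample flags; True means "double-talk detected,
              skip the update" (hypothetical stand-in for unit 10010)
    """
    w = [0.0] * order      # FIR filter coefficients (filter 10012)
    buf = [0.0] * order    # latest reference samples, newest first
    r = []
    for t in range(len(x)):
        buf = [x[t]] + buf[:-1]
        # Inner product: estimated echo signal x'(t).
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        # Subtracting unit 10013: suppression result r(t).
        err = y[t] - echo_est
        r.append(err)
        # Filter factor updating unit 10011: NLMS step, skipped when frozen.
        if freeze is None or not freeze[t]:
            power = sum(b * b for b in buf) + eps
            step = mu * err / power
            w = [wi + step * bi for wi, bi in zip(w, buf)]
    return r, w


# Toy usage: a made-up 3-tap echo path and a random reference signal.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3, 0.2]  # assumed echo path, for illustration only
y = [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
     for t in range(len(x))]
r, w = nlms_echo_suppress(x, y)
```

In this echo-only toy case the coefficients adapt toward the assumed echo path and r(t) shrinks. Because the update is driven by r(t), any speech present in y(t) would also drive the coefficients, which is why the update is stopped in the double-talk state.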
The echo suppressor 1000 shown in FIG. 12 includes one suppressing mechanism 1001 of FIG. 14 for each channel, corresponding to the reference sound signals x1(t), . . . , and xn(t). The echo suppressing method described above is disclosed, for example, in Japanese Laid-open Patent Publication No. 2002-237769.
However, the adaptation processing based on the NLMS algorithm as shown in FIG. 14 suppresses an echo in accordance with past learning results, and therefore has a poor ability to follow a large change in the observation signal at the shift between the single-talk state and the double-talk state. This leads to a further problem of erroneous voice recognition, which occurs when the state just after a speaker starts speaking is detected as the single-talk state, or when a state involving only an echo is detected as the double-talk state.
Further, the method using a suppressing mechanism for each channel as shown in FIG. 12 has a problem of increased cost and apparatus size. In particular, when this method is applied to a car navigation system, which has strict constraints on installation space, the increase in size becomes a serious problem.
Further, as shown in FIG. 13, using an added reference sound signal of a monophonic channel, which is obtained by adding the reference sound signals, increases the residual error, that is, the echo that remains unsuppressed. This is because, in the multi-channel audio unit 2000, which outputs sounds such as music, the sound reproduced from each speaker and its intensity change independently, so the echoes arriving over plural paths are difficult to learn and estimate through a single adaptation process.
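The residual error of the summed-reference approach can be made concrete with a small numerical sketch. The following is an illustration under simplifying assumptions that are not in the source: two channels carrying independent random signals, each echo path reduced to a single gain (h1, h2), and the single filter replaced by the best fixed least-squares gain on the summed reference. The function name residual_ratio is hypothetical.

```python
import random


def residual_ratio(h1, h2, n=5000, seed=0):
    """Residual echo energy divided by echo energy for a summed reference.

    h1, h2 -- assumed single-tap echo path gains of channels 1 and 2
    The channels carry independent random signals (toy stand-in for music).
    """
    rng = random.Random(seed)
    x1 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    x2 = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Echo at the microphone: each channel passes through its own path.
    y = [h1 * a + h2 * b for a, b in zip(x1, x2)]
    # Added reference sound signal x(t) = x1(t) + x2(t).
    x = [a + b for a, b in zip(x1, x2)]
    # Best single gain w on x(t) in the least-squares sense.
    w = sum(yi * xi for yi, xi in zip(y, x)) / sum(xi * xi for xi in x)
    residual = [yi - w * xi for yi, xi in zip(y, x)]
    return sum(v * v for v in residual) / sum(v * v for v in y)


same_paths = residual_ratio(0.8, 0.8)        # identical echo paths
different_paths = residual_ratio(0.8, -0.4)  # differing echo paths
```

When both paths are identical, one gain on the summed reference removes the echo almost completely. When the paths differ and the channel signals are independent, even the best single gain leaves most of the echo energy in place, which matches the residual error described above.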