In an apparatus equipped with a speaker and a microphone, such as a videoconference system, a cellular phone, or a car navigation system having a voice output/recognition function, an echo suppressing process has been put into practice that removes, from the sound received by the microphone, echo derived from the sound output from the speaker.
FIG. 1 is a block diagram illustrating the structure of a conventional echo suppression device. In FIG. 1, a reference numeral 10000 denotes an echo suppression device for executing the echo suppressing process. The echo suppression device 10000 includes a speaker 10001 for outputting sound on the basis of a reference sound signal y(t) and a microphone 10002 for converting input sound into an observation sound signal x(t). The echo suppression device 10000 also includes an adaptive filter 10003 used for removing echo from the observation sound signal x(t).
The microphone 10002 receives not only the voice of a speaking person but also various other sounds, including sound output from the speaker 10001 and other noises. In other words, sound output from the speaker 10001 on the basis of the reference sound signal y(t) is input to the microphone 10002 through a sound field of the external environment. When the impulse response between the speaker 10001 and the microphone 10002 is expressed as h(t), the echo suppression device 10000 obtains an estimated value a(t) of the impulse response h(t) by using the adaptive filter 10003, and derives a signal y′(t) by passing the reference sound signal y(t) through the adaptive filter 10003. As a method for obtaining the estimated value a(t), the method of steepest descent, the LMS (Least Mean Square) method, the learning identification method or the like is employed. The signal y′(t) thus obtained is then subtracted from the observation sound signal x(t), so as to remove the echo derived from the output of the speaker 10001, resulting in a differential signal e(t) = x(t) − y′(t). The estimation is performed while the observation sound signal x(t) contains only echo derived from the output of the speaker 10001, by adapting the adaptive filter 10003 so as to minimize the power of the differential signal e(t), which in that case is residual echo (see, for example, “Sound and Acoustic Technology”, first edition, by Sadaoki Furui, published by Kindai Kagaku sha Co., Ltd., September 1992, pp. 84-85). Thereafter, various processing such as voice recognition is executed on the basis of the differential signal e(t).
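The adaptive filtering described above may be illustrated by the following minimal sketch. It is not taken from the cited reference; it assumes a normalized LMS update (commonly identified with the learning identification method), and the function name nlms_echo_cancel, the tap count, the step size mu, and the synthetic impulse response h are all hypothetical values chosen for illustration.

```python
import random


def nlms_echo_cancel(x, y, taps=8, mu=0.5, eps=1e-8):
    """Return (e, a): the differential signal e(t) and the final estimate a of h.

    x: observation sound signal (microphone), y: reference sound signal (speaker).
    taps, mu and eps are illustrative assumptions, not values from the text.
    """
    a = [0.0] * taps    # estimated impulse response a(t)
    buf = [0.0] * taps  # most recent reference samples, newest first
    e = []
    for xt, yt in zip(x, y):
        buf = [yt] + buf[:-1]
        y_hat = sum(ai * bi for ai, bi in zip(a, buf))  # y'(t): reference through the filter
        err = xt - y_hat                                # e(t) = x(t) - y'(t)
        norm = sum(b * b for b in buf) + eps            # reference power (normalization)
        # Move a(t) in the direction that reduces the power of e(t).
        a = [ai + mu * err * bi / norm for ai, bi in zip(a, buf)]
        e.append(err)
    return e, a


# Synthetic check: the microphone picks up only echo of the speaker output.
random.seed(0)
h = [0.6, -0.3, 0.1]  # hypothetical speaker-to-microphone impulse response
y = [random.uniform(-1.0, 1.0) for _ in range(2000)]
x = [sum(h[k] * y[t - k] for k in range(len(h)) if t - k >= 0)
     for t in range(len(y))]
e, a = nlms_echo_cancel(x, y)
```

In this echo-only setting the first entries of a approach h and the power of e(t) decays toward zero. When the voice of a speaking person is also present in x(t), it acts as a disturbance to the update, which is why the estimation is described as being performed while x(t) contains only echo.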
Furthermore, a noise suppressing process has been put into practice that estimates an arrival direction of sound by utilizing a plurality of microphones and suppresses, as ambient noise, sound arriving from directions other than a target direction such as the direction of a speaking person.
FIG. 2 is a block diagram illustrating the structure of a conventional noise suppression device. In FIG. 2, a reference numeral 20000 denotes a noise suppression device for suppressing ambient noise on the basis of arrival directions. The noise suppression device 20000 includes a first microphone 20001 and a second microphone 20002 disposed at an appropriate distance d, and the first microphone 20001 and the second microphone 20002 respectively output a first sound signal x1(t) and a second sound signal x2(t) on the basis of sound input thereto. The noise suppression device 20000 further includes a first FIR (Finite Impulse Response) filter 20003 for filtering the first sound signal x1(t) on the basis of a first filter factor H1(ω) set therein and a second FIR filter 20004 for filtering the second sound signal x2(t) on the basis of a second filter factor H2(ω) set therein. The noise suppression device 20000 further includes a filter factor derivation unit 20005 for respectively deriving the first filter factor H1(ω) of the first FIR filter 20003 and the second filter factor H2(ω) of the second FIR filter 20004 on the basis of the first sound signal x1(t) and the second sound signal x2(t) and for outputting the derived first filter factor H1(ω) and second filter factor H2(ω) respectively to the first FIR filter 20003 and the second FIR filter 20004. Moreover, the noise suppression device 20000 includes an adder 20006 for outputting a sound signal r(t) obtained by summing up a first sound signal x1′(t) and a second sound signal x2′(t) resulting from the filtering respectively by the first FIR filter 20003 and the second FIR filter 20004.
In FIG. 2, assuming that an ambient noise source is sufficiently far from the first microphone 20001 and the second microphone 20002 and that the ambient noise to be suppressed arrives from a direction θ as plane waves, the noise is first received by the first microphone 20001 and then received by the second microphone 20002 after a delay time τ = d sin θ/c, where c is the speed of sound. Accordingly, when the first filter factor H1(ω) having a transfer factor that applies the delay time τ and inverts the phase is set in the first FIR filter 20003 and the second filter factor H2(ω) having a transfer factor of 1 is set in the second FIR filter 20004, the delayed and inverted first sound signal cancels the noise component of the second sound signal in the adder 20006, so that the sound signal r(t) becomes a signal in which the ambient noise arriving from the direction θ is suppressed. Through application of this technique, an arrival direction of sound of each frequency may be estimated so as to suppress, as ambient noise, sound arriving from directions other than a target direction such as the direction of a speaking person (see, for example, “Sound and Acoustic Technology”, first edition, by Sadaoki Furui, published by Kindai Kagaku sha Co., Ltd., September 1992, pp. 85-86). Thereafter, various processing such as voice recognition is executed on the basis of the sound signal r(t).
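The cancellation described for FIG. 2 may be checked numerically with the following sketch for a single angular frequency ω. It assumes ideal plane waves, identical microphones, and a speed of sound of 340 m/s; the function names and the example values of d, θ and frequency are hypothetical and are not part of the conventional device.

```python
import cmath
import math

C = 340.0  # assumed speed of sound in m/s


def delay(d, theta):
    """Delay tau = d*sin(theta)/c between the first and second microphones."""
    return d * math.sin(theta) / C


def null_filters(omega, tau):
    """Filter factors that place a null in the direction giving delay tau."""
    h1 = -cmath.exp(-1j * omega * tau)  # H1: delay tau, opposite phase
    h2 = 1.0 + 0.0j                     # H2: transfer factor of 1
    return h1, h2


def array_output(omega, d, theta_src, theta_null, s=1.0):
    """Output r of the adder for a plane wave of amplitude s from theta_src."""
    x1 = s                                                # first microphone
    x2 = s * cmath.exp(-1j * omega * delay(d, theta_src)) # second microphone, tau later
    h1, h2 = null_filters(omega, delay(d, theta_null))
    return h1 * x1 + h2 * x2                              # r = H1*x1 + H2*x2


omega = 2.0 * math.pi * 1000.0  # 1 kHz, illustrative
d = 0.05                        # 5 cm microphone spacing, illustrative
noise_dir = math.pi / 6.0       # direction theta of the ambient noise
r_noise = array_output(omega, d, noise_dir, noise_dir)
r_other = array_output(omega, d, 0.0, noise_dir)
```

Here r_noise has essentially zero magnitude because the wave arrives from the direction in which the null is placed, whereas r_other, for a wave arriving from θ = 0, retains a nonzero magnitude.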
Although the echo suppression device of FIG. 1 can suppress echo on the basis of a reference sound signal, it has a problem in that it cannot suppress ambient noise other than the echo. Conversely, although the noise suppression device of FIG. 2 can suppress ambient noise other than sound arriving from a specific target direction such as the direction of a speaking person, it cannot sufficiently suppress sound output from the speaker if the speaker is disposed in the vicinity of the speaking person, because that sound then arrives from nearly the target direction. Accordingly, sound processors that combine these techniques, so that the echo suppression device for suppressing echo and the noise suppression device for suppressing ambient noise in accordance with an arrival direction compensate for each other's shortcomings, have begun to be examined.
FIG. 3 is a block diagram illustrating the structure of a conventional sound processor. In FIG. 3, a reference numeral 30000 denotes a sound processor obtained by combining an echo suppression device for suppressing echo and a noise suppression device for suppressing ambient noise in accordance with an arrival direction. The sound processor 30000 includes a speaker 30001 for outputting sound on the basis of a reference sound signal y(t), and a first microphone 30002 and a second microphone 30003 for converting input sound respectively into a first observation sound signal x1(t) and a second observation sound signal x2(t).
The sound processor 30000 includes a noise suppression unit 30004 for suppressing noise in the first observation sound signal x1(t) and the second observation sound signal x2(t) on the basis of an arrival direction, and the noise suppression unit 30004 outputs an observation sound signal x_r(t) with ambient noise suppressed. Furthermore, the sound processor 30000 includes an echo suppression unit 30005 for suppressing echo based on the reference sound signal y(t) in the observation sound signal x_r(t) with ambient noise suppressed, and the echo suppression unit 30005 outputs a signal e_r(t) with ambient noise and echo suppressed. Thereafter, various processing such as voice recognition is executed on the basis of the signal e_r(t).
The sound processor of FIG. 3 employs the structure in which echo is suppressed after suppressing ambient noise in accordance with an arrival direction.
FIG. 4 is a block diagram illustrating the structure of another conventional sound processor. In FIG. 4, a reference numeral 40000 denotes a sound processor obtained by combining an echo suppression device for suppressing echo and a noise suppression device for suppressing ambient noise in accordance with an arrival direction. The sound processor 40000 includes a speaker 40001 for outputting sound on the basis of a reference sound signal y(t), and a first microphone 40002 and a second microphone 40003 for converting input sound respectively into observation sound signals x1(t) and x2(t).
The sound processor 40000 includes a first echo suppression unit 40004 for suppressing echo based on the reference sound signal y(t) in the first observation sound signal x1(t), and the first echo suppression unit 40004 outputs a first observation sound signal e1(t) with echo suppressed. Also, the sound processor 40000 includes a second echo suppression unit 40005 for suppressing echo based on the reference sound signal y(t) in the second observation sound signal x2(t), and the second echo suppression unit 40005 outputs a second observation sound signal e2(t) with echo suppressed. Furthermore, the sound processor 40000 includes a noise suppression unit 40006 for suppressing ambient noise on the basis of an arrival direction in the first observation sound signal e1(t) and the second observation sound signal e2(t) in which the echo has been suppressed, and the noise suppression unit 40006 outputs a signal e_r(t) with ambient noise and echo suppressed. Thereafter, various processing such as voice recognition is executed on the basis of the signal e_r(t).
The sound processor of FIG. 4 employs the structure in which ambient noise is suppressed on the basis of an arrival direction after suppressing echo.
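The difference between the two orderings may be summarized by the following structural sketch. The suppression stages are passed in as callables, since their internals are not specified here; the function names noise_then_echo and echo_then_noise and the parameter names are hypothetical.

```python
from typing import Callable, List

Signal = List[float]
# (observation signal, reference signal y) -> echo-suppressed signal
EchoSuppressor = Callable[[Signal, Signal], Signal]
# (first channel, second channel) -> noise-suppressed single-channel signal
NoiseSuppressor = Callable[[Signal, Signal], Signal]


def noise_then_echo(x1: Signal, x2: Signal, y: Signal,
                    suppress_noise: NoiseSuppressor,
                    suppress_echo: EchoSuppressor) -> Signal:
    """FIG. 3 ordering: direction-based noise suppression, then one echo suppressor."""
    x_r = suppress_noise(x1, x2)  # observation sound signal with ambient noise suppressed
    return suppress_echo(x_r, y)  # e_r(t)


def echo_then_noise(x1: Signal, x2: Signal, y: Signal,
                    suppress_echo_1: EchoSuppressor,
                    suppress_echo_2: EchoSuppressor,
                    suppress_noise: NoiseSuppressor) -> Signal:
    """FIG. 4 ordering: one echo suppressor per microphone, then noise suppression."""
    e1 = suppress_echo_1(x1, y)    # first observation signal with echo suppressed
    e2 = suppress_echo_2(x2, y)    # second observation signal with echo suppressed
    return suppress_noise(e1, e2)  # e_r(t)
```

As the signatures make visible, the FIG. 3 ordering needs a single echo suppression stage operating on the already combined signal, while the FIG. 4 ordering needs one echo suppression stage per microphone before the channels are combined.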