Speech signals received from a device such as a microphone or handset are subjected to speech encoding or a speech recognition process. Background noise signals mixed with these speech signals pose a serious problem when implementing speech encoding or speech recognition in a narrow-band speech encoder having a high degree of information compression, a speech recognition device, or the like. Two-input noise cancellers employing adaptive filters are disclosed in References [1] to [9] and [23] as signal processors directed toward eliminating such acoustically superposed noise components.
A two-input noise canceller uses an adaptive filter that approximates the impulse response of the path (the noise path) traveled by a noise signal from the reference input terminal until the signal reaches the speech input terminal. The adaptive filter generates a pseudo noise signal corresponding to the noise signal component that is mixed at the speech input terminal, and the noise canceller suppresses the noise signal by subtracting this pseudo noise signal from the received sound signal that is received at the speech input terminal. The received sound signal is a signal in which a speech signal and a noise signal are mixed, and is typically applied to the speech input terminal from a microphone or handset. The filter coefficients of the adaptive filter are corrected by taking the correlation between the reference signal applied to the reference input terminal and an error signal obtained by subtracting the pseudo noise signal from the received sound signal.
Known coefficient correction algorithms for this adaptive filter include the “LMS algorithm (Least-Mean-Square Algorithm)” described in Reference [23] and the “LIM (Learning Identification Method)” described in Reference [24].
FIG. 1 shows a representative configuration of a two-input noise canceller of the prior art. This noise canceller is provided with two input terminals 101, 102; adaptive filter 107; subtractor 111; and output terminal 113.
A signal which has undergone acousto-electrical conversion by a microphone placed close to the speaker is applied to input terminal 101. Signal Xp(k) that has been applied as input is a signal in which background noise signal n(k) is mixed with the speech signal S(k), which is the object signal, and can be represented by Equation (1):

$$X_p(k) = S(k) + n(k) \tag{1}$$
A signal that has undergone acousto-electrical conversion by a microphone placed at a position which is farther from the speaker than the microphone for input terminal 101 is applied to input terminal 102. If the microphone connected to input terminal 102 is in a position that is sufficiently remote from the speaker and sufficiently close to the source of noise, signal Xr(k) applied as input to input terminal 102 is equivalent to background noise signal N(k) applied as input to input terminal 101, resulting in Equation (2):

$$X_r(k) = N(k) \tag{2}$$
Adaptive filter 107 performs a filtering operation taking as input signal Xr(k) applied to input terminal 102 and supplies pseudo noise signal R(k) as the operation result.
Subtractor 111 subtracts pseudo noise signal R(k), which is supplied by adaptive filter 107, from signal Xp(k), which is applied to input terminal 101, to generate differential signal e(k). Subtractor 111 both transfers differential signal e(k) to output terminal 113 as the output signal of the noise canceller and supplies differential signal e(k) to adaptive filter 107 as the error signal for updating the coefficient of adaptive filter 107. Differential signal e(k) is given by the following Equation (3):

$$e(k) = S(k) + n(k) - R(k) \tag{3}$$
Based on the error signal received as input, adaptive filter 107 uses the coefficient correction algorithm to update the coefficient of the filter. Assuming here that the LMS algorithm described in Reference [23] is employed as the coefficient update algorithm of the adaptive filter and taking wj(k) as the jth coefficient of adaptive filter 107 at time k, the pseudo noise signal R(k) which is supplied as output by adaptive filter 107 is represented by Equation (4):
$$R(k) = \sum_{j=0}^{N-1} w_j(k) \cdot X_r(k-j) \tag{4}$$
Here, N indicates the number of taps of adaptive filter 107. The updating of the coefficient is realized according to Equation (5):

$$w_j(k+1) = w_j(k) + \alpha \cdot e(k) \cdot X_r(k-j) \tag{5}$$
In this case, α is a constant referred to as the “step size” and is a parameter for determining the convergence time of the coefficient and the residual error after the convergence. When step size α is large, the amount of correction of the coefficient increases and the convergence is therefore fast, but fluctuation of the coefficient also increases in the vicinity of the optimum value and the final residual error becomes great. In contrast, when the step size α is small, the time required for convergence increases, but the final residual error becomes small.
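The filtering and update operations of Equations (3) to (5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited reference: the function name `lms_noise_canceller`, the tap count, the step size value, and the synthetic noise path are all hypothetical choices made for the example.

```python
import numpy as np

def lms_noise_canceller(xp, xr, num_taps=8, alpha=0.01):
    """Two-input noise canceller following Equations (3) to (5).

    xp: signal at the speech input terminal, Xp(k)
    xr: signal at the reference input terminal, Xr(k)
    Returns the error signal e(k) and the final coefficients w_j.
    """
    xp = np.asarray(xp, dtype=float)
    xr = np.asarray(xr, dtype=float)
    w = np.zeros(num_taps)        # w_j(k) for j = 0 .. N-1
    buf = np.zeros(num_taps)      # buf[j] holds Xr(k - j)
    e = np.zeros(len(xp))
    for k in range(len(xp)):
        buf = np.roll(buf, 1)     # shift the delay line by one sample
        buf[0] = xr[k]
        r = np.dot(w, buf)        # Equation (4): pseudo noise signal R(k)
        e[k] = xp[k] - r          # Equation (3): e(k) = Xp(k) - R(k)
        w = w + alpha * e[k] * buf  # Equation (5): coefficient update
    return e, w

# Hypothetical demo: no speech (S(k) = 0) and a short FIR noise path.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)               # N(k) at the reference input
path = np.array([0.6, -0.3, 0.1])               # assumed noise path
xp = np.convolve(noise, path)[:len(noise)]      # n(k) at the speech input
e, w = lms_noise_canceller(xp, noise, num_taps=4, alpha=0.01)
```

In this speech-free demo the coefficients approach the assumed noise path and the error signal decays toward zero, which is the behavior Equation (5) is designed to produce when nothing disturbs the update.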
As shown in Equation (3), error signal e(k) contains speech signal S(k). Because the coefficient update operation is carried out so as to drive e(k) toward 0, it does not drive R(k) toward n(k) when S(k)≠0. As a result, speech signal S(k) has a large influence as a disturbance signal for the coefficient update operation of adaptive filter 107. To reduce the influence of speech signal S(k), step size α must be set to an extremely small value. However, as stated above, decreasing the step size raises the problem of an increase in the convergence time of adaptive filter 107.
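The disturbance can be made explicit by substituting Equation (3) into Equation (5). The following is only that algebraic step, written out for illustration:

```latex
w_j(k+1) = w_j(k)
         + \alpha \, [\, n(k) - R(k) \,] \, X_r(k-j)
         + \alpha \, S(k) \, X_r(k-j)
```

The second term moves the coefficients toward the state R(k)=n(k), whereas the third term perturbs them in proportion to both α and S(k). Reducing α shrinks the perturbation, but it shrinks the useful correction by the same factor, which is the trade-off between disturbance and convergence time described above.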
As a method that takes this problem into consideration, References [10]-[19] and [25] disclose noise cancellers which implement control of the step size.
FIG. 2 shows the configuration of an adaptive noise canceller of the prior art for implementing control of step size which is described in Reference [25]. As shown in FIG. 2, this prior-art device is provided with two adaptive filters 5, 7 and uses the signal-to-noise (S/N) ratio at input terminal 1, which is estimated by using adaptive filter 5, to control the step size of adaptive filter 7. Implementing control such that the step size is small when the speech signal is greater than the noise signal and the step size is large in the opposite state enables a shortening of the convergence time of adaptive filter 7 and a decrease of the distortion in the signal following noise cancellation that is transferred to output terminal 13. This noise canceller is further provided with: two delay circuits 3, 4; two subtractors 9, 11; step size control circuit 19; and S/N ratio estimation circuit 21.
The operation of adaptive filter 5 is equivalent to the operation of adaptive filter 107 in the device shown in the previously described FIG. 1. Accordingly, the estimated value of the speech signal component from which the influence of noise at input terminal 1 has been eliminated and the estimated value of the noise signal component at input terminal 1 are supplied to S/N ratio estimation circuit 21. This is because the inputs of S/N ratio estimation circuit 21 are the output of subtractor 9, which approximates the speech component at input terminal 1, and the output of adaptive filter 5, which approximates the noise component. The S/N ratio estimation circuit is also referred to as a signal-to-noise power relation estimation circuit.
In S/N ratio estimation circuit 21, the estimated value of the signal-to-noise ratio is found by using the estimated value of the speech signal component and the estimated value of the noise signal component which are supplied thereto. The signal-to-noise ratio found in S/N ratio estimation circuit 21 is supplied to step size control circuit 19, and the thus-obtained step size is supplied to adaptive filter 7.
In contrast to adaptive filter 107 of FIG. 1, the input signal to adaptive filter 7 is the signal supplied to input terminal 2 delayed by delay circuit 4. Similarly, the signal supplied from input terminal 1 to subtractor 11, in contrast to subtractor 111 of the circuit shown in FIG. 1, is a signal that has been delayed by delay circuit 3. Delay circuits 3, 4 produce a delay of the same time interval, and are configured such that the noise cancellation realized by adaptive filter 7 is applied to signals obtained by delaying the signals supplied to input terminals 1, 2 by the same time interval. The delay time produced by delay circuit 3 and the delay time produced by delay circuit 4 are set to a time interval that is at least the delay time resulting from the calculation of estimated values by S/N ratio estimation circuit 21. Subtractor 11 subtracts noise in the same manner as subtractor 111 of the device shown in FIG. 1 and transfers the output to output terminal 13.
The configuration of S/N ratio estimation circuit 21 can be represented as shown in FIG. 3. S/N ratio estimation circuit 21 is composed of averaging circuits 14, 15 and operation circuit 16. Averaging circuit 14 is supplied with the estimated value of the speech signal component, calculates the average value of the estimated value of the speech signal component, and delivers the average value of the estimated speech signal. Similarly, averaging circuit 15 is supplied with the estimated value of the noise signal component, calculates the average value of the noise signal component, and delivers the average value of the estimated noise signal. The outputs of averaging circuits 14, 15 are both supplied to operation circuit 16. Operation circuit 16 uses the average value of the estimated speech signal component and the average value of the estimated noise signal component which are supplied from averaging circuits 14, 15 to find the estimated value of the average signal-to-noise ratio and supplies this value as the first signal-to-noise ratio.
Averaging circuits 14, 15 calculate the average power E(k) from time k−L to time k. If Y(k) is the input signal, the average power E(k) is given by Equation (6):
$$E(k) = \frac{1}{L} \sum_{i=0}^{L} Y^2(k-i) \tag{6}$$
Equation (7) may also be used in place of Equation (6):

$$E(k) = \gamma \cdot E(k-1) + (1-\gamma) \cdot Y^2(k) \tag{7}$$

where γ is a constant that satisfies the relation 0<γ<1.
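The two averaging operations can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, samples before time 0 are assumed to be zero, and the initial value of the recursive average (`e0`) is an assumption, since the text does not specify start-up behavior.

```python
import numpy as np

def average_power_window(y, L):
    """Equation (6) as written: (1/L) * sum of Y^2(k-i) for i = 0 .. L.

    The sum is reproduced exactly as stated (L+1 terms scaled by 1/L).
    Samples before time 0 are assumed to be zero. Requires L >= 1.
    """
    y2 = np.asarray(y, dtype=float) ** 2
    padded = np.concatenate([np.zeros(L), y2])          # zero history
    csum = np.concatenate([[0.0], np.cumsum(padded)])   # running sum
    # csum[k+L+1] - csum[k] equals the sum of Y^2 over times k-L .. k
    return (csum[L + 1:] - csum[:-(L + 1)]) / L

def average_power_recursive(y, gamma, e0=0.0):
    """Equation (7): E(k) = gamma * E(k-1) + (1 - gamma) * Y^2(k)."""
    out = np.empty(len(y))
    prev = e0                      # assumed initial value E(-1)
    for k, v in enumerate(y):
        prev = gamma * prev + (1.0 - gamma) * v * v
        out[k] = prev
    return out
```

The recursive form needs only one stored value per signal, whereas the windowed form needs the last L+1 squared samples (or a running sum), which is one practical reason to prefer Equation (7).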
Step size control circuit 19 supplies adaptive filter 7 with the step size that has been calculated based on the first signal-to-noise ratio that has been found by S/N ratio estimation circuit 21.
If the first signal-to-noise ratio at time k is SNR1(k), step size control circuit 19 receives SNR1(k) as input and calculates step size α1(k).
α1(k) is found as the value of a function f1(x) that decreases monotonically over the range SNR1min<SNR1(k)<SNR1max. In this case, SNR1min and SNR1max are constants satisfying the relation SNR1min<SNR1max.
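This relation can be represented by Equations (8a) to (8c):

$$\alpha_1(k) = \alpha_{1\max} \quad (\mathrm{SNR1}(k) < \mathrm{SNR1}_{\min}) \tag{8a}$$

$$\alpha_1(k) = f_1(\mathrm{SNR1}(k)) \quad (\mathrm{SNR1}_{\min} \le \mathrm{SNR1}(k) \le \mathrm{SNR1}_{\max}) \tag{8b}$$

$$\alpha_1(k) = \alpha_{1\min} \quad (\mathrm{SNR1}(k) > \mathrm{SNR1}_{\max}) \tag{8c}$$

In addition, α1min and α1max are constants satisfying the relation α1min<α1max.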
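The monotone decrease function f1(x) can be represented by, for example, Equations (9a) to (9c):

$$f_1(x) = -A \cdot x + B \tag{9a}$$

$$A = (\alpha_{1\max} - \alpha_{1\min}) / (\mathrm{SNR1}_{\max} - \mathrm{SNR1}_{\min}) \tag{9b}$$

$$B = \{\alpha_{1\max} + \alpha_{1\min} + A \cdot (\mathrm{SNR1}_{\max} + \mathrm{SNR1}_{\min})\} / 2 \tag{9c}$$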
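The step size rule of Equations (8a) to (8c) with the linear example of Equations (9a) to (9c) can be written compactly as below. The function name and the numeric values in the usage lines are hypothetical; the text does not state the units of SNR1(k), so the example simply treats it as a number on whatever scale the limits are given in.

```python
def step_size_alpha1(snr1, snr1_min, snr1_max, alpha1_min, alpha1_max):
    """Step size alpha1(k) per Equations (8a)-(8c) with f1 from (9a)-(9c)."""
    if snr1 < snr1_min:
        return alpha1_max                       # Equation (8a)
    if snr1 > snr1_max:
        return alpha1_min                       # Equation (8c)
    a = (alpha1_max - alpha1_min) / (snr1_max - snr1_min)        # (9b)
    b = (alpha1_max + alpha1_min + a * (snr1_max + snr1_min)) / 2  # (9c)
    return -a * snr1 + b                        # Equations (8b), (9a)

# Hypothetical limits: SNR range -10 .. 10, step size range 0.01 .. 0.05.
mid_step = step_size_alpha1(0.0, -10.0, 10.0, 0.01, 0.05)
low_snr_step = step_size_alpha1(-25.0, -10.0, 10.0, 0.01, 0.05)
```

With these constants, f1 evaluates to α1max at SNR1min and to α1min at SNR1max, so the three branches join without a jump at either limit.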
According to the noise canceller described in Reference [25], second adaptive filter 5 can be used to estimate the signal-to-noise ratio at the speech input terminal and thus control the step size of first adaptive filter 7 such that the step size is small when the signal-to-noise ratio is large and the step size is large in the reverse situation. As a result, operation is enabled that mitigates the influence of the disturbance signal.
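The signal flow of FIG. 2 can be sketched as a single loop. This is a simplified illustration under several assumptions that are not taken from Reference [25]: adaptive filter 5 uses a fixed step size `alpha_sub`, the powers are averaged with Equation (7), the signal-to-noise ratio is expressed in decibels, f1 is the linear example of Equations (9a) to (9c), and every name and parameter value is hypothetical.

```python
import numpy as np

def lms_step(w, buf, d, alpha):
    """One LMS iteration: returns filter output, error, updated coefficients."""
    y = np.dot(w, buf)
    e = d - y
    return y, e, w + alpha * e * buf

def fig2_canceller(x1, x2, taps=8, delay=16, alpha_sub=0.02,
                   a_min=0.001, a_max=0.05,
                   snr_min=-10.0, snr_max=10.0, gamma=0.99):
    x1 = np.asarray(x1, dtype=float)   # input terminal 1 (speech + noise)
    x2 = np.asarray(x2, dtype=float)   # input terminal 2 (reference)
    n = len(x1)
    w5, w7 = np.zeros(taps), np.zeros(taps)
    buf5, buf7 = np.zeros(taps), np.zeros(taps)
    ps = pn = 1e-12                    # averaged speech and noise powers
    out, steps = np.zeros(n), np.zeros(n)
    for k in range(n):
        # Adaptive filter 5 and subtractor 9 on the undelayed signals.
        buf5 = np.roll(buf5, 1)
        buf5[0] = x2[k]
        noise_est, speech_est, w5 = lms_step(w5, buf5, x1[k], alpha_sub)
        # S/N ratio estimation circuit 21 (Equation (7), ratio in dB: assumed).
        ps = gamma * ps + (1.0 - gamma) * speech_est ** 2
        pn = gamma * pn + (1.0 - gamma) * noise_est ** 2
        snr_db = 10.0 * np.log10((ps + 1e-12) / (pn + 1e-12))
        # Step size control circuit 19: clipped linear decrease in SNR.
        frac = np.clip((snr_db - snr_min) / (snr_max - snr_min), 0.0, 1.0)
        alpha = a_max - frac * (a_max - a_min)
        steps[k] = alpha
        # Delay circuits 3, 4 feed adaptive filter 7 and subtractor 11.
        d1 = x1[k - delay] if k >= delay else 0.0
        d2 = x2[k - delay] if k >= delay else 0.0
        buf7 = np.roll(buf7, 1)
        buf7[0] = d2
        _, out[k], w7 = lms_step(w7, buf7, d1, alpha)
    return out, steps
```

Because the step size at time k is computed from undelayed samples but applied to samples delayed by `delay`, the main filter effectively receives its step size ahead of the signal it is about to process, which is the purpose of delay circuits 3, 4.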
Nevertheless, when the signal supplied to input terminal 2 has not been collected at a position which is sufficiently remote from the speaker, signal Xr(k) which is applied as input to input terminal 2 is a signal in which speech signal s(k) is mixed with background noise signal N(k), as shown in Equation (10):

$$X_r(k) = N(k) + s(k) \tag{10}$$

In this case, a component having a correlation with speech signal s(k) appears in the output of the adaptive filter. As a result, not only does distortion occur in the signal which is transferred to output terminal 13, but error also occurs in the speech signal component which is supplied to S/N ratio estimation circuit 21.
As noise cancellers that take this problem into consideration, References [20], [21] and [26] disclose noise cancellers which employ adaptive filters approximating the impulse response of the path taken by a speech signal until the signal reaches the reference input terminal. FIG. 4 shows the configuration of the noise canceller described in Reference [26].
The noise canceller shown in FIG. 4 is a device in which adaptive filters 6, 8, subtractors 10, 12, step size control circuit 20, and S/N ratio estimation circuit 22 have been added to the noise canceller shown in FIG. 2. In this noise canceller, a signal which corresponds to the speech signal that leaks to input terminal 2 is generated by adaptive filter 8, and the result of subtracting the output of adaptive filter 8 from the signal which is supplied to input terminal 2 is supplied to adaptive filter 7, which reduces the disturbance caused by the speech signal that leaks to input terminal 2. Adaptive filter 6 and S/N ratio estimation circuit 22 control the step size of adaptive filter 8 in accordance with the same principle as the noise canceller shown in FIG. 2. In the device shown in FIG. 4, in contrast with the device shown in FIG. 2, the input signal to adaptive filter 7 is an estimated value of the noise signal component from which the influence of speech has been removed. This result is obtained because the input of adaptive filter 7 is the output of subtractor 12, which approximates the noise component at input terminal 2. Similarly, in contrast with the device shown in FIG. 2, the input signal of adaptive filter 5 is the output of subtractor 10 in the device shown in FIG. 4.
The input signal of adaptive filter 8 is an estimated value of the speech signal component from which the influence of noise has been removed. This result is obtained because the input of adaptive filter 8 is the output of subtractor 11, which approximates the speech component at input terminal 1. Adaptive filter 8 performs a filtering operation on the signal supplied from subtractor 11 and supplies a first pseudo speech signal as the operation result. Similarly, the input signal of adaptive filter 6 is the output of subtractor 9. Adaptive filter 6 performs a filtering operation on the signal supplied from subtractor 9 and delivers a second pseudo speech signal as the operation result.
Subtractor 12 subtracts the output of adaptive filter 8 from the output of delay circuit 4 and both supplies the result of subtraction to adaptive filter 7 and transmits the result to adaptive filter 8 as an error signal for updating the coefficient. Subtractor 10 subtracts the output of adaptive filter 6 from the signal supplied to input terminal 2 and both supplies the result of subtraction to adaptive filter 5 and transmits the result to adaptive filter 6 as an error signal for updating the coefficient.
An estimated value of the noise signal component from which the influence of speech at input terminal 2 has been removed and an estimated value of the speech signal component at input terminal 2 are supplied to S/N ratio estimation circuit 22. This is because the input of S/N ratio estimation circuit 22 is the output of subtractor 10 which approximates the noise component at input terminal 2 and the output of adaptive filter 6 which approximates the speech component. The configuration of S/N ratio estimation circuit 22 is equivalent to the configuration of S/N ratio estimation circuit 21 which was explained using FIG. 3. Accordingly, S/N ratio estimation circuit 22 uses the estimated value of the speech signal component and the estimated value of the noise signal component that have been supplied to find the estimated value of the signal-to-noise ratio, and supplies the result to step size control circuit 20 as the second signal-to-noise ratio.
Step size control circuit 20 supplies adaptive filter 8 with a step size that has been calculated based on the second signal-to-noise ratio that has been found in S/N ratio estimation circuit 22.
If the estimated value of the second signal-to-noise ratio at time k is SNR2(k), step size control circuit 20 receives SNR2(k) and calculates step size α2(k).
α2(k) is found as the value of a function f2(x) that increases monotonically over the range SNR2min<SNR2(k)<SNR2max. In this case, SNR2min and SNR2max are constants satisfying the relation SNR2min<SNR2max. This relation can be represented by Equations (11a) to (11c):

$$\alpha_2(k) = \alpha_{2\min} \quad (\mathrm{SNR2}(k) < \mathrm{SNR2}_{\min}) \tag{11a}$$

$$\alpha_2(k) = f_2(\mathrm{SNR2}(k)) \quad (\mathrm{SNR2}_{\min} \le \mathrm{SNR2}(k) \le \mathrm{SNR2}_{\max}) \tag{11b}$$

$$\alpha_2(k) = \alpha_{2\max} \quad (\mathrm{SNR2}(k) > \mathrm{SNR2}_{\max}) \tag{11c}$$

α2min and α2max are constants satisfying the relation α2min<α2max.
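The monotone increase function f2(x) can be represented by, for example, Equations (12a) to (12c):

$$f_2(x) = C \cdot x + D \tag{12a}$$

$$C = (\alpha_{2\max} - \alpha_{2\min}) / (\mathrm{SNR2}_{\max} - \mathrm{SNR2}_{\min}) \tag{12b}$$

$$D = \{\alpha_{2\max} + \alpha_{2\min} - C \cdot (\mathrm{SNR2}_{\max} + \mathrm{SNR2}_{\min})\} / 2 \tag{12c}$$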
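Because Equations (11a) to (11c) with the linear f2 of Equations (12a) to (12c) describe a straight line clamped at both ends, the rule can also be sketched with NumPy's `np.interp`, which interpolates linearly between the given points and holds the end values outside them. The function names and numeric limits below are hypothetical, and the explicit form is included only to show that the two agree.

```python
import numpy as np

def step_size_alpha2(snr2, snr2_min, snr2_max, alpha2_min, alpha2_max):
    """Step size alpha2(k) per Equations (11a)-(11c), linear f2 assumed."""
    # np.interp clamps to alpha2_min below snr2_min and alpha2_max above snr2_max.
    return float(np.interp(snr2, [snr2_min, snr2_max], [alpha2_min, alpha2_max]))

def f2_explicit(x, snr2_min, snr2_max, alpha2_min, alpha2_max):
    """Equations (12a)-(12c) written out term by term."""
    c = (alpha2_max - alpha2_min) / (snr2_max - snr2_min)            # (12b)
    d = (alpha2_max + alpha2_min - c * (snr2_max + snr2_min)) / 2    # (12c)
    return c * x + d                                                 # (12a)

# Hypothetical limits: SNR range -10 .. 10, step size range 0.01 .. 0.05.
limits = (-10.0, 10.0, 0.01, 0.05)
inside = step_size_alpha2(5.0, *limits)
above = step_size_alpha2(30.0, *limits)
```

The sign of the slope is the only structural difference from the rule for α1(k): here a higher signal-to-noise ratio gives a larger step size.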
According to the noise canceller described in Reference [26], adaptive filter 8 is used to estimate the speech signal that leaks to the reference input terminal and this estimated value is subtracted in a subtractor to enable a reduction of the disturbance due to the speech signal that leaks to input terminal 2. In addition, the use of adaptive filter 6 to estimate the signal-to-noise ratio at the reference input terminal enables control of the step size of adaptive filter 8 such that a large step size is used when the signal-to-noise ratio is large and a small step size is used in the reverse situation to enable operation that reduces the influence of the disturbance signal.
The reference documents cited in this description are listed below:
[1] JP-A-H09-36763
[2] JP-A-H08-56180
[3] JP-A-H06-284491
[4] JP-A-H06-90493
[5] JP-A-H09-181653
[6] JP-A-H05-75391
[7] JP-A-H05-158494
[8] JP-A-H05-22788
[9] JP-A-S61-194914
[10] JP-A-2000-4494
[11] JP-A-2000-172299
[12] JP-A-H11-27099
[13] JP-A-H11-345000
[14] JP-A-H10-3298
[15] JP-A-H10-215193
[16] JP-A-H09-18291
[17] JP-A-H08-241086
[18] JP-A-S62-135019
[19] JP-A-S61-194913
[20] JP-A-H10-215194
[21] JP-A-H08-110794
[22] JP-A-H11-502324
[23] Bernard Widrow et al., “Adaptive Noise Canceling: Principles and Applications,” Proceedings of the IEEE, VOL. 63, NO. 12, 1975, pp. 1692-1716
[24] Jin-ichi Nagumo and Atsuhiko Noda, “A Learning Method for System Identification,” IEEE Transactions on Automatic Control, VOL. 12, NO. 3, 1967, pp. 282-287
[25] Shigeji Ikeda and Akihiko Sugiyama, “An Adaptive Noise Canceller with Low Signal Distortion for Speech Codec,” IEEE Transactions on Signal Processing, VOL. 47, NO. 3, 1999, pp. 665-674
[26] Shigeji Ikeda and Akihiko Sugiyama, “An Adaptive Noise Canceller with Low Signal Distortion in the Presence of Crosstalk,” IEICE Transactions on Fundamentals, VOL. E82-A, NO. 8, 1999, pp. 1517-1525
[27] David G. Messerschmitt, “Echo Cancellation in Speech and Data Transmission,” IEEE Journal on Selected Areas in Communications, VOL. SAC-2, NO. 2, 1984, pp. 283-297
[28] John J. Shynk, “Frequency-Domain and Multirate Adaptive Filtering,” IEEE Signal Processing Magazine, VOL. 9, NO. 1, 1992, pp. 14-37