In VoIP (Voice over IP) communication (also referred to as IP telephony and the like), which has come into wide use in recent years, an echo component included in an audio signal degrades the speech quality.
This is because VoIP communication has a greater delay than voice communication over a conventional analog line, and this unavoidable delay makes the echo more noticeable than it is in voice communication over the conventional analog line.
Therefore, echo cancellation has become increasingly important for improving the sound quality of VoIP communication. Most VoIP communication apparatuses use an echo canceller to remove the echo component.
The removal of the echo component by the echo canceller in conventional VoIP communication will be described with reference to FIG. 2.
As shown in FIG. 2, a conventional echo canceller 13 includes an input terminal Rin 1 for receiving a digital audio signal from the talker at a remote end (hereafter referred to as a far end), an output terminal Rout 2 for giving the digital audio signal from the input terminal Rin 1 to a receiver (hereafter referred to as a near end), an input terminal Sin 7 for receiving a digital audio signal from the near end, an output terminal Sout 9 for supplying the digital audio signal from the input terminal Sin 7 to the far end, an adder 8, a double talk detector 10, and an adaptive filter 14 having a coefficient update section 11 and a filter section 12.
Further, FIG. 2 shows a near-end telephone set 5, a hybrid circuit 4 connected to the telephone set 5, a digital-to-analog converter 3 for converting the digital audio signal from the output terminal Rout 2 to an analog signal and supplying the signal to the hybrid circuit 4, and an analog-to-digital converter 6 for converting the analog signal from the hybrid circuit 4 to a digital audio signal and supplying the signal to the input terminal Sin 7.
In FIG. 2, a broad-band signal (digital audio signal) input to the input terminal Rin 1 is supplied to the output terminal Rout 2 and converted to an analog signal by the digital-to-analog converter 3. The signal then passes through the hybrid circuit 4 and reaches the telephone set 5. In this manner, the listener at the near end can hear the voice from the far end.
In the meantime, a part of the output signal from the output terminal Rout 2 is reflected by the hybrid circuit 4, converted to a digital signal by the analog-to-digital converter 6, and supplied to the input terminal Sin 7. Unless it is removed, this reflected signal received at the input terminal Sin 7 is passed through the output terminal Sout 9 to the talker at the far end (not shown). The talker at the far end therefore hears his or her own voice as an echo component y, which is unpleasant to the ear.
On the other hand, the broad-band signal (digital audio signal) input to the input terminal Rin 1 is supplied to the adaptive filter 14. The filter section 12 generates an echo replica (pseudo-echo) signal y′ to cancel out the echo component y and gives the signal y′ to the adder 8.
The adder 8 subtracts the echo replica signal y′ supplied by the filter section 12 from the echo component y received at the input terminal Sin 7, thereby removing the echo component y.
A conventional method of generating the echo replica signal y′ by means of the filter section 12 will next be described. In the subsequent description, a learning identification technique (normalized LMS (NLMS) technique) is used. This well-known algorithm is one of the most widely used algorithms for generating the echo replica signal y′.
A signal x input from the input terminal Rin 1 is supplied to the filter section 12. The filter section 12 includes a known FIR (finite impulse response) filter. The tap coefficient h of the adaptive filter 14 (hereafter simply referred to as the coefficient) varies with time.
Next, variations in the tap coefficient h will be described. The m-th tap coefficient of the filter section 12 at time k is denoted by h(k,m), and the input from the input terminal Rin 1 at time k is denoted by x(k). The filter section 12 generates the echo replica signal y′ as given by expression (1).
$$y' = \sum_{m=0}^{M-1} h(k,m) \cdot x(k-m) \qquad (1)$$
M denotes the tap length of the filter section 12 and is a constant determined by the designer in consideration of the response length of the echo path. A longer tap length allows a longer echo to be handled but slows down the convergence of the echo canceller 13. A shorter tap length speeds up the convergence but shortens the echo length that can be handled.
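The computation of expression (1) can be sketched in Python as follows. This is only an illustrative sketch: the function name `echo_replica`, the list `x_hist` holding the M most recent far-end samples, and the 3-tap example values are assumptions introduced here and do not appear in FIG. 2.

```python
def echo_replica(h, x_hist):
    """Return y' = sum over m = 0..M-1 of h(k, m) * x(k - m).

    h      : list of the M tap coefficients h(k, 0) .. h(k, M-1)
    x_hist : list of the M most recent far-end samples, x_hist[m] = x(k - m)
    """
    if len(h) != len(x_hist):
        raise ValueError("h and x_hist must both have length M")
    return sum(hm * xm for hm, xm in zip(h, x_hist))


# Hypothetical 3-tap example (M = 3); the values are chosen only for illustration.
h = [0.5, 0.25, 0.0]
x_hist = [1.0, 2.0, 4.0]  # x(k), x(k-1), x(k-2)
y_replica = echo_replica(h, x_hist)  # 0.5*1.0 + 0.25*2.0 + 0.0*4.0
```

The cost of each output sample grows linearly with M, which is one reason the tap length is chosen no longer than the echo path requires.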
Next, the method of controlling the coefficient of the adaptive filter 14 will be described. The tap coefficient of the filter section 12 is controlled as given by expression (2) and varies with time.
$$h(k+1,m) = h(k,m) + \mu \frac{e(k) \cdot x(k-m)}{\sum_{i=0}^{M-1} x^{2}(k-i)} \qquad (2)$$
The initial values of h and x are zero. In expression (2), μ is a constant which determines the tracking speed of the echo canceller 13 and satisfies the condition 0≤μ≤1. A large value of μ speeds up convergence but degrades the accuracy of echo cancellation in a steady state. A small value of μ slows down convergence but improves the accuracy of echo cancellation in a steady state.
e(k) is the output of the adder 8. With y(k) and y′(k) representing y and y′ at time k, expression (3) is given as follows:
$$e(k) = y(k) - y'(k) \qquad (3)$$
Tap coefficient update control performed by using expressions (1), (2), and (3) is the so-called “NLMS technique.” The tap coefficient h(k,m) varies in such a manner that e(k), or the power of e(k), gradually approaches 0. In other words, the tap coefficients of the filter section 12 are updated so that the echo component y remaining at the output of the adder 8 decreases over time (the adaptive filter 14 converges).
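The update control described above can be sketched as the following Python loop. It is a minimal sketch of a standard NLMS update, in which each tap m is corrected in proportion to e(k)·x(k−m) divided by the input power over the last M samples. The function name `nlms_cancel`, the small constant `eps` that guards against division by zero, and the 3-tap echo path `true_path` are assumptions made for illustration only. The signal at the input terminal Sin is assumed to contain the echo component y alone.

```python
import random


def nlms_cancel(x, d, M, mu=0.5, eps=1e-8):
    """Run an NLMS echo canceller over whole signals.

    x   : far-end samples (input terminal Rin)
    d   : samples at the input terminal Sin, assumed to be echo only
    M   : tap length
    mu  : step size
    eps : small guard against division by zero (an assumption of this sketch)
    Returns (e, h): the residual signal and the final tap coefficients.
    """
    h = [0.0] * M        # initial value of h is zero
    x_hist = [0.0] * M   # x_hist[m] = x(k - m); initial value of x is zero
    e = []
    for xk, yk in zip(x, d):
        x_hist = [xk] + x_hist[:-1]
        y_rep = sum(hm * xm for hm, xm in zip(h, x_hist))  # echo replica y'
        ek = yk - y_rep                                    # residual e(k)
        power = sum(xm * xm for xm in x_hist) + eps
        h = [hm + mu * ek * xm / power for hm, xm in zip(h, x_hist)]
        e.append(ek)
    return e, h


# Hypothetical echo path and white-noise far-end signal, for illustration only.
random.seed(0)
true_path = [0.6, -0.3, 0.1]
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(true_path[m] * x[k - m] for m in range(3) if k - m >= 0)
     for k in range(len(x))]
e, h = nlms_cancel(x, d, M=3, mu=0.5)
```

In this noise-free setting the coefficients `h` approach `true_path` and the residual `e` shrinks toward zero, which corresponds to the convergence of the adaptive filter 14.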
The characteristics of the hybrid circuit 4, which is an echo path, are estimated as the tap coefficient of the filter section 12, and the echo component y is removed accordingly.
In the coefficient update control as described above, if a near-end talker signal s is also input to the input terminal Sin 7, the right-hand side of expression (3) becomes as shown in expression (4), which includes the near-end talker signal s, and the tap coefficient cannot be updated correctly.
Here, s(k) in expression (4) given below denotes a signal input to the input terminal Sin 7 at time k, such as loud background noise or the voice of the talker at the near end (hereafter referred to as a near-end talker signal).
$$e(k) = y(k) - y'(k) + s(k) \qquad (4)$$
Accordingly, when the near-end talker signal s(k) is included as in expression (4), the coefficient update must be stopped. Alternatively, the coefficient update is stopped after a predetermined initial convergence period, so that the effect of the signal s(k) is eliminated.
The double talk detector 10 shown in FIG. 2 stops the coefficient update when the near-end talker signal s is included. The double talk detector 10 can perform any detection operation that can stop the coefficient update of expression (4) by detecting a talker signal on the path from the input terminal Sin 7 to the output terminal Sout 9. The operation will not be described here in further detail.
In FIG. 2, the signal e after the adder 8 is used as the input to the double talk detector 10 as before, but the input of the double talk detector 10 can be taken from any place on the sending path or receiving path.
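As one illustration of how a coefficient update might be stopped during double talk, the following Python sketch gates a single NLMS step with a Geigel-style level comparison. The Geigel comparison is a commonly cited detection approach substituted here for illustration; it is not the detection operation of the double talk detector 10, which is not specified above. The function names `is_double_talk` and `gated_step`, the threshold of 0.5, and the constant `eps` are all assumptions of this sketch.

```python
def is_double_talk(sin_sample, x_hist, threshold=0.5):
    """Geigel-style test (an assumed detector, not that of FIG. 2).

    Near-end talk is assumed present when the sample at Sin is large
    relative to the peak of the recent far-end samples in x_hist.
    """
    far_peak = max((abs(v) for v in x_hist), default=0.0)
    return abs(sin_sample) > threshold * far_peak


def gated_step(h, x_hist, sin_sample, mu=0.5, eps=1e-8):
    """Perform one canceller step; skip the coefficient update on double talk.

    Returns (e, h_next), where e is the output of the subtraction and
    h_next is either the updated or the unchanged coefficient list.
    """
    y_rep = sum(hm * xm for hm, xm in zip(h, x_hist))
    e = sin_sample - y_rep
    if is_double_talk(sin_sample, x_hist):
        return e, list(h)  # coefficients are frozen
    power = sum(v * v for v in x_hist) + eps
    return e, [hm + mu * e * xm / power for hm, xm in zip(h, x_hist)]


# Hypothetical samples: a weak echo-only input, then a strong near-end input.
e1, h1 = gated_step([0.0, 0.0], [1.0, 0.5], 0.2)  # update is performed
e2, h2 = gated_step([0.0, 0.0], [1.0, 0.5], 0.9)  # update is stopped
```

The subtraction is still performed in both cases; only the adaptation of the coefficients is suspended, so that the near-end talker signal s(k) does not disturb the estimate of the echo path.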
In VoIP communication, the echo canceller performs the operation of echo cancellation as described above.
Some conventional apparatuses and methods of echo cancellation are described in patent documents 1 and 2 and non-patent document 1 given below.
Patent document 1: Japanese Patent Application Kokai (Laid-Open) Publication No. 2003-198434
Patent document 2: Japanese Patent Application Kokai (Laid-Open) Publication No. 2000-115033
Non-patent document 1: ITU-T Recommendation G.722, 7 kHz AUDIO-CODING WITHIN 64 KBIT/S