1. Field of the Invention
The present invention relates generally to an acoustic echo cancellation method and apparatus. In particular, the present invention relates to a method and apparatus for canceling echo signals generated during a call in a mobile terminal of a mobile communication system.
2. Description of the Related Art
In general, the term “acoustic echo” refers to a phenomenon in which a sound wave originating from a sound source is reflected by the surface of an object and returns to the sound source. A familiar everyday example is a natural echo produced by a single reflection. The opposite of an acoustic echo is a direct sound, that is, a sound heard directly without being reflected by the surface of an object. In terms of auditory perception, an acoustic echo is a reflected sound that arrives about 0.05 seconds or more after the direct sound, so the echo and the direct sound are heard with a perceptible time difference. In places with multiple reflecting surfaces, such as a room or a cave, the sound is reflected repeatedly in various directions, producing a complex echo. This is an example of a multiple-reflection echo, also known as reverberation.
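As a rough illustration of the 0.05-second threshold, the sketch below converts a delay into the extra path length travelled by the reflected sound. It assumes a speed of sound of 343 m/s (air at about 20 °C) and, for the reflector distance, a listener located at the sound source; neither figure comes from the text above.

```python
# Assumption (not from the source): speed of sound in air at about 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def extra_path_length(delay_s: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Extra distance (m) travelled by the reflected wave relative to the direct sound."""
    return delay_s * c


def reflector_distance(delay_s: float, c: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Distance (m) to a single reflecting surface, assuming the listener is at the
    source: the reflection travels out and back, so the extra path is halved."""
    return extra_path_length(delay_s, c) / 2.0


print(f"extra path: {extra_path_length(0.05):.2f} m")
print(f"reflector distance: {reflector_distance(0.05):.2f} m")
```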
Modern society is rapidly progressing toward an information society, and communication technology plays a very important role in this progress. As communication technology develops, communication systems are evolving from wired into wireless systems. To provide a convenient call environment, a hands-free technique has been proposed in which a user talks over the phone using a microphone and a speaker instead of an earpiece and a mouthpiece. The hands-free technique is applicable to car hands-free phones, remote conference systems, speaker-phone systems, International Mobile Telecommunication 2000 (IMT-2000) phones, and so on.
In a communication system where the user and the communication device exchange voice through a speaker and a microphone, part of the voice or other sound output from the speaker is picked up by the microphone. This acoustic echo component must therefore be taken into account to provide a smooth call. In a full-duplex hands-free voice communication system, if the acoustic echo component is not properly canceled, the far-end user hears his/her own voice, delayed by a certain time, together with the voice of the near-end user. In other words, the echo phenomenon inconveniences the user during the call.
The acoustic echo occurs because the far-end user's signal output from the speaker enters the microphone via an acoustic echo path, together with noise, and is then transmitted back to the far-end user. As a result, the far-end user receives the undesired echo signal along with the near-end user's signal. In communication engineering, this phenomenon is called howling. The influence of the echo signal grows with its intensity and its delay time.
The acoustic echo path changes frequently over time, particularly when the mobile terminal operates not only in a normal voice call mode but also in a video conference mode or a speaker phone mode. For example, the acoustic echo path changes even when a conference participant moves his/her head, arm, or shoulder during a video conference.
Current mobile terminals therefore use an acoustic echo canceller (AEC) to cancel the echo. The AEC estimates the echo component of the far-end user's signal using an adaptive algorithm and subtracts the estimated echo component from the signal input to the microphone.
An adaptive algorithm is needed because a voice signal, which is the typical input signal of the AEC, has very high inter-sample correlation and non-stationary statistical characteristics. The AEC must therefore be implemented with an adaptive algorithm whose filter coefficients change according to the surrounding environment.
Accordingly, the AEC uses an adaptive filtering technique that estimates the echo signal by tracking the time-varying acoustic echo path. The normalized least mean square (NLMS) algorithm is widely used for this purpose because of its simple structure and stable convergence.
FIG. 1 is a block diagram illustrating the structure of an AEC apparatus and its peripheral circuit included in a mobile terminal. With reference to FIG. 1, the structure and operation of an NLMS-based AEC apparatus included in a current mobile terminal will now be described.
The peripheral circuit includes a speaker 102 for outputting a received far-end user's signal x(k) 100 and a microphone 103 for converting a near-end user's signal s(k) 130 and a noise signal n(k) 140 into an electrical voice signal. In addition, the microphone 103 receives an output signal y(k) 101 of the speaker 102 for the far-end user's signal x(k) 100, together with the near-end user's signal s(k) 130 and the noise signal n(k) 140.
For simplicity, FIG. 1 illustrates the speaker 102 for receiving the far-end user's signal x(k) 100 decoded by a vocoder 161, the microphone 103 for receiving the near-end user's signal s(k) 130, the background noise signal n(k) 140 and an echo component of the far-end user's signal x(k) 100, and converting the received signals into electrical signals, an AEC 116, an adder 206 for calculating a difference between an output signal of the AEC 116 and an output signal of the microphone 103, and a vocoder 160 for encoding a residual echo signal e(k) 120 output from the adder 206.
The speaker 102, as described above, outputs the received far-end user's signal x(k) 100. The microphone 103 receives the near-end user's signal s(k) 130, the background noise n(k) 140 of the near-end user, and an echo signal y(k) 101 for the far-end user's signal x(k) 100, which is provided through an acoustic echo path from the speaker 102. The microphone 103 converts the received signals into a single electrical digital signal d(k) 104.
The AEC 116 uses an NLMS algorithm-based adaptive filter. The AEC 116 generates an estimated echo signal ŷ(k) 114 from the far-end user's signal x(k) 100 and outputs the estimated echo signal ŷ(k) 114 to the adder 206. The adder 206 calculates a residual echo signal e(k) 120 by subtracting the estimated echo signal ŷ(k) 114 from the electrical digital signal d(k) 104 output from the microphone 103. It outputs the residual echo signal e(k) 120 to the vocoder 160 and also feeds it back to the AEC 116, where it drives the adaptation of the adaptive filter.
The adder 206 outputs the residual echo signal e(k) 120 by subtracting the estimated echo signal ŷ(k) 114 output from the AEC 116 from the signal d(k) 104 output from the microphone 103. The signal d(k) 104 output from the microphone 103 can be expressed as

$$d(k) = s(k) + n(k) + y(k) \qquad (1)$$
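A minimal sketch of the microphone signal model in Equation (1), assuming the acoustic echo path can be modeled as a finite impulse response h, so that the echo y(k) is the convolution of x(k) with h. The white-noise stand-ins for speech, the silent near-end talker, and the decaying impulse response are hypothetical choices for illustration, not part of the described apparatus.

```python
import math
import random

random.seed(0)
NUM_SAMPLES = 2000  # hypothetical signal length
ECHO_TAPS = 32      # hypothetical echo-path length

# White-noise stand-ins for the far-end signal x(k) and the near-end noise n(k);
# the near-end talker s(k) is silent here (single-talk case).
x = [random.gauss(0.0, 1.0) for _ in range(NUM_SAMPLES)]
s = [0.0] * NUM_SAMPLES
n = [0.01 * random.gauss(0.0, 1.0) for _ in range(NUM_SAMPLES)]

# Hypothetical echo path h: an exponentially decaying random impulse response.
h = [random.gauss(0.0, 1.0) * math.exp(-i / 8.0) for i in range(ECHO_TAPS)]


def convolve_causal(signal, impulse_response):
    """y(k) = sum_i h(i) * x(k - i), treating samples before k = 0 as zero."""
    out = []
    for k in range(len(signal)):
        acc = 0.0
        for i, coeff in enumerate(impulse_response):
            if k - i < 0:
                break  # all remaining taps reach back before time 0
            acc += coeff * signal[k - i]
        out.append(acc)
    return out


y = convolve_causal(x, h)                          # echo y(k) at the microphone
d = [sk + nk + yk for sk, nk, yk in zip(s, n, y)]  # Equation (1)
```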
The AEC 116 generates the estimated echo signal ŷ(k) 114 by using the far-end user's signal x(k) 100 as a reference input signal in accordance with Equation (2) below.

$$\hat{y}(k) = X^{T}(k)\,W(k) \qquad (2)$$
In Equation (2), X^T(k) denotes the transpose of the vector X(k) of recent samples of the far-end user's signal x(k) 100, defined in Equation (5), and W(k) denotes the coefficient vector of the adaptive filter. An AEC that uses an adaptive algorithm must estimate the echo component and adjust the filter coefficients at every sample so that the difference, or error, between the estimated echo component ŷ(k) 114 and the actual echo component becomes small.
The adder 206 calculates the residual echo signal e(k) 120 by subtracting ŷ(k) 114, calculated using Equation (2), from d(k) 104 in accordance with Equation (3) below.

$$e(k) = d(k) - \hat{y}(k) \qquad (3)$$
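Equations (2) and (3) can be checked by hand for a single sample. In this sketch the helper names are hypothetical, and the tap vector holds num_taps samples, from x(k) down to x(k − num_taps + 1); Equation (5) as written runs from x(k) to x(k − n).

```python
def tap_vector(x_history, k, num_taps):
    """X(k) = [x(k), x(k-1), ..., x(k-num_taps+1)], zero-padded before time 0."""
    return [x_history[k - i] if k - i >= 0 else 0.0 for i in range(num_taps)]


def estimate_echo(X, W):
    """Equation (2): y_hat(k) = X^T(k) W(k), an inner product."""
    return sum(xi * wi for xi, wi in zip(X, W))


def residual(d_k, y_hat_k):
    """Equation (3): e(k) = d(k) - y_hat(k)."""
    return d_k - y_hat_k


# Tiny hand-checkable example with a hypothetical 3-tap filter.
x_hist = [1.0, 2.0, 3.0]
W = [0.5, 0.25, 0.0]
X = tap_vector(x_hist, k=2, num_taps=3)  # [3.0, 2.0, 1.0]
y_hat = estimate_echo(X, W)              # 3*0.5 + 2*0.25 + 1*0.0 = 2.0
e = residual(2.5, y_hat)                 # 2.5 - 2.0 = 0.5
```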
The residual echo signal e(k) 120 calculated by Equation (3) is then used to update the coefficients W(k) of the adaptive filter of the AEC 116 in accordance with Equations (4) and (5) below, and the updated coefficients are used to estimate the next echo component.
$$W(k+1) = W(k) + \frac{\mu\, X(k)\, e(k)}{\lVert X(k) \rVert^{2}} \qquad (4)$$

$$X(k) = \left[\, x(k),\; x(k-1),\; \cdots,\; x(k-n) \,\right] \qquad (5)$$
In Equation (4), W(k+1) denotes the adaptive filter coefficients updated to estimate a new echo component; they are determined taking into account the type of the mobile terminal, such as a slide type or a folder type. In addition, μ denotes the adaptation rate (step size) of the filter. Equation (5) expresses, as a column matrix, the values of the far-end user's signal x(k) 100 from which the adaptive filter estimates the echo signal. In Equation (5), ‘n’ denotes the number of taps of the adaptive filter, which corresponds to the length of the echo path.
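Equations (2) through (5) combine into the NLMS loop sketched below. This is a generic NLMS echo canceller written under stated assumptions, not the terminal's actual implementation: the small regularizer eps, the step size μ = 0.5, and the synthetic 16-tap echo path are illustrative choices, and the tap-delay line holds exactly num_taps samples.

```python
import math
import random


def nlms_echo_canceller(x, d, num_taps, mu=0.5, eps=1e-8):
    """Run an NLMS adaptive filter over far-end x(k) and microphone d(k).

    Returns the residual e(k) for every sample and the final coefficients W.
    eps is a small regularizer (an assumption, not part of Equation (4)) that
    avoids division by zero while the tap-delay line is still nearly empty.
    """
    W = [0.0] * num_taps
    X = [0.0] * num_taps  # tap-delay line, X[0] = x(k)
    e_out = []
    for k in range(len(x)):
        X = [x[k]] + X[:-1]                            # Equation (5): shift in newest sample
        y_hat = sum(xi * wi for xi, wi in zip(X, W))   # Equation (2)
        e = d[k] - y_hat                               # Equation (3)
        norm = sum(xi * xi for xi in X) + eps          # ||X(k)||^2
        step = mu * e / norm
        W = [wi + step * xi for wi, xi in zip(W, X)]   # Equation (4)
        e_out.append(e)
    return e_out, W


def power_db(sig):
    """Mean power in dB, with a tiny floor to avoid log10(0)."""
    return 10.0 * math.log10(sum(v * v for v in sig) / len(sig) + 1e-30)


# Hypothetical test case: identify a known 16-tap echo path from white noise.
random.seed(1)
TAPS = 16
h = [random.gauss(0.0, 1.0) * math.exp(-i / 4.0) for i in range(TAPS)]
x = [random.gauss(0.0, 1.0) for _ in range(3000)]
d = [sum(h[i] * x[k - i] for i in range(TAPS) if k - i >= 0) for k in range(len(x))]

e, W = nlms_echo_canceller(x, d, num_taps=TAPS)

# Echo return loss enhancement (ERLE) over the last 500 samples.
erle_db = power_db(d[-500:]) - power_db(e[-500:])
print(f"ERLE: {erle_db:.1f} dB")
```

With a noise-free, time-invariant echo path and a filter as long as the path, the residual falls far below the echo, so the ERLE is large; near-end speech, noise, or a changing echo path would reduce it.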
In FIG. 1, a vocoder is divided into the vocoder 160 for processing transmission signals and the vocoder 161 for processing reception signals. In practice, however, the vocoder can be implemented with a single chip in the mobile terminal such that it can process both the transmission signals and the reception signals. For convenience, it is shown in FIG. 1 that the vocoder 160 for processing transmission signals and the vocoder 161 for processing reception signals are separated from each other.
The conventional AEC applied to the mobile terminal cancels echo very well in a normal call with a short acoustic echo path. However, when the mobile terminal operates in the video conference mode or the speaker phone mode, the acoustic echo path becomes longer, which in turn increases the length ‘n’ of the adaptive filter. Because Equations (2) and (4) each involve all n coefficients at every sample, estimating the echo component over this longer time delay increases the total amount of calculation.
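The growth in calculation can be made concrete with a rough operation count. The breakdown below is one common way to count a direct-form NLMS filter, assuming ||X(k)||² is updated recursively, and the 8-kHz sampling rate is likewise an assumption; the point is only that the cost grows linearly with the filter length n.

```python
def nlms_mults_per_sample(num_taps: int) -> int:
    """Approximate multiplications (and divisions) per sample for direct-form NLMS.

    Assumed breakdown, one common way to count:
      num_taps  for y_hat = X^T W              (Equation (2))
      num_taps  for W += (mu * e / ||X||^2) X  (Equation (4))
      2         to update ||X||^2 recursively (add newest x^2, drop oldest x^2)
      2         for mu * e and the division by ||X||^2
    """
    return 2 * num_taps + 4


def mults_per_second(num_taps: int, fs_hz: int = 8000) -> int:
    """Operations per second at an assumed 8-kHz sampling rate."""
    return nlms_mults_per_sample(num_taps) * fs_hz


for taps in (64, 256, 1024):
    print(f"n = {taps:5d}: {mults_per_second(taps):,} ops/s")
```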
The apparatus and method of FIG. 1, which cancel echo components without dividing the signal into frequency bands, have been described so far. Next, with reference to FIG. 2, an apparatus and method for canceling acoustic echo by dividing the signals input to an AEC into several subbands will be described.
FIG. 2 is a block diagram illustrating a structure of a general AEC apparatus using subband coding, included in a mobile terminal. With reference to FIG. 2, a description will now be made of a structure and operation of the general AEC apparatus. In FIG. 2, a vocoder is divided into a vocoder 161 for outputting a decoded far-end user's signal x(k) 100 and a vocoder 160 for encoding a residual echo signal e(k) 120. In practice, however, the vocoder can be implemented with a single chip in the mobile terminal such that it can process both the transmission signals and the reception signals. For convenience, it is shown in FIG. 2 that the vocoder 160 for processing transmission signals and the vocoder 161 for processing reception signals are separated from each other.
The term “subband coding” refers to a method for coding digital signals using an analysis-by-synthesis (ABS) technique. Subband coding divides an input signal into equally spaced frequency bands and codes each band separately. Polyphase filter banks are used for the band division. Each of the analysis filter banks 200 and 202 decomposes its input signal by frequency band using a filter bank designed for perfect reconstruction (PR), which prevents the aliasing that may otherwise occur when an input signal is divided into several subband signals and converted into frequency-domain signals.
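The perfect-reconstruction property can be illustrated with the simplest two-band filter bank, the Haar pair, written directly in polyphase form (even and odd samples). This is a swapped-in textbook example, not the filter banks 200, 202, and 212 themselves, which would use longer filters and more bands; it shows only that analysis followed by synthesis returns the input exactly.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_analysis(signal):
    """Split an even-length signal into low and high subbands, each decimated by 2."""
    half = len(signal) // 2
    low = [(signal[2 * m] + signal[2 * m + 1]) * INV_SQRT2 for m in range(half)]
    high = [(signal[2 * m] - signal[2 * m + 1]) * INV_SQRT2 for m in range(half)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis: rebuild and interleave the even and odd samples."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * INV_SQRT2)  # even sample
        out.append((lo - hi) * INV_SQRT2)  # odd sample
    return out


signal = [1.0, 4.0, -2.0, 3.0, 0.5, 0.5]
low, high = haar_analysis(signal)
restored = haar_synthesis(low, high)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # reconstruction error
```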
A synthesis filter bank 212 performs the inverse operation: it receives individual subband signals, such as those into which the analysis filter bank 200 divides the received far-end user's signal x(k) 100, and restores them into a single time-domain signal.
A method for canceling acoustic echo using the subband coding method will now be described with reference to FIG. 2.
FIG. 2 illustrates a structure of an AEC apparatus using subband coding and its peripheral circuit included in a mobile terminal. The peripheral circuit includes a speaker 102 for outputting the received far-end user's signal x(k) 100 and a microphone 103 for converting a near-end user's signal s(k) 130 and a noise signal n(k) 140 into an electrical voice signal. In addition, the microphone 103 receives an output signal y(k) 101 of the speaker 102 for the far-end user's signal x(k) 100, together with the near-end user's signal s(k) 130 and the noise signal n(k) 140.
The far-end user's signal x(k) 100 is input to the analysis filter bank 200, which converts it into the frequency domain, divides it into equally spaced subband signals X̂(k) 208, and outputs the subband signals X̂(k) 208 to an NLMS adaptive filter bank 210.
The NLMS adaptive filter bank 210 receives the subband signals X̂(k) 208, estimates the echo component in each subband, and outputs the resulting estimated echo signals Ŷ(k) 204 to adders 206.
As described above, the microphone 103 receives the output signal y(k) 101 of the speaker 102 for the far-end user's signal x(k) 100, provided through an echo path denoted by a dotted line, together with the near-end user's signal s(k) 130 and the noise signal n(k) 140, and converts the received signals into an electrical digital signal d(k) 104.
The analysis filter bank 202 converts the input signal d(k) 104 into frequency-domain signals and analyzes them, dividing d(k) 104 into equally spaced subbands by subband coding.
The far-end user's signal x(k) 100 is input directly to the analysis filter bank 200, without passing through the echo path, and the analysis filter bank 200 analyzes it and outputs the analyzed far-end user's signals 208 to the NLMS adaptive filter bank 210. The NLMS adaptive filter bank 210 has a separate set of adaptive filter coefficients for each band, and all bands have the same filter length. For example, if an adaptive filter with 1024 taps is divided into four bands, the NLMS adaptive filter bank 210 requires a filter length of 256 taps for each band, or 256 × 4 = 1024 adaptive filter coefficients in total. Nevertheless, because the filter bank performs per-frame processing, it requires fewer calculations than the general filtering method, which performs per-sample processing.
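The tap arithmetic of the 1024-tap, four-band example can be written out as follows. The source attributes the saving to per-frame processing and gives no decimation factor; this sketch instead assumes that each subband signal is decimated by the number of bands (critical decimation), and it ignores the cost of the analysis and synthesis filter banks and of the coefficient update.

```python
def subband_taps_per_band(fullband_taps: int, num_bands: int) -> int:
    """Split the full-band filter length evenly across bands, as in the 1024/4 example."""
    return fullband_taps // num_bands


def filter_mults_per_input_sample(fullband_taps: int, num_bands: int, decimation: int) -> float:
    """Multiplications per input sample for the echo estimate X^T W only.

    Assumption: each band runs at fs / decimation, so one band costs
    taps_per_band / decimation multiplications per input sample.
    """
    taps = subband_taps_per_band(fullband_taps, num_bands)
    return num_bands * taps / decimation


print(subband_taps_per_band(1024, 4))             # 256 taps per band
print(4 * subband_taps_per_band(1024, 4))         # 1024 coefficients in total
print(filter_mults_per_input_sample(1024, 1, 1))  # 1024.0 (full-band filter)
print(filter_mults_per_input_sample(1024, 4, 4))  # 256.0 (4 bands, decimated by 4)
```

The total coefficient count is unchanged, as the text notes; under the decimation assumption the saving comes from each band running at a lower rate. Passing a smaller decimation value models a less aggressively decimated filter bank.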
The estimated echo signals Ŷ(k) 204 output from the adaptive filter bank 210 are input to the adders 206. In the manner of Equation (3), the adders 206 subtract the estimated echo signals Ŷ(k) 204 from the subband signals D̂(k) 105, which the analysis filter bank 202 generates by converting the signal d(k) 104 into frequency-domain signals, and output the difference signals to the synthesis filter bank 212.
The synthesis filter bank 212 restores the per-band frequency-domain signals, to which the adaptive filter algorithm has been applied, into a time-domain signal.
A residual echo signal e(k) 120 output from the synthesis filter bank 212 is input to the vocoder 160 for encoding. In summary, the AEC apparatus using subband coding receives the digital signal d(k) 104 that the microphone 103 generates from the near-end user's signal s(k) 130, the echo signal y(k) 101 output from the speaker 102 of the mobile terminal, and the noise signal n(k) 140; converts d(k) 104 into frequency-domain signals; and divides them into subband signals. The AEC apparatus applies the adaptive algorithm to each subband signal, and the synthesis filter bank 212 then synthesizes the individual subband signals and outputs the residual echo signal e(k) 120 to the vocoder 160.
Unlike the general NLMS algorithm, which performs per-sample calculation, the AEC apparatus of FIG. 2 performs per-frame calculation, which reduces the amount of calculation and increases convergence speed, making it possible to cancel echo signals with long echo paths efficiently.
A conventional subband coding-based AEC apparatus converts digital signals into frequency band signals, divides the frequency band signals according to subband, and applies them to the adaptive filter. Thus, the AEC apparatus includes the adaptive filter bank 210 in which the individual subbands have the same filter coefficient length and require the same amount of calculations.
Herein, the band of a voice signal is divided into several subbands; subbands at or below a predefined reference will be referred to as “low bands,” and subbands above the reference will be referred to as “high bands.” The boundary between the low band part and the high band part may be defined differently.
The reference for the low bands and high bands is not an absolute value but a relative value. That is, a 0th subband is a low band with respect to a 1st subband, and the 1st subband is a high band with respect to the 0th subband.
The energy distribution and the amount of information in the high band part and the low band part will now be described. A 4-kHz band, which satisfies the Nyquist condition for the usual 8-kHz sampling rate, is taken as the full band.
Herein, the full band is divided into four bands, and the amount of information and energy in each band is illustrated in FIG. 3 and Table 1. FIG. 3 is a graph illustrating the energy distribution of a voice signal for each band, obtained through 4-band subband coding analysis.
Owing to the characteristics of voice signals, most of the information and energy is concentrated in the low band part, and less is distributed over the high band part.
TABLE 1
Band                      0~1000 Hz   1001~2000 Hz   2001~3000 Hz   3001~4000 Hz
Average power (signal)    −34.37 dB   −50.59 dB      −59.99 dB      −62.39 dB
Total power (signal)      −30.06 dB   −44.94 dB      −55.45 dB      −57.14 dB
FIG. 3 and Table 1 show that most of the information and energy of the voice signal is concentrated in the relatively low bands, namely the first band of 0~1000 Hz and the second band of 1001~2000 Hz.
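The concentration of energy in the low bands can be quantified from the “Total power” row of Table 1. The sketch assumes that the four dB values share a common reference, so that they can be converted to linear power and compared as fractions of their sum; the table does not state this explicitly.

```python
# "Total power (signal)" row of Table 1, in dB, for bands 0..3 (0~1000 Hz first).
TOTAL_POWER_DB = [-30.06, -44.94, -55.45, -57.14]


def db_to_linear(db):
    """Convert a power level in dB to a linear power ratio."""
    return 10.0 ** (db / 10.0)


linear = [db_to_linear(v) for v in TOTAL_POWER_DB]
share = [p / sum(linear) for p in linear]  # assumes a common dB reference

for band, frac in enumerate(share):
    print(f"band {band}: {100.0 * frac:.2f} % of the summed power")
```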
A high-frequency signal, which varies considerably, is more difficult to estimate with an adaptive filter than a low-frequency signal. Therefore, even adaptive filtering cannot fully cancel the residual echo signal. FIG. 3 and Table 1 also show that the first and second bands, which are the lower bands, have higher energy levels than the third and fourth bands, which are the higher bands.
A conventional Enhanced Variable Rate Codec (EVRC) vocoder for a mobile terminal encodes an input voice signal at one of three bit rates: full rate, half rate, or ⅛ rate.
Therefore, when a rate lower than the full rate, that is, the half rate or the ⅛ rate, is allocated, processing signals over the full band is inefficient. Moreover, in a mobile terminal environment, canceling the echo components requires a high-capacity memory in the terminal, and because it is difficult to allocate a large amount of calculation to the AEC apparatus, the AEC must be optimized for both calculation and memory.
In the speaker phone mode, the mobile terminal raises both the speaker volume and the microphone gain. As a result, the far-end user's voice output from the speaker enters the high-gain microphone either directly or after being reflected by walls or objects, and is then transmitted back to the far-end user. In this case, the number of echo paths created by reflections from walls or objects increases, and the NLMS adaptive filter needs more taps to cover them.
Generally, an AEC is designed for an echo path delay of about 64 ms to 128 ms and includes an adaptive filter with 512 to 1024 taps for a digital signal sampled at 8 kHz. However, the mobile terminal has difficulty performing such complex calculations because of its limited memory and battery life.
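The tap counts follow from multiplying the echo path delay by the sampling rate. This sketch assumes the 8-kHz rate implied by 512 taps spanning 64 ms and, for the memory estimate, hypothetical 16-bit storage for both the coefficients W(k) and the tap-delay line X(k).

```python
def taps_for_delay(delay_ms: float, fs_hz: int) -> int:
    """FIR taps needed to span an echo-path delay: delay in seconds times sampling rate."""
    return round(delay_ms * fs_hz / 1000.0)


def filter_memory_bytes(num_taps: int, bytes_per_word: int = 2) -> int:
    """Storage for W(k) and the tap-delay line X(k), assuming 16-bit words."""
    return 2 * num_taps * bytes_per_word


FS_HZ = 8000  # sampling rate implied by 512 taps spanning 64 ms
for delay_ms in (64, 128):
    taps = taps_for_delay(delay_ms, FS_HZ)
    print(f"{delay_ms} ms -> {taps} taps, {filter_memory_bytes(taps)} bytes")
```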