1. Field of the Invention
The present invention relates to an echo canceler and an echo canceling method and program, and is applicable to an echo canceler and an echo canceling method and program used in a hands-free speakerphone in, for example, a corporate conference room or an ordinary home.
2. Description of the Related Art
With the recent proliferation of voice over Internet protocol (VoIP) telephony and a wide variety of cellular phone services, the telephone rates for many types of telephone sets have come down and people can make telephone calls from their homes and offices with reduced concern about telephone bills. As a result, in many cases the calls last longer. This has led to a sudden rise in the demand for hands-free speakerphones for use in, for example, corporate conference rooms and ordinary homes.
A speakerphone has two effects: one effect is to free the user's hands to do other work during the call, and another effect is to enable long calls to be made without physical distress. Hands-free telephone sets that provide the first effect by employing earphone-microphones, headphones, or headsets are also known.
Because hands-free telephone sets free the hands to do other work during the call, hands-free telephone sets are mounted in automobiles and are used in offices. Hands-free telephone sets of the earphone (earphone-microphone), headphone, and headset type are also a familiar sight in call centers, even though earphones rub against the ear canal and can cause painful inflammation, while headsets and headphones cause irritation and fatigue if worn for a long time.
As long telephone calls made from the home become increasingly common, the need to be able to drive a car or perform other tasks while making a call remains a significant factor, but the need simply to be able to make long calls in comfort has become increasingly important. Unlike a mobile telephone user, a VoIP user expects to be able to relax during a long call and be free of physical as well as financial stress.
The most popular type of hands-free telephone set is the speakerphone, which employs a loudspeaker instead of an earphone or headphones.
An essential part of a speakerphone is an acoustic echo canceler that removes the echo of the acoustic output of the loudspeaker from the signal input through the microphone.
An essential part of an acoustic echo canceler is its adaptive filter, which has tap coefficients that mimic the effect of the acoustic echo path. A key part of the adaptive filter is the algorithm used to update the tap coefficients for optimum echo cancellation.
Many acoustic echo cancelers employ the normalized least mean squares (NLMS) algorithm, described by Haykin in Introduction to Adaptive Filters (Macmillan, June 1984, Japanese translation published by Gendaikogakusha, September 1987). The NLMS algorithm has the advantages of excellent stability and a comparatively small computational load on a digital signal processor, which offset its disadvantage of relatively slow convergence for so-called ‘colored signals’ with a non-flat frequency spectrum. Voice signals are typically colored in this sense.
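For illustration only, the general form of the NLMS tap coefficient update can be sketched in Python as follows. The function name nlms_echo_cancel, its parameter names, and the default step gain and regularization constant are assumptions made for this sketch; they are not taken from Haykin or from any of the cited publications.

```python
def nlms_echo_cancel(far_end, mic, num_taps, step_gain=0.5, eps=1e-8):
    """Run a basic NLMS adaptive filter over two equal-length sample lists.

    far_end: samples sent to the loudspeaker.
    mic:     samples picked up by the microphone (echo included).
    Returns (residual, taps): the echo-canceled signal and the final taps.
    """
    taps = [0.0] * num_taps      # current estimate of the echo path
    history = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        # Shift the newest far-end sample into the delay line.
        history = [x] + history[:-1]
        # Estimate the echo and subtract it from the microphone sample.
        echo_estimate = sum(w * h for w, h in zip(taps, history))
        e = d - echo_estimate
        # Normalize the update by the far-end power in the delay line;
        # eps (an assumed small constant) avoids division by zero.
        power = sum(h * h for h in history) + eps
        scale = step_gain * e / power
        taps = [w + scale * h for w, h in zip(taps, history)]
        residual.append(e)
    return residual, taps
```

In this sketch each sample costs a number of multiplications proportional to the number of taps, which is consistent with the comparatively small computational load noted above, and the normalization by far-end power is what distinguishes NLMS from the plain LMS update.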
Although the NLMS algorithm remains an excellent choice for some purposes, the purposes for which speakerphones are needed are diversifying. In the past, speakerphones mainly had to satisfy the demands of hands-free use in corporate conference-room systems, in which system size and cost were not major considerations. Now there is also a need for high-performance speakerphones that are small and inexpensive enough to be used in ordinary homes. In addition, the spread of VoIP has led to the introduction of wideband telephony, which provides better speech quality than conventional telephony, so speakerphones and their adaptive filters must also be able to deal with wideband voice signals.
If the conventional NLMS filter updating algorithm is applied in an echo canceler for wideband telephony, problems arise in relation to both stability and computational load.
These problems arise from the increased sampling rate. If the signal bandwidth is doubled, for example, then the sampling rate also doubles, from the conventional eight thousand samples per second (8 kHz) to sixteen thousand samples per second (16 kHz), so twice as much data must be processed per unit time.
In contrast, the temporal length or impulse response length of the echo path from the speaker to the microphone is determined by physical factors such as the distance between the speaker and the microphone, the speed of sound, the number of different echo paths, and the presence of reflections, and is independent of the sampling rate.
Therefore, if the sampling rate rises, the necessary number of tap coefficients in the adaptive filter (the tap length of the filter) increases and the convergence speed of the adaptive filter is slowed accordingly.
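The relation between sampling rate and tap length described above can be illustrated with a short Python sketch. The function name required_taps and the 100-millisecond echo path length are assumptions chosen only for this example; actual echo path lengths depend on the room and equipment.

```python
def required_taps(echo_path_ms, sampling_rate_hz):
    """Number of taps needed to span an echo path of the given duration.

    Integer arithmetic with ceiling division, so a partial sample
    period still gets its own tap.
    """
    return -(-echo_path_ms * sampling_rate_hz // 1000)


# Assumed example: the same 100 ms echo path at the two sampling rates.
narrowband_taps = required_taps(100, 8000)    # 800 taps at 8 kHz
wideband_taps = required_taps(100, 16000)     # 1600 taps at 16 kHz
```

Because the echo path duration is fixed by physical factors, doubling the sampling rate doubles the tap count, and the filter must also be updated twice as often per unit time.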
Haykin discloses methods of speeding up the convergence of the NLMS algorithm, but these methods require complex calculations, imposing an increased computational load and requiring increased computational resources, including increased memory capacity. They also compromise the stability of the filter. In short, these methods are expensive, requiring extra hardware and software, and they produce unreliable speech quality.
Continued use of the conventional NLMS algorithm for wideband adaptive filtering is also problematic, however. Due to the existing performance problems of this algorithm for colored signals and its decreased convergence speed due to increased tap length, speech quality is degraded by unremoved echo.
In Japanese Patent Application Publication No. 08-237174, Igai discloses a method of overcoming these problems by continuous optimization of the step gain in the NLMS algorithm. A large initial step gain is employed, so that the algorithm starts by converging quickly. As convergence progresses, the step gain is reduced so that the algorithm can model the echo accurately under steady-state conditions.
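The general idea of a step gain that starts large and shrinks as convergence progresses can be sketched in Python as follows. This is a hypothetical schedule written only for illustration and is not the method of the cited publication, whose details are not reproduced here; the function name scheduled_step_gain, the use of a residual-to-microphone power ratio as the convergence measure, and the limits mu_min and mu_max are all assumptions.

```python
def scheduled_step_gain(residual_power, mic_power, mu_min=0.05, mu_max=1.0):
    """Map a rough convergence measure to an NLMS step gain.

    Hypothetical rule: when little echo has been removed, the residual
    power is close to the microphone power and the gain is near mu_max;
    as the residual power falls, the gain falls toward mu_min.
    """
    if mic_power <= 0.0:
        # No microphone signal to judge convergence by: stay conservative.
        return mu_min
    ratio = min(residual_power / mic_power, 1.0)
    return mu_min + (mu_max - mu_min) * ratio
```

A large gain speeds up initial convergence, while a small gain lets the filter model the echo more accurately under steady-state conditions.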
Continuous optimization of the step gain, however, fails to solve the problem of poor convergence for colored signals, and introduces new problems. For example, if voice input is preceded by a call control tone as described in Telephone Service Interfaces Edition 5, published by the Nippon Telegraph and Telephone Corporation (NTT), and if the algorithm converges while the call control tone is being received, then the reduced step size delays adaptation to the echo characteristics of the voice signal.
In Japanese Patent Application No. 2007-288404, filed by the present applicant, an attempt is made to solve these problems by providing an echo canceler that uses the stable NLMS algorithm to update the tap coefficients, but also converges rapidly.
If the method disclosed in Japanese Patent Application No. 2007-288404 is applied to speakerphones, however, the method disclosed by Tsujikado et al. in Japanese Examined Patent Application Publication No. H08-021881 (formerly Japanese Unexamined Patent Application Publication No. S63-238727, now Japanese Patent No. 2105375) may fail to produce the effects described by Tsujikado et al. when used in the presence of automobile engine noise, street noise, crowd noise, office noise, or other types of ambient noise.