a. Field of the Invention
The present invention relates to detection of echo in telecommunications networks. In this context the term echo refers to the problem encountered when someone speaking into a telephone system hears their own speech come back through their handset or speaker after a short delay. Even small amounts of audible echo can be disturbing to a talker and in extreme cases can render natural conversation virtually impossible.
The two most common sources of echo in a telephony system are electrical reflections and acoustical coupling (see FIG. 1 and FIG. 2). The main sources of electrical reflections, indicated by arrows 8, are the two-wire/four-wire conversion hybrid circuits 1, 2 used to interconnect the two-wire transmission system 3, used between a local exchange 4 and a phone 5 at the customer premises, with the four-wire transmission systems 6, 7 used in the core network and telephone handsets. Acoustical coupling, indicated by arrow 9, occurs in the remote party's telephone equipment when sound leaks from the earpiece or speaker into the microphone.
Echo is usually analysed using the concept of an echo path, which describes the route taken by a talker's outbound speech to and from the point where it passes into the return path and returns as echo. The echo path can be characterised in terms of its delay, frequency response and echo return loss (ERL). The delay is the time taken for a talker's speech to transit the echo path. The frequency response describes any spectral modification of the signal by the echo path. The ERL is the ratio of the level of the outbound speech to the level of the inbound echo that it causes; the smaller the figure, the louder the echo. Echo generally becomes a problem when the ERL of the echo path is 45 decibels (dB) or less and the path delay exceeds about 40 milliseconds (ms); people cannot generally distinguish echo from their own voice for delays of less than 40 ms. The present invention is concerned with detecting the presence of audible echo and determining the delay of the echo path. Measurement of the ERL and frequency response of the echo path is outside the scope of this invention.
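By way of illustration only, the ERL figure and the audibility rule of thumb given above may be sketched as follows. The function names, the use of root-mean-square level as the measure of signal level, and the test signal are assumptions made for this example and are not taken from the present disclosure.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def echo_return_loss_db(outbound, echo):
    """ERL in dB: level of the outbound speech relative to the returned echo.

    A smaller figure means a louder echo.
    """
    return 20.0 * math.log10(rms(outbound) / rms(echo))

def echo_likely_audible(erl_db, delay_ms, erl_limit_db=45.0, delay_limit_ms=40.0):
    """Rule of thumb from the text: ERL of 45 dB or less and delay above about 40 ms."""
    return erl_db <= erl_limit_db and delay_ms > delay_limit_ms

# Hypothetical test signal: a sinusoid and an echo attenuated by a factor of 10.
outbound = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
echo = [0.1 * s for s in outbound]

erl = echo_return_loss_db(outbound, echo)
print(round(erl, 1))                   # 20.0
print(echo_likely_audible(erl, 200))   # True: long delay, low ERL
print(echo_likely_audible(erl, 20))    # False: delay below about 40 ms
```

An amplitude ratio of 10 corresponds to 20 dB, which is in the order of the hybrid ERL figures discussed in this section.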
Many telephone calls are made over relatively short distances with a round-trip delay of 40 ms or less, thus rendering any echo inaudible. However, for transcontinental and international connections the signal propagation delay can result in very much longer round-trip delays. Moreover, the electrical echo from hybrid circuits typically results in ERL figures in the order of 20 dB. For these reasons echo control equipment is installed in almost all international and long-haul switching centres; otherwise these routes would suffer from audible echo. Moreover, recent developments in telecommunications have seen the deployment of networks that inherently introduce long transmission delays; for example, digital mobile radio systems typically introduce a round-trip delay of 200 ms, and voice over IP systems can introduce delays of 40 ms upwards.
This means that so-called echo cancellers, shown schematically in FIG. 3, are now deployed by default at the interfaces between such systems and the public switched telephone network (PSTN). An echo canceller 20 comprises a model 23 of the expected echo due to an echo path 21. The model 23 generates a signal which is subtracted by a subtractor 22 from an incoming signal to compensate for speech reflected via the echo path 21. Signal 24 represents a talker's speech and signal 25 represents a talker's speech with cancelled echo.
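The model-and-subtract structure of such a canceller may be sketched as below. This is a minimal illustration only: it assumes that the echo path can be modelled as a fixed finite impulse response (FIR) filter with known taps, whereas practical cancellers adapt their model during a call. The function names, tap values and signals are hypothetical.

```python
def modelled_echo(outbound, taps):
    """Estimate the echo by convolving the outbound speech with an FIR model."""
    estimate = []
    for n in range(len(outbound)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * outbound[n - k]
        estimate.append(acc)
    return estimate

def cancel_echo(incoming, outbound, taps):
    """Subtract the modelled echo from the incoming signal, sample by sample."""
    estimate = modelled_echo(outbound, taps)
    return [r - e for r, e in zip(incoming, estimate)]

# Assumed echo path: a delay of 3 samples and a gain of 0.25.
taps = [0.0, 0.0, 0.0, 0.25]
outbound = [1.0, -2.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0]
far_speech = [0.0, 0.0, 0.0, 0.0, 0.4, -0.4, 0.0, 0.0]

# The incoming signal carries the far party's speech plus the echo.
incoming = [f + e for f, e in zip(far_speech, modelled_echo(outbound, taps))]

# With a perfect model, only the far party's speech remains.
residual = cancel_echo(incoming, outbound, taps)
```

When the model does not match the echo path, for example because the delay exceeds the span of the taps, the residual retains un-cancelled echo.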
Such cancellers are configured to cancel any network echo from the local part of the PSTN, for example with a delay of less than 128 ms, and it is assumed that if a call is routed to a more distant location there will be cancellers deployed at the far end.
Despite the introduction of echo cancellers, it is not uncommon for telephone systems to introduce audible echo. Typical reasons include an echo path with a round-trip delay that exceeds the capabilities of the closest canceller, misconfiguration of echo cancellation equipment, and the absence of echo cancellers altogether. It is therefore desirable to be able to detect the presence of un-cancelled echo so that remedial action can be taken.
Acoustic echo is the result of sound leaking from the handset speaker into its microphone. A commonly used measure of this leakage is terminal coupling loss (TCL), which is often calculated using a frequency weighting (TCLw). The TCLw exhibited by a plain old telephony system (POTS) handset tends to be fairly good, and should exceed 45 dB. The same should be true of IP phones that are designed to look and feel like a POTS phone. However, mobile handsets tend to have much poorer TCLw figures because they are substantially smaller, and hence the transducers are closer together, and the designers have many more factors to balance against acoustic considerations. Hands-free telephony causes particular problems because sound from the speaker almost inevitably leaks into the microphone. An increasing number of mobile and hands-free terminals therefore have some form of acoustic echo control built in, but there are still many handsets that do not. One of the problems associated with acoustic echo is that the echo path may be rapidly time-varying due to changes in the primary sources of reflection, which in the case of a handset will be due to interactions with the head, and in the case of a hands-free system may be due to movement of people and objects in the vicinity of the terminal.
If an echo path includes a non-linear component, it will no longer be possible to model the echo path as a simple linear filter. The most common example of such a non-linearity is a CELP speech coding algorithm such as GSM EFR or G.729. This means that attempts to cancel, or even simply measure, echo paths that contain speech compression will be problematic because such algorithms do not transmit the waveform entirely faithfully; indeed, the signal-to-noise ratio of the combined coding and decoding process is typically only a few dB. The combination of speech coding and transmission errors, such as bit-errors in mobile networks and packet loss in VoIP networks, can lead to even more non-linear behaviour. The existence of such non-linear components is a good reason for locating any echo control as close to the source of echo as possible because this will avoid the problems described above. For example, in a mobile handset, acoustic echo cancellation performed in the handset should only see a linear echo path, albeit a time-varying one. However, echo detection equipment may be located at any point in the network and must therefore be capable of detecting echo from non-linear echo paths.
The problem is to design an echo detection method that can be located at any point in the network and can reliably detect the presence of time-invariant or time-varying echo, whether from an electrical or acoustical source, over a wide range of operational conditions including the presence of non-linear network elements. It is also desirable to provide an algorithm of lower complexity than known echo detection methods.
The present invention is only concerned with detecting the presence of echo and determining the delay of the echo path; determination of the echo path loss and frequency response is not envisaged.
b. Related Art
Echo detection may be thought of as locating a degraded search signal within a source signal.
It is known to locate a search signal within a source signal using a correlation between waveforms. It is also known to locate a search signal within a source signal by comparing features extracted from a waveform.
U.S. Pat. No. 6,826,350 “High Speed Signal Search Method and Recording Medium for Same” discloses a high speed search method which compares features extracted from respective time waveforms, for example by using a correlation value or Euclidean distance between these features.
U.S. Pat. No. 6,651,041 “Method for executing automatic Evaluation of Transmission Quality of Audio Signals using Source/Received—Signal Spectral Co-variance” discloses calculation of a spectral similarity measure in dependence upon the value of the covariance of the spectra of two signals.
ITU-T Recommendation P.561, “In-service Non-intrusive Measurement Device”, defines minimum performance requirements for such detection apparatus. It also, in Appendix I, describes two methods of detecting echo known to the art: cross-correlation analysis and adaptive filter analysis. The first method uses a simple cross-correlation calculation to detect similarities in time-domain waveforms of the send and receive signals. The adaptive filter analysis uses a similar architecture to an echo canceller to build a model of the echo path, but does not attempt to cancel the echo. However, both of these approaches assume that the echo path is linear and time-invariant—assumptions that as we have seen are not true in the presence of acoustic echo and non-linear echo paths. These two approaches are also highly susceptible to corruption of the echo signal. This means that they do not work reliably in the presence of high levels of acoustic background noise at the far end and require complex voice activity detection algorithms so that analysis can be suppressed during periods when the far party is talking.
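The time-domain cross-correlation approach may be illustrated by the following sketch, which searches for the lag at which the receive signal best matches the send signal. It is not taken from Recommendation P.561; the function name, the un-normalised correlation score and the synthetic signals are assumptions made for this example, and the echo path here is deliberately linear and time-invariant, which is the condition under which this approach works.

```python
import random

def cross_correlation_delay(send, receive, max_lag):
    """Return the lag (in samples) that maximises the send/receive cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = len(receive) - lag
        score = sum(send[i] * receive[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical send signal: 500 samples of seeded white noise.
rng = random.Random(0)
send = [rng.uniform(-1.0, 1.0) for _ in range(500)]

# Hypothetical echo path: a delay of 37 samples and a gain of 0.1.
true_delay = 37
receive = [0.0] * true_delay + [0.1 * s for s in send[:-true_delay]]

estimated_delay = cross_correlation_delay(send, receive, max_lag=100)
print(estimated_delay)  # 37
```

Because the score is computed directly on the waveforms, any distortion of the echo waveform, such as that introduced by speech coding, reduces the height of the correlation peak.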
The present invention solves these problems by utilising a similarity metric, such as a correlation function, to compare a Fourier transform of the signals in the send and receive directions.
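One possible reading of such a frequency-domain comparison is sketched below for illustration only; it should not be taken as the method of the invention. The frame length, the use of magnitude spectra, the choice of the Pearson correlation coefficient as the similarity metric, the averaging over frames and all function names are assumptions made for this example.

```python
import cmath
import math
import random

FRAME = 32  # samples per analysis frame (hypothetical choice)

def magnitude_spectrum(frame):
    """Magnitude of the discrete Fourier transform for bins 0 to N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def frame_spectra(signal):
    """Magnitude spectra of consecutive non-overlapping frames."""
    return [magnitude_spectrum(signal[i:i + FRAME])
            for i in range(0, len(signal) - FRAME + 1, FRAME)]

def best_frame_lag(send, receive, max_lag):
    """Return (lag in frames, mean spectral correlation) for the best-matching lag."""
    s, r = frame_spectra(send), frame_spectra(receive)
    scores = []
    for lag in range(max_lag + 1):
        pairs = [(s[i], r[i + lag]) for i in range(len(s) - lag)]
        scores.append(sum(pearson(a, b) for a, b in pairs) / len(pairs))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical send signal: 40 frames, each a sinusoid at a randomly chosen bin.
rng = random.Random(1)
send = []
for _ in range(40):
    k = rng.randint(1, 15)
    send.extend(math.sin(2 * math.pi * k * i / FRAME) for i in range(FRAME))

# Hypothetical echo: the send signal delayed by 5 frames and attenuated.
delay = 5 * FRAME
receive = [0.0] * delay + [0.2 * x for x in send[:-delay]]

lag, score = best_frame_lag(send, receive, max_lag=10)
print(lag)  # 5
```

Comparing spectra frame by frame, rather than waveforms sample by sample, means that the score depends on the spectral shape of each frame and not on its exact waveform or level.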