This invention relates to a system and method for estimating delay at a communication device.
In telephony, audio signals (including, for example, voice signals) are transmitted between a near-end and a far-end. Far-end signals which are received at the near-end may be output from a loudspeaker. A microphone at the near-end may be used to capture a near-end signal to be transmitted to the far-end. An “echo” occurs when at least some of the far-end signal output at the near-end is included in the microphone signal which is transmitted back to the far-end. In this sense the echo may be considered to be a reflection of the far-end signal. An example scenario is illustrated in FIG. 1, which shows a signal being captured by a far-end microphone and output by a near-end loudspeaker. The echo is a consequence of acoustic coupling between the loudspeaker and the microphone at the near-end: the near-end microphone captures the signal originating from its own loudspeaker in addition to the voice of the near-end speaker and any near-end background noise. The result is an echo at the far-end loudspeaker.
Echo cancellation is an important feature of telephony. Echo cancellers typically synthesise an estimate of the echo from the far-end voice signal. The estimated echo is then subtracted from the microphone signal.
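The synthesise-and-subtract structure described above can be sketched with an adaptive FIR filter. The normalised least-mean-squares (NLMS) update used below is one common adaptation rule and is an assumption here, as are the function name `nlms_echo_cancel` and its parameters `taps`, `mu` and `eps`. This is a minimal illustration, not a production echo canceller (it has no double-talk handling, for instance).

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the microphone signal.

    Hypothetical helper for illustration. far_end holds the samples sent to
    the loudspeaker; mic holds the microphone samples (echo plus any
    near-end signal). Returns (error_signal, filter_weights), where the error
    signal is the echo-cancelled microphone signal.
    """
    weights = [0.0] * taps  # adaptive estimate of the echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    out = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Synthesise the echo estimate from the far-end signal.
        y = sum(w * h for w, h in zip(weights, history))
        # Subtract the estimate from the microphone sample.
        e = d - y
        # NLMS update: step size normalised by the far-end input energy.
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        out.append(e)
    return out, weights
```

With a simulated echo path such as `[0.0, 0.0, 0.5, 0.2]` and a white-noise far-end signal, the weights converge towards that path and the returned error signal falls well below the level of the microphone signal.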
In Voice over IP (VoIP) communication systems, echoes can be especially noticeable due to the inherent delays introduced by the audio interfaces of VoIP communication devices. Audio interface delays at a communication device lead to a characteristically sparse echo path. An echo path is “sparse” if a large fraction of the energy of the impulse response describing the echo path is concentrated in a small fraction of its duration. To model a sparse echo path, simple echo cancellers require large sample buffers and long filters, but this can reduce performance in terms of convergence speed and cancellation depth. In order to improve the performance of echo cancellation in communication systems with a sparse echo path, the delay between the far-end signal and its echo in the microphone signal is estimated, and the far-end signal provided to the echo canceller is delayed by that estimated delay. This allows shorter filters and smaller buffers to be used in the echo canceller. The element of a system which performs the delay estimation is typically termed a pure delay estimator.
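The two ideas in this paragraph, sparsity and delay compensation, can be illustrated with a short sketch. The hypothetical `energy_concentration` measures sparsity in the sense defined above by finding the largest share of impulse-response energy inside any window of `window` consecutive taps, and the hypothetical `delay_signal` applies an estimated pure delay to the far-end signal before it reaches the echo canceller. Both names, and the sliding-window measure itself, are illustrative assumptions rather than anything specified in this document.

```python
def delay_signal(signal, delay):
    """Delay a sample sequence by `delay` samples, keeping its length.

    Zeros are shifted in at the start; samples pushed past the end are dropped.
    """
    n = len(signal)
    d = min(max(delay, 0), n)
    return [0.0] * d + list(signal[: n - d])


def energy_concentration(impulse_response, window):
    """Return (fraction, start): the largest fraction of total energy held by
    any `window` consecutive taps, and the index where that window starts."""
    energy = [h * h for h in impulse_response]
    total = sum(energy)
    if total == 0 or window <= 0:
        return 0.0, 0
    window = min(window, len(energy))
    current = sum(energy[:window])
    best, best_start = current, 0
    # Slide the window one tap at a time, updating the running energy sum.
    for start in range(1, len(energy) - window + 1):
        current += energy[start + window - 1] - energy[start - 1]
        if current > best:
            best, best_start = current, start
    return best / total, best_start
```

For a 200-tap response whose only non-zero taps lie at indices 100 to 102, a 4-tap window starting at tap 99 holds all of the energy. A 4-tap filter fed with the far-end signal delayed by 99 samples could therefore model the same echo path as a 200-tap filter.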
Various pure delay estimation methods are known in the art. Methods based on cross-correlation techniques have high computational complexity and require a large amount of memory. Methods based on cepstral analysis and similarity functions are complex and may not operate stably.
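For comparison, a brute-force cross-correlation delay estimator of the general kind referred to above might look like the following sketch. The function name `xcorr_delay`, the fixed analysis window and the far-end energy normalisation are assumptions made for illustration. Every candidate lag costs a full pass over the signal history, so the work grows as the product of history length and delay range, and the whole history must be held in memory.

```python
def xcorr_delay(far_end, mic, max_delay):
    """Estimate the echo delay (in samples) as the lag in [0, max_delay]
    that maximises the normalised cross-correlation magnitude.

    Hypothetical helper; far_end must be at least as long as mic.
    """
    start = max_delay  # same analysis window for every lag
    best_lag, best_score = 0, -1.0
    for lag in range(max_delay + 1):
        num = 0.0
        energy = 0.0
        # One full pass over the history per candidate lag: O(N * D) overall.
        for n in range(start, len(mic)):
            x = far_end[n - lag]
            num += mic[n] * x
            energy += x * x
        score = abs(num) / (energy ** 0.5 + 1e-12)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```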
A method proposed by Dyba et al. in “Parallel structures for fast estimation of echo path pure delay and their applications to sparse echo cancellers”, in Proceedings of CISS 2008, March 2008, pp. 241-245, uses a parallel filter bank implementation which performs poorly during double-talk conditions and in cases of negative echo return loss (ERL), owing to filter divergence.
Björn Volcker et al., in US Patent Publication 2013/0163698, propose a delay estimator which operates in the frequency domain and uses binary spectrum matching of near-end and far-end history data. This delay estimator tends to overestimate the delay because of the moving averages used in the long-term estimation it performs.
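The general idea of binary spectrum matching can be illustrated with a heavily simplified sketch, written for illustration only and not taken from US 2013/0163698. Each frame's magnitude spectrum is reduced to one bit per frequency bin by comparing it with that bin's mean over the available history (this threshold rule is an assumption), and the delay is taken as the far-end history position whose bit pattern differs from the newest near-end pattern in the fewest bins. The function names are hypothetical, and the spectra are assumed to be precomputed.

```python
def bin_means(frames):
    """Per-bin mean magnitude over a list of spectra, used as the bit threshold."""
    count = len(frames)
    return [sum(frame[k] for frame in frames) / count for k in range(len(frames[0]))]


def binary_spectrum(magnitudes, thresholds):
    """Pack a spectrum into an int: bit k is 1 if bin k exceeds its threshold."""
    bits = 0
    for k, (mag, thr) in enumerate(zip(magnitudes, thresholds)):
        if mag > thr:
            bits |= 1 << k
    return bits


def estimate_delay_frames(far_history, near_history):
    """Return the delay, in frames, of the far-end frame whose binary spectrum
    best matches the newest near-end frame (0 = newest far-end frame).

    Both histories are lists of magnitude spectra, oldest first.
    """
    far_thr = bin_means(far_history)
    near_thr = bin_means(near_history)
    near_bits = binary_spectrum(near_history[-1], near_thr)
    best_delay, best_cost = 0, None
    for delay in range(len(far_history)):
        far_bits = binary_spectrum(far_history[-1 - delay], far_thr)
        # Hamming distance: number of bins whose bits disagree.
        cost = bin(near_bits ^ far_bits).count("1")
        if best_cost is None or cost < best_cost:
            best_delay, best_cost = delay, cost
    return best_delay
```

A practical estimator would likely smooth the matching cost over many frames before taking the minimum; that long-term averaging is omitted here, and it is the kind of moving average to which the paragraph above attributes the overestimation of the delay.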