Consider an acoustic or electric signal that is known at a reference point and that travels in finite time through a system to a reception point. The delay is the time elapsed between the appearance of, say, a characteristic signal feature at the reference point and its reappearance at the reception point. When the system is a communications network, the delay may be the sum of the propagation time in various conducting materials, the processing time in decoders and encoders (particularly in conversions between analog and digital format), the waiting time during routing in packet-switched networks, and possibly the propagation time in air if the signal is transmitted in acoustic form in some segment of the path.
Accurate estimates of the delay of an acoustic or electric signal travelling through a system are valuable in a number of applications, for instance, in echo cancellation and echo suppression in communications equipment, de-reverberation, echo localization, alignment of audio files for the purpose of fingerprinting, and the alignment of audio signals in recording studios.
An important application of delay estimation is echo suppression and/or echo cancellation as used in telephony. In this context, a far-end party and a near-end party communicate using a telecommunications network. The near-end party would like to receive only the speech signal spoken by the far-end party. However, because of acoustic echo or network echo, the near-end party may additionally receive a distorted and delayed version of the speech signal spoken by himself or herself. This component of the signal travelling towards the near-end party is referred to as the echo signal. In the case of acoustic echo, the feedback path is acoustic and consists of the loudspeakers at the far end and the microphones that acquire the echo signal. In the case of network echo, the feedback path is electronic and results from imperfect transmission-line terminations.
To reduce the echo perceived by the near-end party, the echo signal must be decreased or eliminated. This is normally done by digital computing means using adaptive filtering (echo cancellation) and/or gain control (echo suppression). The adaptive filter used in echo cancellation is optimized to remove the signal component that correlates with the signal travelling towards the far-end party by subtracting it from the signal travelling towards the near-end party. Finding the relative delay (the bulk delay) between the signal travelling to the far end and the echo signal is implicit in this optimization and is generally based on correlation, albeit sometimes in a broad sense. An initial estimate of the bulk delay is commonly used to reduce the number of correlation computations required for echo cancellation. In the case of echo suppression, heuristic rules are generally used to suppress the signal travelling towards the near end whenever it mostly consists of the signal spoken by the near-end party. Echo suppression requires knowledge of the relative delay between the signal travelling to the far end and the echo signal. An estimate of the relative delay is usually computed by means of cross-correlation.
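The cross-correlation approach to bulk delay estimation mentioned above can be illustrated with a minimal sketch. It is not taken from any particular echo canceller: the function name `estimate_delay_xcorr`, the synthetic white-noise signal, and the attenuation factor 0.5 are assumptions chosen for illustration, and the echo path is assumed to be a pure delay with a fixed gain. The `min_delay` and `max_delay` arguments show how an initial estimate narrows the search range and thereby the number of correlation computations.

```python
import random

def estimate_delay_xcorr(reference, received, min_delay, max_delay):
    """Return the lag in [min_delay, max_delay] with the largest cross-correlation."""
    best_lag, best_score = min_delay, float("-inf")
    for lag in range(min_delay, max_delay + 1):
        # Number of samples for which reference[n] and received[n + lag] both exist.
        n_overlap = min(len(reference), len(received) - lag)
        if n_overlap <= 0:
            break
        # One correlation value costs n_overlap multiply-adds.
        score = sum(reference[n] * received[n + lag] for n in range(n_overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical test signal: white noise standing in for the far-end-bound signal.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(400)]

# Hypothetical echo: the reference delayed by 37 samples and attenuated by 0.5.
true_delay = 37
received = [0.0] * true_delay + [0.5 * x for x in reference]

# Full search evaluates 101 candidate lags.
full_search = estimate_delay_xcorr(reference, received, 0, 100)
# With an initial estimate near the true delay, only 16 lags are evaluated.
narrowed_search = estimate_delay_xcorr(reference, received, 30, 45)
print(full_search, narrowed_search)
```

The cost of the full search grows with the product of the sequence length and the number of candidate lags, which is why a narrowed range is attractive.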
Available delay estimation methods for echo cancellation are generally based, directly or indirectly, on cross-correlation. However, the cross-correlation operation has drawbacks. A first drawback is that cross-correlation has high computational complexity for long sequences when a large search range of possible delays is used. A second drawback is that the performance of cross-correlation-based methods generally deteriorates when the relation between the echo signal and the signal travelling to the far end cannot be described accurately by a linear filtering operation. That is, performance is reduced when the feedback path introduces nonlinear distortions. A third drawback applies to systems with time-varying delay, where it is difficult to balance previously determined (old) information and new information about the delay. Using a long but finite evaluation interval imposes large storage and computational requirements. Alternatively, an implicit exponential decay of older data must be used, for example by iterative multiplication by a factor less than unity, which generally performs less well.
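The exponential decay of older data described above can be sketched as a recursive update of one correlation value per candidate lag, each multiplied by a forgetting factor less than unity at every sample. This is an illustrative sketch only: the name `track_delay`, the forgetting factor 0.995, and the synthetic signal whose delay jumps from 5 to 12 samples are assumptions, and the echo path is assumed to be a pure delay.

```python
import random

def track_delay(reference, received, max_delay, forgetting=0.995):
    """Track the delay with per-lag correlations that decay exponentially."""
    corr = [0.0] * (max_delay + 1)
    estimates = []
    for n in range(len(received)):
        for lag in range(max_delay + 1):
            x = reference[n - lag] if n - lag >= 0 else 0.0
            # Old information fades by the forgetting factor; new data is added.
            corr[lag] = forgetting * corr[lag] + x * received[n]
        # Current estimate: the lag with the largest decayed correlation.
        estimates.append(max(range(max_delay + 1), key=corr.__getitem__))
    return estimates

# Hypothetical signal: white noise whose echo delay changes halfway through.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]

def delayed(n, delay):
    return reference[n - delay] if n - delay >= 0 else 0.0

received = [delayed(n, 5) if n < 1000 else delayed(n, 12) for n in range(2000)]

estimates = track_delay(reference, received, max_delay=20)
print(estimates[999], estimates[1999])
```

Only one value per lag is stored, so storage is small, but the forgetting factor fixes the trade-off between noise in the estimate and the speed at which a change in delay is followed.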
The disadvantages of high computational effort, sensitivity to nonlinear distortions in the feedback path, and the difficulty of removing old information motivate alternative delay estimation methods. Such methods can be used to reduce the search range in echo cancellation and to provide a first or final estimate of the delay for echo suppression.