Communication devices typically include a crystal oscillator to provide a clock signal for digital integrated circuits. The crystal oscillator may provide a clock signal at an approximately fixed fundamental frequency. For communication devices to transmit and receive information with one another, it is desirable to synchronize the clock signal between the transmitting and receiving devices. However, the frequencies of the clock signals of any two communication devices may be slightly different. When the clock of one communication device does not run at exactly the same speed as the clock of another, clock drift, or frequency offset, occurs. Clock drift may cause a mismatch between the number of samples generated by a transmitting device and the number of samples received by a receiving device.
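The sample mismatch described above can be illustrated with a short calculation. The following is a minimal sketch, assuming that each clock deviates from a shared nominal sample rate by a fixed offset expressed in parts per million (ppm); the function names and the example values are hypothetical and are not taken from any particular device.

```python
def sample_count(nominal_rate_hz: float, offset_ppm: float, seconds: float) -> int:
    """Samples produced in `seconds` by a clock that is `offset_ppm` off nominal."""
    actual_rate_hz = nominal_rate_hz * (1.0 + offset_ppm * 1e-6)
    return round(actual_rate_hz * seconds)


def sample_mismatch(nominal_rate_hz: float, tx_ppm: float, rx_ppm: float,
                    seconds: float) -> int:
    """Difference between samples generated by the transmitter and the receiver."""
    return (sample_count(nominal_rate_hz, tx_ppm, seconds)
            - sample_count(nominal_rate_hz, rx_ppm, seconds))


# Hypothetical example: 16 kHz nominal rate, clocks at +20 ppm and -20 ppm, one minute.
print(sample_mismatch(16_000.0, 20.0, -20.0, 60.0))
```

Even an offset of a few tens of ppm, which is small for an individual oscillator, accumulates into a mismatch of many samples over a minute.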
Clock drift is often a problem in audio communication systems. For example, an audio communication system configured to obtain audio data of user utterances may include both a loudspeaker and a microphone. The loudspeaker may be used to play audio signals, including speech from a remote source during a telephone call, audio content presented from local storage or streamed from a network, as well as audio content from a video being watched. The audio signal played by the loudspeaker may be generated by an electronic device that includes a crystal oscillator providing a clock signal. In addition, the microphone may be used to capture audio signals from a local source, such as a user speaking voice commands or other utterances. For example, a user may speak a command to the microphone to control playback during a movie. The microphone may include a separate crystal oscillator providing a clock signal whose frequency is not exactly the same as that of the crystal oscillator controlling the loudspeaker. As a result, during a given time interval, the number of audio samples played through the loudspeaker may not exactly match the number of audio samples received by the microphone during the same time interval. For example, frequency offset may cause nonlinear time-varying disturbances of the effective echo path, which includes a D/A converter, a loudspeaker-room-microphone impulse response, and an A/D converter. The different sampling frequencies in the microphone and loudspeaker paths may cause a drift of the effective echo path, and therefore may cause jumps of the effective impulse response, which may deteriorate the performance of an adaptive filter used for echo cancellation.
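To give a sense of how quickly the effective echo path can drift, the following minimal sketch estimates how long it takes for the loudspeaker and microphone sample streams to slip by a given number of samples relative to each other. It assumes a constant frequency difference between the two clocks, expressed in ppm; the function name and the example values are hypothetical.

```python
def seconds_until_slip(slip_samples: float, nominal_rate_hz: float,
                       ppm_difference: float) -> float:
    """Time for two clocks differing by `ppm_difference` to slip by `slip_samples`."""
    slip_rate_samples_per_s = nominal_rate_hz * abs(ppm_difference) * 1e-6
    if slip_rate_samples_per_s == 0.0:
        return float("inf")  # identical clocks never slip
    return slip_samples / slip_rate_samples_per_s


# Hypothetical example: 16 kHz nominal rate, 40 ppm difference between the two clocks.
print(f"{seconds_until_slip(1.0, 16_000.0, 40.0):.4f} s per one-sample slip")
```

Under these assumptions the impulse response seen by the adaptive filter shifts by a whole sample every second or two, so the filter has to keep re-adapting.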
Propagation delay is another problem in audio communication devices. Propagation delay is the time delay between when a loudspeaker device sends an audio output signal to the loudspeaker for playback and when the corresponding audio input signal is received by a microphone device. When the microphone is in a constant position relative to the loudspeaker, the propagation delay remains constant. However, if the microphone is moved, the propagation delay may change.
The change in propagation delay can cause a mismatch between the number of audio samples played through the loudspeaker and the number of audio samples received by the microphone during a given time interval. Thus, clock drift and propagation delay are two problems in an audio communication system that can cause an offset in the number of audio samples received by a microphone compared to the number of samples played through a loudspeaker. In addition, if the propagation delay becomes greater than the delay path allocated in the system to synchronize microphone output samples with a received reference signal, then the system may become non-causal, and, as a result, the adaptive filter may diverge.
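The effect of moving the microphone can be sketched numerically. The following example is illustrative only: it models just the acoustic part of the propagation delay, assumes a speed of sound of 343 m/s, and treats the allocated delay path as a simple budget in samples. The function names, distances, and budget are hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at roughly 20 degrees C


def acoustic_delay_samples(distance_m: float, sample_rate_hz: float) -> float:
    """Acoustic travel time from loudspeaker to microphone, in samples."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * sample_rate_hz


def exceeds_delay_budget(distance_m: float, sample_rate_hz: float,
                         budget_samples: float) -> bool:
    """True if the acoustic delay is larger than the allocated delay path."""
    return acoustic_delay_samples(distance_m, sample_rate_hz) > budget_samples


# Hypothetical example: the microphone moves from 1 m to 4 m away at 16 kHz.
near = acoustic_delay_samples(1.0, 16_000.0)
far = acoustic_delay_samples(4.0, 16_000.0)
print(f"delay at 1 m: {near:.1f} samples, at 4 m: {far:.1f} samples")
print("4 m exceeds a 128-sample budget:", exceeds_delay_budget(4.0, 16_000.0, 128.0))
```

A real system would also have to account for buffering and converter latency, which this sketch leaves out.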
An audio communication system configured to obtain audio data of user utterances frequently performs acoustic echo cancellation to improve speech recognition. For example, an acoustic echo occurs when a signal emitted by the loudspeaker is captured by the microphone after undergoing reflections in the local environment. An acoustic echo canceller (“AEC”) may be used to remove acoustic echo from an audio signal captured by a microphone in order to facilitate improved communication. For example, the AEC may filter the microphone signal by determining an estimate of the acoustic echo (e.g., the remote audio signal emitted from the loudspeaker and reflected in the local environment). The AEC can then subtract the estimate from the microphone signal to produce an approximation of the true local signal (e.g., the user's utterance). The estimate can be obtained by applying a transformation to a reference signal that corresponds to the remote signal emitted from the loudspeaker. In addition, the transformation can be implemented using an adaptive algorithm. For example, an adaptive transformation relies on a feedback loop, which continuously adjusts a set of coefficients that are used to calculate the estimated echo from the far-end signal. Different environments produce different acoustic echoes from the same loudspeaker signal, and any change in the local environment may change the way that echoes are produced. By using a feedback loop to continuously adjust the coefficients, an AEC can adapt its echo estimates to the local environment in which it operates.
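The feedback loop described above can be sketched with a normalized least-mean-squares (NLMS) filter. The text does not name a specific adaptive algorithm, so NLMS is used here only as one common choice; the function name, tap count, step size, and simulated echo path are all assumptions made for illustration.

```python
import random


def nlms_echo_cancel(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal and the final filter coefficients.
    """
    weights = [0.0] * num_taps   # adaptive coefficients (echo path estimate)
    history = [0.0] * num_taps   # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate                  # approximation of the local signal
        power = sum(h * h for h in history) + eps  # normalization term
        gain = step * error / power
        # Feedback step: nudge the coefficients to reduce the remaining echo.
        weights = [w + gain * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Hypothetical simulation: white-noise far-end signal and a short, fixed echo path.
random.seed(0)
echo_path = [0.6, 0.3, -0.2, 0.1]
reference = [random.uniform(-1.0, 1.0) for _ in range(4000)]
mic = [sum(echo_path[k] * reference[n - k]
           for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(reference))]
residual, weights = nlms_echo_cancel(reference, mic)
print([round(w, 3) for w in weights[:4]])
```

In this simulation the microphone signal contains only echo, so the residual shrinks toward zero as the coefficients approach the simulated echo path. With local speech present, the residual would instead approximate that speech.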
The offset caused by clock drift and propagation delay can severely inhibit the effectiveness of the acoustic echo canceller. For example, if the offset is not corrected, speech recognition may be severely diminished. If the offset is great enough, speech recognition may not even be possible. Accordingly, there is a need to measure and correct clock drift and propagation delay in audio communication systems.