An audio capturing system captures, transmits, and stores audio signals using one or more microphones. An audio capturing system may also support other systems, such as speech recognition and speaker identification, in order to augment their capabilities. A well-designed audio capturing system provides good recording quality even under highly noisy conditions. In addition, the signal processing unit of the system should be efficient in terms of computational complexity.
For an audio capturing system with multiple microphones, a widely known technique is “beamforming”, in which the time differences between the microphone signals, which arise from the spatial separation of the microphones, are used to process, enhance, or filter speech signals. A related technique is time difference of arrival (TDOA) estimation, which calculates the directions of audio sources from the path difference between the waves arriving at the microphones from each source. Once the directions of the audio sources are known, the input speech can be analyzed and the interference from sources in undesired directions can be estimated and cancelled.
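The relationship between inter-microphone delay, source direction, and beamforming can be illustrated with a minimal sketch for a single pair of microphones. The sketch assumes a far-field source, integer-sample delays, and a speed of sound of 343 m/s; the function names, the `max_lag` search range, and the broadside angle convention are hypothetical choices made for illustration and are not taken from any particular system.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)

def cross_correlation_lag(ref, delayed, max_lag):
    """Return the integer lag (in samples) at which `delayed` best matches `ref`.

    A positive result means `delayed` lags behind `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(delayed):
                score += ref[i] * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def arrival_angle(lag, fs, mic_spacing):
    """Far-field angle from broadside, in radians, for a two-microphone pair."""
    path_difference = lag / fs * SPEED_OF_SOUND
    # Clamp so that rounding of the lag cannot push asin() out of its domain.
    ratio = max(-1.0, min(1.0, path_difference / mic_spacing))
    return math.asin(ratio)

def delay_and_sum(ref, delayed, lag):
    """Align `delayed` to `ref` by `lag` samples and average the two channels."""
    out = []
    for i in range(len(ref)):
        j = i + lag
        aligned = delayed[j] if 0 <= j < len(delayed) else 0.0
        out.append(0.5 * (ref[i] + aligned))
    return out
```

For example, with a sampling rate of 16 kHz and a microphone spacing of 0.2 m, a measured lag of 5 samples corresponds to a path difference of roughly 0.107 m. Averaging the aligned channels reinforces sound from the estimated direction, while sound from other directions adds less coherently.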
To analyze speech signals, the linear predictive coding (LPC) residue can be used in combination with beamforming. LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The formants are removed by inverse filtering, and the signal that remains after the predicted (modeled) signal is subtracted is called the residue. The residue contains important excitation source information, which is very useful for TDOA estimation. Forming the residue removes the second-order correlation among samples of the signal and produces large amplitude fluctuations around the instants of significant excitation, where the signal-to-noise ratio is high. TDOA estimation based on the LPC residue is known to be more reliable than TDOA estimation based on the raw signal.
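The inverse filtering step can be sketched as follows. This is one common way to obtain an LPC residue, using the autocorrelation method with the Levinson-Durbin recursion. It is offered as an illustration under stated assumptions, not as the specific procedure of any particular system. The function names and the choice of prediction order are hypothetical, and framing and windowing of the speech signal are omitted for brevity.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns coefficients `a` with a[0] == 1, so that the inverse (whitening)
    filter is e[n] = sum over k of a[k] * x[n - k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame; keep what was found so far
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        reflection = -acc / err
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + reflection * prev[m - k]
        a[m] = reflection
        err *= 1.0 - reflection * reflection
    return a

def lpc_residual(x, order):
    """Inverse-filter `x` with its own LPC model; return (residue, coefficients)."""
    a = levinson_durbin(autocorrelation(x, order), order)
    residue = []
    for n in range(len(x)):
        # Samples before the start of the frame are treated as zero.
        residue.append(sum(a[k] * x[n - k] for k in range(order + 1) if n - k >= 0))
    return residue, a
```

Applied to a signal produced by a resonant filter driven by sparse pulses, the residue is small everywhere except near the pulse instants. This is the property that makes cross-correlating the residues of two microphone channels attractive for TDOA estimation.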