Audio capture hardware and auxiliary software for recording audio signals have become integral parts of modern computers. This hardware is generally inexpensive and allows the recording of 16-bit mono or stereo signals at sampling rates of up to 44 kHz. Such a high sampling rate is generally not necessary for speech signals; for most speech applications a sampling rate of 11 kHz is sufficient. On the other hand, the quality of the recording is often a problem because of environmental noise, interfering signals, room reverberation, other speakers, and so on. Multi-microphone technologies such as beamforming and adaptive beamforming can significantly enhance the quality of the recorded signal by improving the signal-to-noise ratio, canceling interfering signals or environmental noise, reducing reverberation, and tracking the speaker's movements.
Today, it is possible to perform stereo recording without using special or widely separated microphones, which is important for video conferencing applications. Noise cancellation is probably the classical application of multi-microphone technologies. Its goal is to produce an output signal that is free of any signal other than the signal of interest. It is known from the theory of beamforming that N spatially separated omnidirectional microphones can, in principle, clean the signal of interest of up to N-1 interfering signals, provided that these arrive from directions different from the direction of interest. One application of noise cancellation is as a front end for a speech recognition system. Voice communication in a noisy environment is another application in which the user may benefit greatly from an improved signal-to-noise ratio of the transmitted signals.
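The N-1 claim can be illustrated in its smallest case, N = 2, where two microphones can place one null. The following is a minimal sketch rather than a method taken from this text: it assumes the signal of interest reaches both microphones at the same time, the interferer reaches the second microphone a known whole number of samples later, and the helper names `delay` and `null_steer` are hypothetical.

```python
def delay(signal, d):
    """Return `signal` delayed by d samples (d >= 0), zero-padded at the start."""
    return [0.0] * d + list(signal[:len(signal) - d])


def null_steer(mic1, mic2, interferer_delay):
    """Two-microphone delay-and-subtract.

    Assumes the interferer reaches mic2 `interferer_delay` samples after mic1.
    Delaying mic1 by the same amount and subtracting it from mic2 cancels the
    interferer. A source that reaches both microphones simultaneously is
    filtered (s[n] - s[n - d]) but not cancelled.
    """
    d = interferer_delay
    out = []
    for n in range(len(mic2)):
        delayed = mic1[n - d] if n - d >= 0 else 0.0
        out.append(mic2[n] - delayed)
    return out


# Illustrative data: target s arrives simultaneously, interferer v lags 2 samples at mic2.
s = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
v = [1.0, -2.0, 0.5, 3.0, -1.0, 0.0, 0.0, 0.0]
mic1 = [a + b for a, b in zip(s, v)]
mic2 = [a + b for a, b in zip(s, delay(v, 2))]
cleaned = null_steer(mic1, mic2, 2)
```

In this idealized setting `cleaned` contains no trace of `v`. The price is that the signal of interest is comb-filtered, which practical beamformers compensate for with additional filtering.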
Since the microphones are separated in space and the sound sources are located in specific places (at least temporarily), the microphones record different signals. In the simplest (ideal) case, the signals are just delayed versions of each other. In more complex cases they may be filtered versions of each other or may even contain independent information. In any case, these differences have a spatial origin and hence may be exploited to extract spatial information about the recorded sounds.
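For the ideal case, in which one recording is a delayed copy of the other, the delay can be estimated by searching for the lag that maximizes the cross-correlation of the two signals. The sketch below is an illustration under assumptions not stated in the text: the function names are hypothetical, the delay is a whole number of samples, and the conversion to an arrival angle assumes a far-field source, a two-microphone array, and a speed of sound of 343 m/s.

```python
import math


def estimate_delay(ref, other, max_lag):
    """Return the lag in samples at which `other` best matches `ref`.

    A positive result means `other` is a delayed copy of `ref`.
    Brute-force cross-correlation over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(ref)):
            j = i + lag
            if 0 <= j < len(other):
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(delay_samples, sample_rate_hz, mic_spacing_m,
                   speed_of_sound=343.0):
    """Convert an inter-microphone delay to a far-field arrival angle in degrees.

    The angle is measured from the direction perpendicular to the line joining
    the microphones (assumed geometry).
    """
    s = delay_samples / sample_rate_hz * speed_of_sound / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # guard against delays outside the physical range
    return math.degrees(math.asin(s))


# Illustrative data: a short pulse and a copy delayed by 3 samples.
ref = [0.0] * 20
ref[5], ref[6] = 1.0, 0.5
other = [0.0] * 3 + ref[:-3]
lag = estimate_delay(ref, other, max_lag=8)
angle = delay_to_angle(lag, sample_rate_hz=11000, mic_spacing_m=0.2)
```

The estimated lag is the spatial information in its rawest form; the angle conversion shows one way it might be interpreted when the array geometry is known.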
Unfortunately, the use and development of these advanced technologies are hampered by the lack of hardware that allows simultaneous recording of multi-channel signals into computer memory. Furthermore, many other applications might emerge as soon as an inexpensive technology for recording multi-channel or multi-microphone signals becomes available.