Spatial sound acquisition aims at capturing either an entire sound field present in a recording room or only certain desired components of the sound field that are of interest for the application at hand. As an example, in a situation where several people in a room have a conversation, it may be of interest to capture either the entire sound field (including its spatial characteristics) or just the signal that a certain talker produces. The latter makes it possible to isolate the sound and apply specific processing to it, such as amplification, filtering, etc.
There are a number of methods known for spatially selectively capturing certain sound components. These methods often employ microphones with a high directionality or microphone arrays. Most methods have in common that the microphone or the microphone array is arranged in a fixed known geometry. The spacing between the microphones is as small as possible for coincident microphone techniques, whereas it is normally a few centimeters for the other methods. In the following, we refer to any apparatus for the directionally selective acquisition of the spatial sound (e.g., directional microphones, microphone arrays, etc.) as a beamformer.
Traditionally, directional (spatial) selectivity in sound capture, i.e., a spatially selective sound acquisition, can be achieved in several ways:
One possible way is to employ directional microphones (e.g., cardioid, supercardioid, or shotgun microphones). In general, all microphones capture sound differently depending on the direction of arrival (DOA) relative to the microphone. In some microphones, this effect is minor, as they capture sound almost independently of the direction. These microphones are called omnidirectional microphones. Typically in such microphones, a circular diaphragm is attached to a small airtight enclosure; see, for example, [Ea01] J. Eargle, “The Microphone Book”, Focal Press, 2001.
If the diaphragm is not attached to the enclosure and sound reaches it equally from each side, its directional pattern has two lobes of equal magnitude. It captures sound with equal level from both the front and the back of the diaphragm, however, with inverted polarity. This microphone does not capture sound coming from directions parallel to the plane of the diaphragm. This directional pattern is called dipole or figure-of-eight. If the enclosure of an omnidirectional microphone is not airtight, but is specially constructed to allow the sound waves to propagate through the enclosure and reach the diaphragm, the directional pattern is somewhere between omnidirectional and dipole (see [Ea01]). The patterns may have two lobes; however, the lobes may have different magnitudes. The patterns may also have a single lobe; the most important example is the cardioid pattern, where the directional function D can be expressed as D = 0.5(1 + cos(θ)), where θ is the direction of arrival of the sound (see [Ea01]). This function quantifies the relative magnitude of the captured sound level of a plane wave arriving at the angle θ with respect to the angle with the highest sensitivity. Omnidirectional microphones are called zeroth-order microphones, and the other patterns mentioned above, such as the dipole and cardioid patterns, are known as first-order patterns. These kinds of microphones do not allow arbitrary shaping of the pattern, since their directivity pattern is almost entirely determined by their mechanical construction.
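As an illustration, the omnidirectional, dipole, and cardioid patterns described above can be viewed as members of the commonly used one-parameter family D(θ) = a + (1 − a)·cos(θ). The following is a minimal sketch under that assumption; the function name and the parameter name `a` are chosen here for illustration and do not come from [Ea01].

```python
import math

def first_order_pattern(theta, a):
    """Relative sensitivity a + (1 - a) * cos(theta) of a first-order pattern.

    theta: direction of arrival in radians (0 = direction of highest sensitivity)
    a:     assumed blend factor (1.0 = omnidirectional, 0.5 = cardioid, 0.0 = dipole)
    """
    return a + (1.0 - a) * math.cos(theta)

# Cardioid (a = 0.5), i.e. D = 0.5 * (1 + cos(theta)): full sensitivity at the
# front, a null at the back.
cardioid_front = first_order_pattern(0.0, 0.5)
cardioid_back = first_order_pattern(math.pi, 0.5)

# Dipole (a = 0.0): equal level front and back with inverted polarity,
# and a null for sound arriving parallel to the diaphragm plane.
dipole_back = first_order_pattern(math.pi, 0.0)
dipole_side = first_order_pattern(math.pi / 2.0, 0.0)
```

Values of `a` between 0 and 0.5 yield two lobes of different magnitude, which corresponds to the intermediate patterns mentioned in the text.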
Some special acoustical structures also exist which can be used to give microphones narrower directional patterns than first-order ones. For example, if a tube which has holes in it is attached to an omnidirectional microphone, a microphone with a very narrow directional pattern can be created. Such microphones are called shotgun or rifle microphones (see [Ea01]). They typically do not have flat frequency responses, and their directivity cannot be controlled after recording.
Another method to construct a microphone with directional characteristics is to record sound with an array of omnidirectional or directional microphones and to apply signal processing afterwards; see, for example, [BW01] M. Brandstein, D. Ward: “Microphone Arrays: Signal Processing Techniques and Applications”, Springer Berlin, 2001, ISBN: 978-3-540-41953-2.
There exist a variety of methods for this. In the simplest form, when the signals of two closely spaced omnidirectional microphones are subtracted from each other, a virtual microphone signal with a dipole characteristic is formed. See, e.g., [Elk00] G. W. Elko: “Superdirectional microphone arrays” in S. G. Gay, J. Benesty (eds.): “Acoustic Signal Processing for Telecommunication”, Chapter 10, Kluwer Academic Press, 2000, ISBN: 978-0792378143.
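The subtraction of two closely spaced omnidirectional signals can be checked numerically for a single-frequency plane wave. The sketch below is an illustration only, not taken from [Elk00]; the function name, the 1 cm spacing, the 500 Hz test frequency, and the speed of sound of 343 m/s are assumptions.

```python
import numpy as np

def differential_response(theta, freq_hz, spacing_m, c=343.0):
    """Complex response of (mic A - mic B) to a unit plane wave.

    The two omnidirectional microphones are assumed to sit at +spacing/2 and
    -spacing/2 on the array axis; theta is the arrival angle relative to it.
    """
    k = 2.0 * np.pi * freq_hz / c                 # wavenumber
    phase = 0.5 * k * spacing_m * np.cos(theta)   # phase offset at each microphone
    return np.exp(1j * phase) - np.exp(-1j * phase)

front = differential_response(0.0, 500.0, 0.01)
back = differential_response(np.pi, 500.0, 0.01)
side = differential_response(np.pi / 2.0, 500.0, 0.01)
oblique = differential_response(np.pi / 3.0, 500.0, 0.01)

# For a spacing much smaller than the wavelength, the normalized response
# is approximately cos(theta), i.e. a dipole.
normalized_oblique = (oblique / front).real
```

The front and back responses have equal magnitude and opposite sign, and the side response vanishes, which matches the figure-of-eight pattern.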
The microphone signals can also be delayed or filtered before being summed. In beamforming, a signal corresponding to a narrow beam is formed by filtering each microphone signal with a specially designed filter and then adding the filtered signals together. This “filter-and-sum beamforming” is explained in [BS01] J. Bitzer, K. U. Simmer: “Superdirective microphone arrays” in M. Brandstein, D. Ward (eds.): “Microphone Arrays: Signal Processing Techniques and Applications”, Chapter 2, Springer Berlin, 2001, ISBN: 978-3-540-41953-2.
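The simplest special case of filter-and-sum beamforming is delay-and-sum, in which each filter is a pure delay that aligns the microphone signals for one look direction. The following narrowband sketch is illustrative and is not the superdirective design of [BS01]; the uniform linear array with eight microphones, the 4 cm spacing, the 2 kHz frequency, and all function names are assumptions.

```python
import numpy as np

def steering_vector(theta, positions_m, freq_hz, c=343.0):
    """Relative phases of a plane wave from angle theta at positions on a line."""
    k = 2.0 * np.pi * freq_hz / c
    return np.exp(1j * k * positions_m * np.cos(theta))

def delay_and_sum_response(theta, steer_theta, positions_m, freq_hz):
    """Output of a delay-and-sum beamformer steered to steer_theta
    for a unit plane wave arriving from theta."""
    weights = steering_vector(steer_theta, positions_m, freq_hz) / len(positions_m)
    # np.vdot conjugates its first argument, which undoes the phase shifts
    # (delays) of the look direction before the signals are summed.
    return np.vdot(weights, steering_vector(theta, positions_m, freq_hz))

positions = np.arange(8) * 0.04                       # assumed 8-element line array
look = delay_and_sum_response(np.pi / 2, np.pi / 2, positions, 2000.0)
off_axis = delay_and_sum_response(0.0, np.pi / 2, positions, 2000.0)
```

The response has unit magnitude in the look direction and is attenuated for the off-axis direction; a general filter-and-sum design would replace the pure phase terms with frequency-dependent filter responses.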
These techniques are blind to the signal itself, i.e., they are not aware of the direction of arrival of the sound. Instead, estimating the direction of arrival (DOA) is a task of its own; see, for example, [CBH06] J. Chen, J. Benesty, Y. Huang: “Time Delay Estimation in Room Acoustic Environments: An Overview”, EURASIP Journal on Applied Signal Processing, Article ID 26503, Volume 2006 (2006).
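One basic way to approach DOA estimation with two microphones is to estimate the time delay between their signals from the peak of the cross-correlation and to convert it to an angle. The sketch below is a simplified free-field illustration, not one of the specific estimators surveyed in [CBH06]; the function names, the 0.2 m spacing, the 48 kHz sampling rate, and the synthetic noise source are assumptions.

```python
import numpy as np

def estimate_delay_samples(x1, x2):
    """Delay of x2 relative to x1 in samples (positive if x2 lags x1)."""
    corr = np.correlate(x2, x1, mode="full")
    lags = np.arange(-(len(x1) - 1), len(x2))
    return int(lags[np.argmax(corr)])

def delay_to_doa_deg(delay_samples, fs_hz, spacing_m, c=343.0):
    """Far-field arrival angle relative to the axis pointing from mic 2 to mic 1."""
    cos_theta = np.clip(c * delay_samples / fs_hz / spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

# Synthetic test: mic 2 receives the same noise signal 14 samples later.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096)
true_delay = 14
mic1 = source
mic2 = np.concatenate([np.zeros(true_delay), source[:-true_delay]])

delay = estimate_delay_samples(mic1, mic2)
doa_deg = delay_to_doa_deg(delay, fs_hz=48000, spacing_m=0.2)
```

In reverberant rooms the cross-correlation peak is less distinct than in this idealized case, which is one reason why DOA estimation is treated as a separate problem.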
In principle, many different directional characteristics can be formed with these techniques. For forming arbitrary, spatially very selective sensitivity patterns, however, a large number of microphones may be needed. In general, all these techniques rely on distances between adjacent microphones that are small compared to the wavelength of interest.
Another way of realizing directional selectivity in sound capture is parametric spatial filtering. Standard beamformer designs, which may, for example, be based on a limited number of microphones and which possess time-invariant filters in their filter-and-sum structure (see [BS01]), usually exhibit only limited spatial selectivity. To increase the spatial selectivity, parametric spatial filtering techniques have recently been proposed which apply (time-variant) spectral gain functions to the input signal spectrum. The gain functions are designed based on parameters which are related to the human perception of spatial sound. One spatial filtering approach is presented in [DiFi2009] M. Kallinger, G. Del Galdo, F. Küch, D. Mahne, and R. Schultz-Amling, “Spatial Filtering using Directional Audio Coding Parameters,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2009, and is implemented in the parameter domain of Directional Audio Coding (DirAC), an efficient spatial coding technique. Directional Audio Coding is described in [Pu106] Pulkki, V., “Directional audio coding in spatial sound reproduction and stereo upmixing,” in Proceedings of The AES 28th International Conference, pp. 251-258, Piteå, Sweden, Jun. 30-Jul. 2, 2006.
In DirAC, the sound field is analyzed in one location, at which the active intensity vector as well as the sound pressure is measured. These physical quantities are used to extract the three DirAC parameters: sound pressure, direction of arrival (DOA), and diffuseness of sound. DirAC makes use of the assumption that the human auditory system can only process one direction per time-frequency tile. This assumption is also exploited by other spatial audio coding techniques like MPEG Surround; see, for example, [Vil06] L. Villemoes, J. Herre, J. Breebaart, G. Hotho, S. Disch, H. Purnhagen, and K. Kjörling, “MPEG Surround: The Forthcoming ISO Standard for Spatial Audio Coding,” in AES 28th International Conference, Piteå, Sweden, June 2006.
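To make the relation between these quantities concrete, the following sketch derives a DOA and a diffuseness value from sound pressure and particle velocity using the standard definitions of active intensity and energy density. It is a broadband, time-averaged simplification, whereas the analysis described above operates per time-frequency tile; the function name, the constants for air density and speed of sound, and the synthetic signals are assumptions and are not taken from the cited references.

```python
import numpy as np

RHO0 = 1.2    # assumed density of air in kg/m^3
C = 343.0     # assumed speed of sound in m/s

def dirac_parameters(p, u):
    """Estimate DOA (unit vector) and diffuseness from pressure and velocity.

    p: array of shape (N,), sound pressure samples
    u: array of shape (N, 3), particle velocity samples
    """
    intensity = p[:, None] * u                              # instantaneous intensity
    mean_intensity = intensity.mean(axis=0)
    energy = 0.5 * RHO0 * np.sum(u ** 2, axis=1) + p ** 2 / (2.0 * RHO0 * C ** 2)
    norm = np.linalg.norm(mean_intensity)
    diffuseness = 1.0 - norm / (C * energy.mean())
    # Energy flows away from the source, so the DOA points against the intensity.
    doa = -mean_intensity / max(norm, 1e-30)
    return doa, diffuseness

rng = np.random.default_rng(0)
p = rng.standard_normal(20000)

# Single plane wave arriving from `arrival`: velocity is in phase with pressure.
arrival = np.array([0.6, 0.8, 0.0])
u_plane = np.outer(p, -arrival) / (RHO0 * C)
doa_plane, psi_plane = dirac_parameters(p, u_plane)

# Velocity unrelated to the pressure, as a crude stand-in for a diffuse field.
u_random = rng.standard_normal((20000, 3)) / (RHO0 * C)
_, psi_random = dirac_parameters(p, u_random)
```

For the single plane wave the diffuseness is close to zero and the DOA matches the arrival direction, while the uncorrelated case yields a diffuseness close to one.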
The spatial filtering approach, as described in [DiFi2009], allows for an almost free choice of spatial selectivity.
A further technique makes use of comparable spatial parameters. This technique is explained in    [Fa108] C. Faller: “Obtaining a Highly Directive Center Channel from Coincident Stereo Microphone Signals”, Proc. 124th AES convention, Amsterdam, The Netherlands, 2008, Preprint 7380.
In contrast to the technique described in [DiFi2009], in which a spectral gain function is applied to an omnidirectional microphone signal, the approach in [Fa108] makes use of two cardioid microphones.
The two parametric spatial filtering techniques mentioned above rely on microphone spacings that are small compared to the wavelength of interest. Ideally, the techniques described in [DiFi2009] and [Fa108] are based on coincident directional microphones.
Another way of realizing directional selectivity in sound capture is to filter the microphone signals based on the coherence between them. In [SBM01] K. U. Simmer, J. Bitzer, and C. Marro: “Post-Filtering Techniques” in M. Brandstein, D. Ward (eds.): “Microphone Arrays: Signal Processing Techniques and Applications”, Chapter 3, Springer Berlin, 2001, ISBN: 978-3-540-41953-2, a family of systems is described which employ at least two (not necessarily directional) microphones and in which the processing of the output signals is based on the coherence of the signals. The underlying assumption is that diffuse background noise will appear as incoherent parts in the two microphone signals, whereas a source signal will appear coherently in these signals. Based on this premise, the coherent part is extracted as the source signal. The techniques mentioned in [SBM01] were developed because filter-and-sum beamformers with a limited number of microphones are hardly capable of reducing diffuse noise signals. No assumptions on the location of the microphones are made; not even the spacing of the microphones needs to be known.
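The premise that a source appears coherently and diffuse noise incoherently in two microphone signals can be illustrated by estimating the magnitude-squared coherence per frequency bin. The sketch below is a generic frame-averaged estimate, not a reproduction of a specific post-filter from [SBM01]; the function name, the frame length, the Hann window, and the synthetic test signals are assumptions.

```python
import numpy as np

def magnitude_squared_coherence(x1, x2, frame_len=256):
    """Frame-averaged magnitude-squared coherence per frequency bin (0..1)."""
    n_frames = len(x1) // frame_len
    window = np.hanning(frame_len)
    frames1 = x1[: n_frames * frame_len].reshape(n_frames, frame_len) * window
    frames2 = x2[: n_frames * frame_len].reshape(n_frames, frame_len) * window
    spec1 = np.fft.rfft(frames1, axis=1)
    spec2 = np.fft.rfft(frames2, axis=1)
    s11 = np.mean(np.abs(spec1) ** 2, axis=0)        # auto power spectrum, mic 1
    s22 = np.mean(np.abs(spec2) ** 2, axis=0)        # auto power spectrum, mic 2
    s12 = np.mean(spec1 * np.conj(spec2), axis=0)    # cross power spectrum
    return np.abs(s12) ** 2 / (s11 * s22 + 1e-20)

rng = np.random.default_rng(0)
source = rng.standard_normal(65536)
noise1 = rng.standard_normal(65536)
noise2 = rng.standard_normal(65536)

# Same source signal in both microphones: coherence close to one.
coherent = magnitude_squared_coherence(source, source)
# Independent noise in the two microphones: coherence close to zero.
incoherent = magnitude_squared_coherence(noise1, noise2)
```

A coherence-based post-filter could use such a per-bin value to derive a spectral gain that passes coherent components and attenuates incoherent ones; how the gain is derived is a design choice that is not specified here.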
A major limitation of traditional approaches for spatially selective sound acquisition is that the recorded sound is invariably related to the location of the beamformer. In many applications it is, however, not possible (or feasible) to place a beamformer in the desired position, e.g., at a desired angle relative to the sound source of interest.
Traditional beamformers may, for example, employ microphone arrays and can form a directional pattern (“beam”) to capture sound from one direction and reject sound from other directions. Because this selectivity depends only on direction, there is no possibility to restrict the region of sound capture with regard to its distance from the capturing microphone array.
It would be extremely desirable to have a capturing device which can selectively capture sound not merely from one direction, but restricted to sound originating from one place (spot), similar to the way a close-up spot microphone at the desired place would perform.