Traditional spatial sound recording aims at capturing a sound field with multiple microphones such that, at the reproduction side, a listener perceives the sound image as if present at the recording location. Standard approaches for spatial sound recording usually employ spaced omnidirectional microphones, for example, in AB stereophony, coincident directional microphones, for example, in intensity stereophony, or more sophisticated microphones, such as a B-format microphone, e.g. in Ambisonics, see, for example,    [1] R. K. Furness, “Ambisonics—An overview,” in AES 8th International Conference, April 1990, pp. 181-189.
For the sound reproduction, these non-parametric approaches derive the desired audio playback signals (e.g., the signals to be sent to the loudspeakers) directly from the recorded microphone signals.
Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio coders. These methods often employ microphone arrays to determine one or more audio downmix signals together with spatial side information describing the spatial sound. Examples are Directional Audio Coding (DirAC) or the so-called spatial audio microphones (SAM) approach. More details on DirAC can be found in    [2] Pulkki, V., “Directional audio coding in spatial sound reproduction and stereo upmixing,” in Proceedings of the AES 28th International Conference, pp. 251-258, Piteå, Sweden, Jun. 30-Jul. 2, 2006,    [3] V. Pulkki, “Spatial sound reproduction with directional audio coding,” J. Audio Eng. Soc., vol. 55, no. 6, pp. 503-516, June 2007.
For more details on the spatial audio microphones approach, reference is made to    [4] C. Faller: “Microphone Front-Ends for Spatial Audio Coders”, in Proceedings of the AES 125th International Convention, San Francisco, October 2008.
In DirAC, for instance, the spatial cue information comprises the direction-of-arrival (DOA) of sound and the diffuseness of the sound field, computed in a time-frequency domain. For the sound reproduction, the audio playback signals can be derived based on the parametric description. In some applications, spatial sound acquisition aims at capturing an entire sound scene. In other applications, spatial sound acquisition aims only at capturing certain desired components. Close-talking microphones are often used for recording individual sound sources with high signal-to-noise ratio (SNR) and low reverberation, while more distant configurations such as XY stereophony represent a way of capturing the spatial image of an entire sound scene. More flexibility in terms of directivity can be achieved with beamforming, where a microphone array can be used to realize steerable pick-up patterns. Even more flexibility is provided by the above-mentioned methods, such as directional audio coding (DirAC) (see [2], [3]), in which it is possible to realize spatial filters with arbitrary pick-up patterns, as described in    [5] M. Kallinger, H. Ochsenfeld, G. Del Galdo, F. Küch, D. Mahne, R. Schultz-Amling, and O. Thiergart, “A spatial filtering approach for directional audio coding,” in Audio Engineering Society Convention 126, Munich, Germany, May 2009,    as well as other signal processing manipulations of the sound scene, see, for example,    [6] R. Schultz-Amling, F. Küch, O. Thiergart, and M. Kallinger, “Acoustical zooming based on a parametric sound field representation,” in Audio Engineering Society Convention 128, London UK, May 2010,    [7] J. Herre, C. Falch, D. Mahne, G. Del Galdo, M. Kallinger, and O. Thiergart, “Interactive teleconferencing combining spatial audio object coding and DirAC technology,” in Audio Engineering Society Convention 128, London UK, May 2010.
All the above-mentioned concepts have in common that the microphones are arranged in a fixed, known geometry. The spacing between microphones is as small as possible for coincident microphone techniques, whereas it is normally a few centimeters for the other methods. In the following, we refer to any apparatus for the recording of spatial sound capable of retrieving the direction of arrival of sound (e.g. a combination of directional microphones or a microphone array, etc.) as a spatial microphone.
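The steerable pick-up patterns mentioned above can be illustrated with a minimal frequency-domain delay-and-sum beamformer for a microphone array of known geometry. This is a generic textbook sketch under a far-field plane-wave assumption; the function name and array layout are chosen here for illustration only.

```python
import numpy as np

def delay_and_sum(X, mic_pos, freqs, look_dir, c=343.0):
    """Steer a microphone array towards look_dir by phase alignment.

    X        : microphone spectra, shape (mics, freq)
    mic_pos  : microphone positions in metres, shape (mics, 3)
    freqs    : bin frequencies in Hz, shape (freq,)
    look_dir : unit vector pointing towards the desired source (the DOA)
    """
    # A plane wave arriving from look_dir reaches microphone m with a phase
    # lead proportional to the projection of its position onto look_dir.
    proj = mic_pos @ np.asarray(look_dir)                    # (mics,)
    # Compensating phase weights align all channels for that direction ...
    w = np.exp(-2j * np.pi * np.outer(proj, freqs) / c)      # (mics, freq)
    # ... so averaging adds the target coherently and attenuates other
    # directions, realizing a steerable pick-up pattern.
    return np.mean(w * X, axis=0)
```

Changing `look_dir` re-steers the pattern without moving the array, which is the flexibility gain over fixed directional microphones discussed above.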
Moreover, all the above-mentioned methods have in common that they are limited to a representation of the sound field with respect to only one point, namely the measurement location. Thus, the microphones to be used have to be placed at very specific, carefully selected positions, e.g. close to the sources, or such that the spatial image can be captured optimally.
In many applications, however, this is not feasible, and it would therefore be beneficial to place several microphones further away from the sound sources and still be able to capture the sound as desired.
There exist several field reconstruction methods for estimating the sound field at a point in space other than where it was measured. One method is acoustic holography, as described in    [8] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography, Academic Press, 1999.
Acoustic holography allows the sound field to be computed at any point within an arbitrary volume, given that the sound pressure and particle velocity are known on its entire surface. Therefore, when the volume is large, the number of sensors required becomes impractically large. Moreover, the method assumes that no sound sources are present inside the volume, making the algorithm infeasible for our needs. The related wave field extrapolation (see also [8]) aims at extrapolating the known sound field on the surface of a volume to outer regions. The extrapolation accuracy, however, degrades rapidly for larger extrapolation distances as well as for extrapolations towards directions orthogonal to the direction of propagation of the sound, see    [9] A. Kuntz and R. Rabenstein, “Limitations in the extrapolation of wave fields from circular measurements,” in 15th European Signal Processing Conference (EUSIPCO 2007), 2007.    Furthermore,    [10] A. Walther and C. Faller, “Linear simulation of spaced microphone arrays using b-format recordings,” in Audio Engineering Society Convention 128, London UK, May 2010,    describes a plane wave model, in which the field extrapolation is possible only at points far from the actual sound sources, e.g., close to the measurement point.
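Under the plane-wave (far-field) model underlying approaches such as [10], extrapolating the field to a nearby point reduces to applying a frequency-dependent phase shift. The following fragment is a simplified illustration of that idea only; the function name and sign conventions are assumptions made here, not the procedure of any cited reference.

```python
import numpy as np

def extrapolate_plane_wave(P, freqs, prop_dir, delta, c=343.0):
    """Shift a measured spectrum to a nearby point under a plane-wave model.

    P        : complex spectrum at the measurement point, shape (freq,)
    freqs    : bin frequencies in Hz, shape (freq,)
    prop_dir : unit propagation direction of the wave (i.e. minus the DOA)
    delta    : displacement from measurement point to target point, metres

    For a plane wave exp(j(w*t - k . r)), the field at r + delta differs
    only by the phase factor exp(-j k . delta) with k = 2*pi*f/c * prop_dir.
    """
    k = 2.0 * np.pi * np.asarray(freqs) / c
    phase = np.exp(-1j * k * np.dot(np.asarray(prop_dir), np.asarray(delta)))
    return P * phase
```

The shift is exact only for a single plane wave; for real sound scenes the model, and hence the extrapolation, breaks down near the sources, which is precisely the limitation noted above.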
A major drawback of traditional approaches is that the spatial image recorded is relative to the spatial microphone used. In many applications, it is not possible or feasible to place a spatial microphone in the desired position, e.g., close to the sound sources. In this case, it would be more beneficial to place multiple spatial microphones further away from the sound scene and still be able to capture the sound as desired.    [11] US61/287,596: An Apparatus and a Method for Converting a First Parametric Spatial Audio Signal into a Second Parametric Spatial Audio Signal,    proposes a method for virtually moving the real recording position to another position for reproduction over loudspeakers or headphones. However, this approach is limited to a simple sound scene in which all sound objects are assumed to have equal distance to the real spatial microphone used for the recording. Furthermore, the method can only take advantage of one spatial microphone.