The present invention is in the field of spatial sound recording and reproduction. Spatial sound recording aims at capturing a sound field with multiple microphones such that, at the reproduction side, a listener perceives the sound image as it was at the recording location. Standard approaches for spatial sound recording usually use spaced omnidirectional microphones (e.g. in AB stereophony) or coincident directional microphones (e.g. in intensity stereophony). The recorded signals can be reproduced over a standard stereo loudspeaker setup to achieve a stereo sound image. For surround sound reproduction, for example using a 5.1 loudspeaker setup, similar recording techniques can be used, for example five cardioid microphones directed towards the loudspeaker positions [ArrayDesign]. Recently, 3D sound reproduction systems have emerged, such as the 7.1+4 loudspeaker setup, where four height speakers are used to reproduce elevated sounds. The signals for such a loudspeaker setup can be recorded, for example, with very specific spaced 3D microphone setups [MicSetup3D]. All these recording techniques have in common that they are designed for a specific loudspeaker setup, which limits the practical applicability, for example when the recorded sound is to be reproduced on a different loudspeaker configuration.
More flexibility is achieved when not directly recording the signals for a specific loudspeaker setup, but instead recording the signals of an intermediate format, from which the signals of an arbitrary loudspeaker setup can then be generated on the reproduction side. Such an intermediate format, which is well-established in practice, is represented by (higher-order) Ambisonics [Ambisonics]. From an Ambisonics signal, one can generate the signals of every desired loudspeaker setup including binaural signals for headphone reproduction. This involves a specific renderer which is applied to the Ambisonics signal, such as a classical Ambisonics renderer [Ambisonics], Directional Audio Coding (DirAC) [DirAC], or HARPEX [HARPEX].
An Ambisonics signal represents a multi-channel signal where each channel (referred to as Ambisonics component) is equivalent to the coefficient of a so-called spatial basis function. With a weighted sum of these spatial basis functions (with the weights corresponding to the coefficients) one can recreate the original sound field in the recording location [FourierAcoust]. Therefore, the spatial basis function coefficients (i.e., the Ambisonics components) represent a compact description of the sound field in the recording location. There exist different types of spatial basis functions, for example spherical harmonics (SHs) [FourierAcoust] or cylindrical harmonics (CHs) [FourierAcoust]. CHs can be used when describing the sound field in the 2D space (for example for 2D sound reproduction) whereas SHs can be used to describe the sound field in the 2D and 3D space (for example for 2D and 3D sound reproduction).
The spatial basis functions exist for different orders l and, in the case of 3D spatial basis functions (such as SHs), for different modes m. In the latter case, there exist 2l+1 modes for each order l, where m and l are integers in the range l≥0 and −l≤m≤l. A corresponding example of spatial basis functions is shown in FIG. 1a, which shows spherical harmonic functions for different orders l and modes m. Note that the orders l are sometimes referred to as levels, and that the modes m may also be referred to as degrees. As can be seen in FIG. 1a, the spherical harmonic of the zeroth order (zeroth level) l=0 represents the omnidirectional sound pressure in the recording location, whereas the spherical harmonics of the first order (first level) l=1 represent dipole components along the three dimensions of the Cartesian coordinate system. This means that a spatial basis function of a specific order (level) l describes the directivity of a microphone of order l. In other words, the coefficient of a spatial basis function corresponds to the signal of a microphone of order (level) l and mode m. Note that the spatial basis functions of different orders and modes are mutually orthogonal. This means, for example, that in a purely diffuse sound field, the coefficients of all spatial basis functions are mutually uncorrelated.
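The omnidirectional and dipole behaviour of the zeroth-order and first-order spherical harmonics, and their mutual orthogonality, can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses real-valued, orthonormal spherical harmonics, an inclination angle theta measured from the z axis, and a simple midpoint integration grid. The function names, the ordering of the four functions, and the grid sizes are chosen here for illustration and are not taken from the cited references.

```python
import math
from typing import List


def real_sh_first_order(theta: float, phi: float) -> List[float]:
    """Real orthonormal SHs for (l, m) = (0, 0), (1, -1), (1, 0), (1, 1).

    theta: inclination from the +z axis in [0, pi] (assumed convention).
    phi:   azimuth in [0, 2*pi).
    """
    c0 = math.sqrt(1.0 / (4.0 * math.pi))
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [
        c0,                                    # omnidirectional
        c1 * math.sin(theta) * math.sin(phi),  # dipole along y
        c1 * math.cos(theta),                  # dipole along z
        c1 * math.sin(theta) * math.cos(phi),  # dipole along x
    ]


def gram_matrix(n_theta: int = 64, n_phi: int = 128) -> List[List[float]]:
    """Approximate the integrals of Y_a * Y_b over the unit sphere."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    gram = [[0.0] * 4 for _ in range(4)]
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta  # midpoint rule in inclination
        area = math.sin(theta) * d_theta * d_phi  # surface element
        for j in range(n_phi):
            y = real_sh_first_order(theta, j * d_phi)
            for a in range(4):
                for b in range(4):
                    gram[a][b] += y[a] * y[b] * area
    return gram
```

Under these assumptions, the resulting matrix is close to the identity matrix: the diagonal entries are approximately one and the off-diagonal entries approximately zero, which is the orthogonality property stated above. Evaluating the x dipole at phi=0 and phi=π in the horizontal plane gives values of equal magnitude and opposite sign, as expected for a dipole directivity.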
As explained above, each Ambisonics component of an Ambisonics signal corresponds to a spatial basis function coefficient of a specific level (and mode). For example, if the sound field is described up to level l=1 using SHs as spatial basis functions, then the Ambisonics signal would comprise four Ambisonics components (since we have one mode for order l=0 plus three modes for order l=1). Ambisonics signals of a maximum order l=1 are referred to as first-order Ambisonics (FOA) in the following, whereas Ambisonics signals of a maximum order l>1 are referred to as higher-order Ambisonics (HOA). When using higher orders l to describe the sound field, the spatial resolution becomes higher, i.e., one can describe or recreate the sound field with higher accuracy. Therefore, one can describe a sound field with only a few orders, leading to a lower accuracy (but less data), or one can use higher orders, leading to a higher accuracy (and more data).
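The trade-off between accuracy and amount of data can be made concrete by counting the components. The following minimal sketch assumes SHs as spatial basis functions (i.e., 2l+1 modes per order l); the function names are hypothetical and chosen for illustration only.

```python
def modes_per_order(order: int) -> int:
    """Number of modes m with -order <= m <= order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return 2 * order + 1


def num_components(max_order: int) -> int:
    """Total number of Ambisonics components up to and including max_order."""
    return sum(modes_per_order(l) for l in range(max_order + 1))


# FOA (maximum order 1): 1 + 3 = 4 components
# Maximum order 2:       1 + 3 + 5 = 9 components
for max_order in range(5):
    print(max_order, num_components(max_order))
```

The sum of 2l+1 over l=0,...,L equals (L+1)², so the number of components, and thus the amount of data, grows quadratically with the maximum order L.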
There exist different but closely related mathematical definitions for the different spatial basis functions. For example, one can compute complex-valued spherical harmonics as well as real-valued spherical harmonics. Moreover, the spherical harmonics may be computed with different normalization terms such as SN3D, N3D, or N2D normalization. The different definitions can be found for example in [Ambix]. Some specific examples will be shown later together with the description of the invention and the embodiments.
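As an illustration of how closely related these definitions are, the following minimal sketch converts Ambisonics components between SN3D and N3D normalization. It assumes the commonly used convention in which the two normalizations differ only by an order-dependent factor of sqrt(2l+1); other definitions may differ, and the function names and the example component ordering are hypothetical.

```python
import math
from typing import List, Sequence


def sn3d_to_n3d_gain(order: int) -> float:
    """Gain converting an SN3D component of the given order to N3D
    (assumed convention: N3D = SN3D * sqrt(2*order + 1))."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return math.sqrt(2 * order + 1)


def sn3d_to_n3d(components: Sequence[float], orders: Sequence[int]) -> List[float]:
    """Scale each component by the gain for its order."""
    return [c * sn3d_to_n3d_gain(l) for c, l in zip(components, orders)]


def n3d_to_sn3d(components: Sequence[float], orders: Sequence[int]) -> List[float]:
    """Inverse of sn3d_to_n3d."""
    return [c / sn3d_to_n3d_gain(l) for c, l in zip(components, orders)]


# One FOA sample: order 0 first, followed by the three order-1 components
# (the ordering is an assumption made for this example).
foa_orders = [0, 1, 1, 1]
foa_sn3d = [1.0, 0.2, -0.1, 0.5]
foa_n3d = sn3d_to_n3d(foa_sn3d, foa_orders)
```

In this convention the zeroth-order component is unchanged, while the first-order components are scaled by sqrt(3).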
The desired Ambisonics signal can be determined from recordings with multiple microphones. The straightforward way of obtaining Ambisonics signals is the direct computation of the Ambisonics components (spatial basis function coefficients) from the microphone signals. This approach involves measuring the sound pressure at very specific positions, for example on a circle or on the surface of a sphere. Afterwards, the spatial basis function coefficients can be computed by integrating over the measured sound pressures, as described for example in [FourierAcoust, p. 218]. This direct approach involves a specific microphone setup, for example a circular array or a spherical array of omnidirectional microphones. Two typical examples of commercially available microphone setups are the SoundField ST350 microphone and the EigenMike® [EigenMike]. Unfortunately, the requirement of a specific microphone geometry strongly limits the practical applicability, for example when the microphones need to be integrated into a small device or when the microphone array needs to be combined with a video camera.
Moreover, determining the spatial basis function coefficients of higher orders with this direct approach involves a relatively high number of microphones to ensure sufficient robustness against noise. Therefore, the direct approach of obtaining an Ambisonics signal is often very expensive.
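The principle of the direct approach, i.e., integrating the measured sound pressure against each spatial basis function, can be sketched as follows. This is a strongly idealized illustration and not the procedure of [FourierAcoust]: it assumes real orthonormal spherical harmonics up to order l=1, a dense regular sampling grid on the sphere instead of a practical microphone arrangement, noise-free pressure values, and it ignores any frequency-dependent radial terms. All function names and the test pressure distribution are hypothetical.

```python
import math
from typing import Callable, List


def real_sh_first_order(theta: float, phi: float) -> List[float]:
    """Real orthonormal SHs for (l, m) = (0, 0), (1, -1), (1, 0), (1, 1);
    theta is the inclination from the +z axis, phi the azimuth."""
    c0 = math.sqrt(1.0 / (4.0 * math.pi))
    c1 = math.sqrt(3.0 / (4.0 * math.pi))
    return [
        c0,
        c1 * math.sin(theta) * math.sin(phi),
        c1 * math.cos(theta),
        c1 * math.sin(theta) * math.cos(phi),
    ]


def project_pressure(
    pressure: Callable[[float, float], float],
    n_theta: int = 64,
    n_phi: int = 128,
) -> List[float]:
    """Approximate the integral of pressure * Y_lm over the sphere
    for each of the four basis functions (midpoint rule)."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    coeffs = [0.0] * 4
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        weight = math.sin(theta) * d_theta * d_phi  # surface element
        for j in range(n_phi):
            phi = j * d_phi
            p = pressure(theta, phi)  # "measured" pressure at this position
            y = real_sh_first_order(theta, phi)
            for k in range(4):
                coeffs[k] += p * y[k] * weight
    return coeffs


def example_pressure(theta: float, phi: float) -> float:
    """Synthetic pressure built from known coefficients 2.0, 0.0, 0.5, -1.0."""
    y = real_sh_first_order(theta, phi)
    return 2.0 * y[0] + 0.5 * y[2] - 1.0 * y[3]


estimated = project_pressure(example_pressure)
```

In this idealized setting the estimated coefficients closely match the ones used to synthesize the pressure. The sketch uses 64 × 128 sampling positions to recover only four coefficients; practical arrays have far fewer microphones, which is one reason why robustness against noise becomes a concern for higher orders.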