Ambisonics is a technology that describes an audio scene in terms of sound pressure and addresses the recording, production, transmission and playback of complex audio scenes with high spatial resolution, in both 2D and 3D. In Ambisonics, a spatial audio scene is described by the coefficients $A_n^m(k)$ of a Fourier-Bessel series. Microphone arrays that provide 1st order Ambisonics signals, so-called B-format signals, are known. However, decoding and rendering 1st order Ambisonics signals to speaker arrangements for 2D surround or 3D offers only a limited perception of source direction: sound sources are often perceived as broader than they actually are. Especially at off-center listening positions, sound sources are often localized at the nearest speaker positions instead of at their intended virtual positions between the speakers. The 1st order Ambisonics (B-format) signals consist of four coefficients of the Fourier-Bessel series description of the sound pressure, which together form a 3D sound field representation: the W channel (mono mix, or 0th order) and the X, Y, Z channels (1st order). Higher order signals use more coefficients, which increases the accuracy of spatial source localization when the coefficients are decoded to speaker signals. However, such higher order signals are not contained in the B-format signals provided by microphone arrays.
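To illustrate the channel counts above, the following sketch counts the Ambisonics coefficients up to a given order and encodes a mono plane wave into 1st order B-format. It assumes the traditional (FuMa-style) B-format convention, in which W carries the source signal scaled by 1/√2 and X, Y, Z carry it weighted by the direction cosines; azimuth is measured from +x towards +y and inclination from +z. The helper names `ambisonic_channel_count` and `encode_bformat` are hypothetical, and other normalizations (e.g. SN3D with ACN ordering) scale and order the channels differently.

```python
import numpy as np


def ambisonic_channel_count(order, dims=3):
    """Number of Ambisonics coefficients up to `order` (3D: (N+1)^2, 2D: 2N+1)."""
    return (order + 1) ** 2 if dims == 3 else 2 * order + 1


def encode_bformat(signal, azimuth, inclination):
    """Encode a mono plane-wave signal into 1st order B-format (W, X, Y, Z).

    Assumption: FuMa-style scaling (W = s / sqrt(2)); azimuth from +x towards +y,
    inclination measured from the +z axis. Angles are in radians.
    """
    s = np.asarray(signal, dtype=float)
    # Unit vector pointing from the origin towards the source.
    u = np.array([np.sin(inclination) * np.cos(azimuth),
                  np.sin(inclination) * np.sin(azimuth),
                  np.cos(inclination)])
    w = s / np.sqrt(2.0)          # 0th order (omnidirectional) channel
    x, y, z = (c * s for c in u)  # 1st order (figure-of-eight) channels
    return np.stack([w, x, y, z])  # shape (4, number of samples)
```

For order 1 in 3D the count is 4, matching W, X, Y and Z; order 3 already needs 16 coefficients, which is the additional information that B-format microphone arrays do not deliver.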
Directional Audio Coding (DirAC) [5,9] is a known technique for representing or reproducing audio signals. It uses a B-format decoder that separates direct sound from diffuse sound, then applies Vector Base Amplitude Panning (VBAP) to selectively amplify the direct sound in the frequency domain, and finally provides speaker signals at its output after synthesis filtering.
FIG. 1 a) shows the structure of DirAC-based B-format decoding. The B-format signals 10 are time domain signals and are split by an analysis filter bank AFBD into K frequency bands 11. A sound field analysis block SFAD computes a diffuseness estimate $\Psi(f_k)$ 13 and directions-of-arrival (DoA) 12. The DoA consist of the azimuth $\varphi(f_k)$ and the inclination $\Theta(f_k)$ of the direction to the source at the mid frequency $f_k$ of band k. A 1st order Ambisonics decoder AmbD renders the Ambisonics signals to L speaker signals 14. A direct-diffuse separation block DDS separates these decoded signals into L direct sound signals 15 and L diffuse sound signals 16, using a filter that is determined from the diffuseness estimate 13: the L diffuse sound signals 16 are obtained by multiplying the output 14 of the decoder AmbD by $\sqrt{\Psi(f_k)}$, and the L direct sound signals 15 by multiplying it by $\sqrt{1-\Psi(f_k)}$, so that the energies of the two parts add up to the energy of the decoded signal. The direct sound signals 15 are further processed using Vector Base Amplitude Panning (VBAP) [8]: in a VBAP unit VP, each speaker signal is multiplied, in each frequency band, by a gain value that pans the direct sound to the desired direction according to the DoA 12 and the speaker positions. The diffuse signals 16 are decorrelated by decorrelation filtering DF, and the decorrelated diffuse signals 17 are added to the direct sound signals obtained from the VBAP unit VP. A synthesis filter bank SFBD combines the frequency bands into time domain signals 19, which can be reproduced by the L speakers. Smoothing filters for temporal integration (not shown in FIG. 1) are applied when calculating the diffuseness estimate $\Psi(f_k)$ 13 and to smooth the gain values derived by VBAP.
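The per-band processing that follows the decoder AmbD in FIG. 1 a) can be sketched as below for a horizontal (2D) ring of speakers only. This is a minimal sketch under several assumptions that are not taken from the text: pairwise 2D VBAP with energy-normalized gains (assuming azimuth-adjacent speakers are less than 180 degrees apart), a one-pole recursive smoother standing in for the unspecified smoothing filters, and a random per-speaker phase rotation as a crude placeholder for the decorrelation filter DF. All function names are hypothetical.

```python
import numpy as np


def vbap_2d_gains(azimuth, speaker_az):
    """Pairwise 2D VBAP gains for a horizontal ring of speakers (radians).

    Assumes every pair of azimuth-adjacent speakers spans less than 180 degrees.
    Returns one gain per speaker, energy-normalized (sum of squares = 1).
    """
    speaker_az = np.asarray(speaker_az, dtype=float)
    order = np.argsort(speaker_az)
    p = np.array([np.cos(azimuth), np.sin(azimuth)])  # source direction
    gains = np.zeros(len(speaker_az))
    for i in range(len(order)):
        a, b = order[i], order[(i + 1) % len(order)]
        # Rows of `base` are the unit vectors l_a, l_b of the speaker pair.
        base = np.array([[np.cos(speaker_az[a]), np.sin(speaker_az[a])],
                         [np.cos(speaker_az[b]), np.sin(speaker_az[b])]])
        g = p @ np.linalg.inv(base)  # solves p = g[0] * l_a + g[1] * l_b
        if np.all(g >= -1e-9):       # source lies between l_a and l_b
            gains[[a, b]] = g / np.linalg.norm(g)
            return gains
    raise ValueError("no speaker pair encloses the source direction")


def split_direct_diffuse(decoded, psi):
    """Direct/diffuse split of the decoder output 14 (one band, L speakers)."""
    decoded = np.asarray(decoded, dtype=complex)
    return np.sqrt(1.0 - psi) * decoded, np.sqrt(psi) * decoded


def smooth(previous, current, alpha=0.9):
    """One-pole recursive smoother (assumed), e.g. for VBAP gains over frames."""
    return alpha * np.asarray(previous) + (1.0 - alpha) * np.asarray(current)


def synthesize_band(decoded, psi, azimuth, speaker_az, rng, prev_gains=None):
    """Direct part panned by VBAP plus a (crudely) decorrelated diffuse part."""
    direct, diffuse = split_direct_diffuse(decoded, psi)
    gains = vbap_2d_gains(azimuth, speaker_az)
    if prev_gains is not None:
        gains = smooth(prev_gains, gains)
    # Placeholder decorrelator: an independent random phase per speaker.
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=direct.shape))
    return gains * direct + phases * diffuse, gains
```

For example, with four speakers at 45, 135, 225 and 315 degrees and a source at 90 degrees, only the first two speakers receive gains, each equal to √0.5. The random-phase decorrelator is only meant to keep the example short.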
FIG. 1 b) shows details of the sound field analysis block SFAD. The frequency-domain B-format signals 11 represent the sound field at the origin (observation position, r=0). The sound intensity describes the transport of kinetic and potential energy in a sound field. Not all local movement of sound energy in a sound field corresponds to a net transport. The active intensity $I_a$ (the time-averaged acoustic intensity, from which the DoA is derived) is the rate of directed net energy transport, i.e. the energy per unit time and unit area, for each of the three Cartesian directions. The active intensity 11a of the B-format signals 11 is obtained in an active intensity analysis block AIAD and provided to a diffuseness analysis block DABD and a DoA analysis block DOAABD, which output the diffuseness estimate 13 and the DoA 12, respectively. DirAC is described in more detail in [9], and the underlying theory in [5].
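The analysis blocks AIAD, DABD and DOAABD of FIG. 1 b) can be sketched for a single frequency band as below. The sketch assumes FuMa-style B-format scaling (W = s/√2 for a plane wave) and replaces the smoothing filters with a plain mean over a block of frames. It uses the intensity-like vector $\langle \mathrm{Re}\{W^*\,[X, Y, Z]\} \rangle$ and a diffuseness measure of the form used in DirAC-style analysis, $\Psi = 1 - \sqrt{2}\,\|\langle \mathrm{Re}\{W^*[X,Y,Z]\}\rangle\| / \langle |W|^2 + \tfrac{1}{2}\|[X,Y,Z]\|^2 \rangle$, which is 0 for a single plane wave and approaches 1 for a diffuse field. With this B-format sign convention the vector points towards the source (the physical active intensity points away from it), so the DoA is read off directly. The function name `analyse_band` is hypothetical.

```python
import numpy as np


def analyse_band(W, X, Y, Z):
    """DirAC-style analysis of one frequency band over a block of frames.

    W, X, Y, Z: complex values of the B-format signals in this band, one per frame.
    Returns (azimuth, inclination, psi), angles in radians.
    Assumption: FuMa-style scaling; temporal integration is a plain mean.
    """
    W = np.asarray(W, dtype=complex)
    V = np.stack([np.asarray(c, dtype=complex) for c in (X, Y, Z)])  # (3, frames)
    # Time-averaged intensity-like vector Re{W* [X, Y, Z]} (AIAD).
    intensity = np.mean(np.real(np.conj(W) * V), axis=1)
    # Time-averaged energy density term <|W|^2 + |[X, Y, Z]|^2 / 2>.
    energy = np.mean(np.abs(W) ** 2 + 0.5 * np.sum(np.abs(V) ** 2, axis=0))
    if energy <= 0.0:
        return 0.0, 0.0, 1.0  # silent band: treated as fully diffuse (assumption)
    # Diffuseness estimate (DABD), clipped to [0, 1].
    psi = 1.0 - np.sqrt(2.0) * np.linalg.norm(intensity) / energy
    psi = float(np.clip(psi, 0.0, 1.0))
    # DoA (DOAABD): azimuth from +x towards +y, inclination from +z.
    azimuth = np.arctan2(intensity[1], intensity[0])
    inclination = np.arctan2(np.hypot(intensity[0], intensity[1]), intensity[2])
    return float(azimuth), float(inclination), psi
```

For a single plane wave encoded in this convention the sketch returns the source azimuth and inclination with a diffuseness of about 0, while mutually uncorrelated W, X, Y, Z channels give a diffuseness close to 1.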