The present invention relates to the field of audio processing, in particular to parametric spatial audio processing and to converting a first parametric spatial audio signal into a second parametric spatial audio signal.
Spatial sound recording aims at capturing a sound field with multiple microphones such that, at the reproduction side, a listener perceives the sound image as it was present at the recording location. Standard approaches for spatial sound recording use simple stereo microphones or more sophisticated combinations of directional microphones, such as the B-format microphones used in Ambisonics and described by M. A. Gerzon, “Periphony: With-Height Sound Reproduction,” J. Aud. Eng. Soc., Vol. 21, No. 1, pp. 2-10, 1973, in the following referred to as [Ambisonics]. Commonly, these methods are referred to as coincident-microphone techniques.
Alternatively, methods based on a parametric representation of sound fields can be applied, which are referred to as parametric spatial audio coders. These methods determine a downmix audio signal together with corresponding spatial side information, which is relevant for the perception of spatial sound. Examples are Directional Audio Coding (DirAC), as discussed in Pulkki, V., “Directional audio coding in spatial sound reproduction and stereo upmixing,” in Proceedings of The AES 28th International Conference, pp. 251-258, Piteå, Sweden, Jun. 30-Jul. 2, 2006, in the following referred to as [DirAC], or the so-called spatial audio microphones (SAM) approach proposed in Faller, C., “Microphone Front-Ends for Spatial Audio Coders”, in Proceedings of the AES 125th International Convention, San Francisco, October 2008, in the following referred to as [SAM]. The spatial cue information basically consists of the direction-of-arrival (DOA) of sound and the diffuseness of the sound field, each determined in frequency subbands. In a synthesis stage, the desired loudspeaker signals for reproduction are determined based on the downmix signal and the parametric side information.
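By way of illustration only, the estimation of the two spatial cues named above (DOA and diffuseness per frequency subband) may be sketched as follows. The sketch is not taken from [DirAC] or [SAM]; it assumes a simplified two-dimensional convention in which W is an omnidirectional pressure signal and X, Y are dipole signals in the time-frequency domain, so that a plane wave arriving from azimuth θ yields X = W·cos θ and Y = W·sin θ (the scaling factors of actual B-format signals are omitted). The function name `estimate_doa_and_diffuseness` and the averaging over all time frames are hypothetical choices made for this example.

```python
import numpy as np


def estimate_doa_and_diffuseness(W, X, Y):
    """Estimate azimuth (radians) and diffuseness per frequency bin.

    W, X, Y: complex arrays of shape (frames, bins), assumed to follow the
    simplified convention X = W*cos(theta), Y = W*sin(theta) for a single
    plane wave arriving from azimuth theta.
    """
    # Per-tile vector that points towards the sound source under the
    # assumed convention (proportional to the negative active intensity).
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)

    # Per-tile energy term, normalised so that a single plane wave gives
    # the same magnitude as the direction vector above.
    energy = 0.5 * (np.abs(W) ** 2 + np.abs(X) ** 2 + np.abs(Y) ** 2)

    # Temporal averaging (here simply over all frames) approximates the
    # expectation used in the diffuseness estimate.
    ix_mean = ix.mean(axis=0)
    iy_mean = iy.mean(axis=0)
    energy_mean = energy.mean(axis=0)

    azimuth = np.arctan2(iy_mean, ix_mean)
    diffuseness = 1.0 - np.hypot(ix_mean, iy_mean) / np.maximum(energy_mean, 1e-12)
    return azimuth, np.clip(diffuseness, 0.0, 1.0)
```

Under these assumptions, a single plane wave yields a diffuseness close to zero and an azimuth equal to its direction of arrival, whereas mutually uncorrelated W, X and Y signals yield a diffuseness close to one. Together with a downmix signal (for example W), the two returned arrays would form the parametric side information used by the synthesis stage.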
In other words, the downmix signals and the corresponding spatial side information represent the audio scene according to the recording set-up, e.g. the orientation and/or position of the microphones in relation to the different audio sources, that was used at the time the audio scene was recorded.