Audio processing is becoming increasingly important. In perceptual processing of spatial audio, a typical assumption is that the spatial aspect of a loudspeaker-reproduced sound is determined primarily by the energies of, and the time-aligned dependencies between, the audio channels in perceptual frequency bands. This assumption rests on the notion that these characteristics, when reproduced over loudspeakers, translate into inter-aural level differences, inter-aural time differences and inter-aural coherences, which are the binaural cues of spatial perception. From this concept, various spatial processing methods have emerged, including upmixing, see
[1] C. Faller, “Multiple-Loudspeaker Playback of Stereo Signals”, Journal of the Audio Engineering Society, Vol. 54, No. 11, pp. 1051-1064, November 2006,
spatial microphony, see, for example,
[2] V. Pulkki, “Spatial Sound Reproduction with Directional Audio Coding”, Journal of the Audio Engineering Society, Vol. 55, No. 6, pp. 503-516, June 2007; and
[3] C. Tournery, C. Faller, F. Küch, J. Herre, “Converting Stereo Microphone Signals Directly to MPEG Surround”, 128th AES Convention, May 2010;
and efficient stereo and multichannel transmission, see, for example,
[4] J. Breebaart, S. van de Par, A. Kohlrausch and E. Schuijers, “Parametric Coding of Stereo Audio”, EURASIP Journal on Applied Signal Processing, Vol. 2005, No. 9, pp. 1305-1322, 2005; and
[5] J. Herre, K. Kjörling, J. Breebaart, C. Faller, S. Disch, H. Purnhagen, J. Koppens, J. Hilpert, J. Rödén, W. Oomen, K. Linzmeier and K. S. Chong, “MPEG Surround—The ISO/MPEG Standard for Efficient and Compatible Multichannel Audio Coding”, Journal of the Audio Engineering Society, Vol. 56, No. 11, pp. 932-955, November 2008.
Listening tests have confirmed the benefit of this concept in each of these applications, see, for example, [1, 4, 5] and
[6] J. Vilkamo, V. Pulkki, “Directional Audio Coding: Virtual Microphone-Based Synthesis and Subjective Evaluation”, Journal of the Audio Engineering Society, Vol. 57, No. 9, pp. 709-724, September 2009.
All these technologies, although they differ in application, share the same core task: to generate, from a set of input channels, a set of output channels with defined energies and dependencies as a function of time and frequency. This may be assumed to be the common underlying task in perceptual spatial audio processing. For example, in the context of Directional Audio Coding (DirAC), see, for example, [2], the source channels are typically first-order microphone signals, which are processed by means of mixing, amplitude panning and decorrelation to perceptually approximate a measured sound field. In upmixing, see [1], the stereo input channels are distributed adaptively to a surround setup, again as a function of time and frequency.
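As a rough illustration of this common task, the following sketch (not taken from any of the cited works) represents the "energies and dependencies" of the channels in one frequency band as a covariance matrix, and shows how a mixing matrix M changes it: if the input covariance is C_x, the mixed output has covariance M C_x M^H. All function names are hypothetical, the signals are assumed to be the samples of a single band within one time frame, and decorrelation, which the methods above also use, is left out.

```python
import math
import random


def covariance(channels):
    """Covariance matrix C[i][j] = mean over samples of x_i * conj(x_j).

    The diagonal holds the channel energies, the off-diagonal entries the
    time-aligned dependencies between channels.
    """
    n = len(channels[0])
    return [[sum(a * b.conjugate() for a, b in zip(xi, xj)) / n
             for xj in channels] for xi in channels]


def mix(matrix, channels):
    """Apply a mixing matrix: y_i[t] = sum_j matrix[i][j] * x_j[t]."""
    n = len(channels[0])
    return [[sum(m * ch[t] for m, ch in zip(row, channels)) for t in range(n)]
            for row in matrix]


def predicted_covariance(matrix, cov):
    """Covariance after mixing, computed as M * C_x * M^H."""
    rows, cols = len(matrix), len(cov)
    mc = [[sum(matrix[i][k] * cov[k][j] for k in range(cols))
           for j in range(cols)] for i in range(rows)]
    return [[sum(mc[i][k] * matrix[j][k].conjugate() for k in range(cols))
             for j in range(rows)] for i in range(rows)]


def level_difference_db(cov, i, j):
    """Energy ratio between channels i and j in decibels."""
    return 10.0 * math.log10(cov[i][i].real / cov[j][j].real)


def coherence(cov, i, j):
    """Normalized dependency between channels i and j, between 0 and 1."""
    return abs(cov[i][j]) / math.sqrt(cov[i][i].real * cov[j][j].real)


# Assumed toy input: two independent noise channels standing in for one band.
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(2)]

# Hypothetical 2-to-3 mixing matrix (left, centre, right).
M = [[1.0, 0.0],
     [0.6, 0.8],
     [0.0, 1.0]]

c_y = covariance(mix(M, x))
print("left/centre level difference (dB):", level_difference_db(c_y, 0, 1))
print("left/centre coherence:", coherence(c_y, 0, 1))
```

In this view, a spatial processing method amounts to choosing M, per time frame and frequency band, so that the output covariance approaches a target. Mixing alone can only produce covariances of the form M C_x M^H, which is one way to see why the methods above also rely on decorrelation.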