Embodiments according to the invention are related to an apparatus for providing a set of spatial cues associated with an upmix audio signal having more than two channels on the basis of a two-channel microphone signal. Further embodiments according to the invention are related to a corresponding method and to a corresponding computer program. Further embodiments according to the invention are related to an apparatus for providing a processed or unprocessed two-channel audio signal and a set of spatial cues.
Another embodiment according to the invention is related to a microphone front end for spatial audio coders.
In the following, an introduction to the field of parametric representation of audio signals is given.
Parametric representation of stereo and surround audio signals has been developed over the last few decades and has reached a mature status. Intensity stereo (R. Waal and R. Veldhuis, “Subband coding of stereophonic digital audio signals,” Proc. IEEE ICASSP 1991, pp. 3601-3604, 1991.), (J. Herre, K. Brandenburg, and D. Lederer, “Intensity stereo coding,” 96th AES Conv., February 1994, Amsterdam (preprint 3799), 1994.) is used in MP3 (ISO/IEC, Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio. ISO/IEC 11172-3 International Standard, 1993, jTC1/SC29/WG11.), MPEG-2 AAC (______, Generic coding of moving pictures and associated audio information - Part 7: Advanced Audio Coding. ISO/IEC 13818-7 International Standard, 1997, jTC1/SC29/WG11.), and other audio coders. Intensity stereo is the original parametric stereo coding technique, representing stereo signals by means of a downmix and level difference information. Binaural Cue Coding (BCC) (C. Faller and F. Baumgarte, “Efficient representation of spatial audio using perceptual parametrization,” in Proc. IEEE Workshop on Appl. of Sig. Proc. to Audio and Acoust., October 2001, pp. 199-202.), (______, “Binaural Cue Coding - Part II: Schemes and applications,” IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, pp. 520-531, November 2003.) has enabled a significant improvement in audio quality by using a different filterbank for parametric stereo/surround coding than for audio coding (F. Baumgarte and C. Faller, “Why Binaural Cue Coding is better than Intensity Stereo Coding,” in Preprint 112th Conv. Aud. Eng. Soc., May 2002.), i.e., it can be viewed as a pre- and post-processor to a conventional audio coder. Further, its parametrization uses spatial cues in addition to level differences, namely time differences and inter-channel coherence. Parametric Stereo (PS) (E. Schuijers, J. Breebaart, H. Purnhagen, and J. Engdegard, “Low complexity parametric stereo coding,” in Preprint 117th Conv. Aud. Eng. Soc., May 2004.), which is standardized in ISO/IEC MPEG, uses phase differences as opposed to time differences, which has the advantage that artifact-free synthesis is more easily achieved than with time-delay synthesis. The described parametric stereo concepts were also applied to surround sound by BCC. The MP3 Surround (J. Herre, C. Faller, C. Ertel, J. Hilpert, A. Hoelzer, and C. Spenger, “MP3 Surround: Efficient and compatible coding of multi-channel audio,” in Preprint 116th Conv. Aud. Eng. Soc., May 2004.), (C. Faller, “Coding of spatial audio compatible with different playback formats,” in Preprint 117th Conv. Aud. Eng. Soc., October 2004.) and MPEG Surround (J. Herre, K. Kjörling, J. Breebaart, C. Faller, S. Disch, H. Purnhagen, J. Koppens, J. Hilpert, J. Rödèn, W. Oomen, K. Linzmeier, and K. S. Chong, “MPEG Surround - The ISO/MPEG standard for efficient and compatible multi-channel audio coding,” in Preprint 122nd Conv. Aud. Eng. Soc., May 2007.) audio coders introduced spatial synthesis based on a stereo downmix, enabling stereo backwards compatibility and higher audio quality. A parametric multi-channel audio coder, such as BCC, MP3 Surround, or MPEG Surround, is often referred to as a Spatial Audio Coder (SAC).
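For illustration only, the kind of parametrization discussed above (a downmix accompanied by a level difference and an inter-channel coherence) may be sketched as follows. This is a simplified sketch under stated assumptions, not the method of any of the cited coders: the cues are computed over one full-band, time-domain frame at zero lag, whereas the cited coders estimate cues per subband and per time interval. The function name spatial_cues, the 0.5 downmix gain, and the regularization constant eps are hypothetical choices made for this example.

```python
import math


def spatial_cues(left, right, eps=1e-12):
    """Return (downmix, level_difference_db, coherence) for one frame.

    Assumptions for this sketch: full-band processing, equal-length
    channels, coherence taken as the zero-lag normalized
    cross-correlation. eps only guards against division by zero.
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")

    # Frame energies of each channel and their cross term.
    p_left = sum(x * x for x in left)
    p_right = sum(y * y for y in right)
    cross = sum(x * y for x, y in zip(left, right))

    # Mono downmix: the single audio channel that would be transmitted.
    downmix = [0.5 * (x + y) for x, y in zip(left, right)]

    # Level difference between the channels in dB.
    level_difference_db = 10.0 * math.log10((p_left + eps) / (p_right + eps))

    # Normalized cross-correlation in [-1, 1]; 1 means identical up to gain.
    coherence = cross / math.sqrt((p_left + eps) * (p_right + eps))

    return downmix, level_difference_db, coherence


if __name__ == "__main__":
    left = [1.0, -1.0, 1.0, -1.0]
    right = [0.5 * x for x in left]  # same signal, half the amplitude
    _, ild, coh = spatial_cues(left, right)
    print(f"level difference: {ild:.2f} dB, coherence: {coh:.3f}")
```

In this example the right channel is a scaled copy of the left channel, so the coherence is close to one and the level difference reflects only the amplitude ratio. Time or phase differences, which BCC and PS additionally use, are deliberately left out of the sketch.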
Recently, a technique denoted spatial impulse response rendering (SIRR) was proposed (J. Merimaa and V. Pulkki, “Spatial impulse response rendering I: Analysis and synthesis,” J. Aud. Eng. Soc., vol. 53, no. 12, 2005.), (V. Pulkki and J. Merimaa, “Spatial impulse response rendering II: Reproduction of diffuse sound and listening tests,” J. Aud. Eng. Soc., vol. 54, no. 1, 2006.), which synthesizes impulse responses in any direction (relative to the microphone position) based on a single audio channel (the W-signal of B-format (M. A. Gerzon, “Periphony: Width-Height Sound Reproduction,” J. Aud. Eng. Soc., vol. 21, no. 1, pp. 2-10, 1973.), (K. Farrar, “Soundfield microphone,” Wireless World, pp. 48-50, October 1979.)) plus spatial information obtained from the B-format signals. This technique was later also applied to audio signals, as opposed to impulse responses, and called directional audio coding (DirAC) (V. Pulkki and C. Faller, “Directional audio coding: Filterbank and STFT-based design,” in Preprint 120th Conv. Aud. Eng. Soc., May 2006, preprint 6658.). DirAC can be viewed as a SAC that is applicable directly to microphone signals. Various microphone configurations have been proposed for use with DirAC (J. Ahonen, G. D. Galdo, M. Kallinger, F. Mich, V. Pulkki, and R. Schultz-Amling, “Analysis and adjustment of planar microphone arrays for application in directional audio coding,” in Preprint 124th Conv. Aud. Eng. Soc., May 2008.), (J. Ahonen, M. Kallinger, F. Mich, V. Pulkki, and R. Schultz-Amling, “Directional analysis of sound field with linear microphone array and applications in sound reproduction,” in Preprint 124th Conv. Aud. Eng. Soc., May 2008.). DirAC is based on B-format signals, and the signals of the various microphone configurations are processed to obtain B-format, which is then used in the directional analysis of DirAC.
In view of the above, it is the objective of the present invention to create a computationally efficient concept for obtaining spatial cue information, while keeping the effort for the sound transduction reasonably small.