The present invention relates to audio signal processing and, in particular, to an apparatus and a method for extracting a direct/ambience signal from a downmix signal and spatial parametric information. Further embodiments of the present invention relate to the use of direct/ambience separation for enhancing binaural reproduction of audio signals. Yet further embodiments relate to binaural reproduction of multi-channel sound, where multi-channel audio means audio having two or more channels. Typical multi-channel audio content includes movie soundtracks and multi-channel music recordings.
The human spatial hearing system tends to process sound roughly in two parts: a localizable, or direct, part and an unlocalizable, or ambient, part. There are many audio processing applications, such as binaural sound reproduction and multi-channel upmixing, in which it is desirable to have access to these two audio components.
Methods of direct/ambience separation are known in the art and may be used for various applications. They are described in “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement”, Goodwin, Jot, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, April 2007; “Correlation-based ambience extraction from stereo recordings”, Merimaa, Goodwin, Jot, AES 123rd Convention, New York, 2007; “Multiple-loudspeaker playback of stereo signals”, C. Faller, Journal of the AES, October 2007; “Primary-ambient decomposition of stereo audio signals using a complex similarity index”, Goodwin et al., Pub. No. US2009/0198356 A1, August 2009; “Method to Generate Multi-Channel Audio Signal from Stereo Signals”, patent application, Inventor: Christof Faller, Agents: Fish & Richardson P.C., Assignee: LG Electronics, Inc., Origin: Minneapolis, Minn., US, IPC8 Class: AH04R500FI, USPC Class: 381 1; and “Ambience generation for stereo signals”, Avendano et al., Date Issued: Jul. 28, 2009, Application Ser. No. 10/163,158, Filed: Jun. 4, 2002. The state-of-the-art direct/ambience separation algorithms are based on inter-channel signal comparison of stereo sound in frequency bands.
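The band-wise inter-channel comparison mentioned above can be illustrated with a minimal sketch. It is not the method of any of the cited references; it only assumes that each stereo channel is already available as complex time-frequency samples grouped per frequency band, and it uses the magnitude of the normalized cross-correlation (coherence) between the channels as a rough indicator of how much of a band is direct sound. The function names `band_coherence` and `ambience_weights` and the mapping "ambience weight = 1 - coherence" are assumptions made for illustration.

```python
import math


def band_coherence(left, right):
    """Magnitude of the normalized cross-correlation of two complex band signals.

    `left` and `right` are equal-length lists of complex samples of one
    frequency band (for example, consecutive short-time transform frames).
    Returns a value in [0, 1]; 0 is returned if either channel has no energy.
    """
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    energy_left = sum(abs(l) ** 2 for l in left)
    energy_right = sum(abs(r) ** 2 for r in right)
    denom = math.sqrt(energy_left * energy_right)
    if denom == 0.0:
        return 0.0
    # Clip to guard against rounding slightly above 1.
    return min(abs(cross) / denom, 1.0)


def ambience_weights(left_bands, right_bands):
    """One illustrative ambience weight per band: 1 - coherence (an assumption)."""
    return [1.0 - band_coherence(l, r) for l, r in zip(left_bands, right_bands)]
```

In this sketch, a band in which one channel is an amplitude-scaled copy of the other (a panned direct source) yields a coherence near 1 and an ambience weight near 0, whereas a band with mutually uncorrelated channels yields a weight near 1. Note that the comparison needs both stereo channels as input.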
Moreover, binaural playback with ambience extraction is addressed in “Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding”, Goodwin, Jot, AES 123rd Convention, New York 2007. Ambience extraction in connection with binaural reproduction is also mentioned in J. Usher and J. Benesty, “Enhancement of spatial sound quality: a new reverberation-extraction audio upmixer,” IEEE Trans. Audio, Speech, Language Processing, vol. 15, pp. 2141-2150, September 2007. The latter paper focuses on ambience extraction in stereo microphone recordings, using adaptive least-mean-square cross-channel filtering of the direct component in each channel. Spatial audio codecs, e.g. MPEG Surround, typically consist of a one- or two-channel audio stream in combination with spatial side information, which extends the audio into multiple channels, as described in ISO/IEC 23003-1 (MPEG Surround); and Breebaart, J., Herre, J., Villemoes, L., Jin, C., Kjörling, K., Plogsties, J., Koppens, J. (2006). “Multi-channel goes mobile: MPEG Surround binaural rendering”. Proc. 29th AES conference, Seoul, Korea.
However, modern parametric audio coding technologies, such as MPEG Surround (MPS) and parametric stereo (PS), provide only a reduced number of audio downmix channels, in some cases only one, along with additional spatial side information. The comparison between the “original” input channels, on which the above separation algorithms rely, is then possible only after first decoding the sound into the intended output format.
Therefore, a concept for extracting a direct signal portion or an ambient signal portion from a downmix signal and spatial parametric information is needed. To date, however, there is no existing solution for direct/ambience extraction using the parametric side information.