The present invention relates to spatial audio coding, and more particularly to controlling dynamic decoding of binaural audio signals.
In spatial audio coding, a two/multi-channel audio signal is processed such that the audio signals to be reproduced on different audio channels differ from one another, thereby providing the listeners with an impression of a spatial effect around the audio source. The spatial effect can be created by recording the audio directly into suitable formats for multi-channel or binaural reproduction, or the spatial effect can be created artificially in any two/multi-channel audio signal, which is known as spatialization.
It is generally known that for headphone reproduction, artificial spatialization can be performed by HRTF (Head Related Transfer Function) filtering, which produces binaural signals for the listener's left and right ears. Sound source signals are filtered with filters derived from the HRTFs corresponding to their direction of origin. An HRTF is the transfer function measured from a sound source in free field to the ear of a human or an artificial head, divided by the transfer function to a microphone replacing the head and placed in the middle of the head. An artificial room effect (e.g. early reflections and/or late reverberation) can be added to the spatialized signals to improve source externalization and naturalness.
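As an illustration of the filtering step described above, the following minimal sketch convolves a mono source signal with one pair of head-related impulse responses (the time-domain counterparts of an HRTF pair). The function names and the two-tap impulse responses are made up for this example and are not taken from any measured HRTF set; a practical decoder would typically use much longer filters and a fast convolution method.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution: len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binauralize(source, hrir_left, hrir_right):
    """Filter a mono source with one impulse-response pair, giving (left, right) ear signals."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Toy impulse-response pair for a source on the listener's left (assumed values):
# the right ear receives the sound one sample later and attenuated by half.
HRIR_LEFT = [1.0, 0.0]
HRIR_RIGHT = [0.0, 0.5]

left, right = binauralize([1.0, 2.0, 3.0], HRIR_LEFT, HRIR_RIGHT)
```

With these toy filters, the left-ear signal is the source itself and the right-ear signal is a delayed, attenuated copy, which is the kind of inter-aural difference that HRTF filtering is intended to reproduce.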
Binaural Cue Coding (BCC) is a highly developed parametric spatial audio coding method designed for multi-channel loudspeaker systems. BCC encodes a spatial multi-channel signal as a single (or several) downmixed audio channel(s) and a set of perceptually relevant inter-channel differences estimated as a function of frequency and time from the original signal. The method allows a spatial audio signal mixed for an arbitrary loudspeaker layout to be converted for any other loudspeaker layout, consisting of either the same or a different number of loudspeakers. BCC also enables a multi-channel audio signal to be converted for headphone listening, whereby the original loudspeakers are replaced with virtual loudspeakers by employing HRTF filtering, and the loudspeaker channel signals are played through HRTF filters.
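The parametric principle can be sketched, under strong simplifications, as follows. The example assumes a two-channel input, a single broadband level difference per frame as the only inter-channel cue, and hypothetical function names; an actual BCC codec estimates several cues per frequency band and per time frame, so this is an illustration of the idea rather than of any standardized scheme.

```python
import math


def encode_frame(left, right):
    """Return (mono downmix, inter-channel level difference in dB) for one frame."""
    downmix = [(l + r) / 2.0 for l, r in zip(left, right)]
    energy_left = sum(l * l for l in left)
    energy_right = sum(r * r for r in right)
    eps = 1e-12  # avoids a logarithm of zero for silent frames
    level_diff_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))
    return downmix, level_diff_db


def decode_frame(downmix, level_diff_db):
    """Re-spread the downmix onto two channels according to the level difference."""
    ratio = 10.0 ** (level_diff_db / 20.0)  # left/right amplitude ratio
    gain_right = 2.0 / (1.0 + ratio)
    gain_left = ratio * gain_right
    return [gain_left * m for m in downmix], [gain_right * m for m in downmix]


# A source panned towards the left: the left channel is twice the right channel.
downmix, cue_db = encode_frame([2.0, 4.0], [1.0, 2.0])
left_out, right_out = decode_frame(downmix, cue_db)
```

For an amplitude-panned source such as this one, the downmix plus the single level cue is enough to recover both channels closely; for general signals the reconstruction is only perceptually motivated, not exact.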
The document ISO/IEC JTC 1/SC 29/WG 11/M13233, Ojala P., Jakka J. “Further information on binaural decoder functionality”, April 2006, Montreux, discloses an audio image rendering system designed for a binaural decoder, e.g. for a BCC decoder, wherein the decoder comprises a sufficient number of HRTF filter pairs to represent each possible loudspeaker position. The audio image rendering is carried out on the basis of the audio image control bit stream, which may consist of differential and absolute sound source (such as loudspeaker) locations, transmitted as side information to the decoder, according to which the HRTF filter pairs are selected. Thus, the content creator has more flexibility to design a dynamic audio image for the binaural content than for loudspeaker representation with physically fixed loudspeaker positions.
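The combination of absolute and differential location information can be illustrated with a small sketch. The update format used here, a list of ("abs", degrees) and ("diff", degrees) tuples for a single source azimuth, is purely hypothetical and chosen for readability; the actual syntax of the audio image control bit stream is not reproduced here.

```python
def track_azimuth(updates, start_deg=0.0):
    """Apply absolute and differential azimuth updates; return the resulting trajectory."""
    azimuth = start_deg
    trajectory = []
    for kind, value in updates:
        if kind == "abs":      # absolute location: replace the current azimuth
            azimuth = value
        elif kind == "diff":   # differential location: offset the current azimuth
            azimuth += value
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
        azimuth %= 360.0       # keep the angle within [0, 360)
        trajectory.append(azimuth)
    return trajectory


trajectory = track_azimuth([("abs", 30.0), ("diff", 15.0), ("diff", -60.0)])
```

Each azimuth in the resulting trajectory would then be used by the decoder to select an HRTF filter pair for the corresponding time instant.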
The above design offers very flexible and versatile variations for audio image rendering, provided that the decoder comprises a sufficient number of HRTF filter pairs. However, the binaural decoder standard does not mandate any particular HRTF set. Therefore, the content creator has no knowledge of the HRTF filter database available in the binaural decoder. Accordingly, the sound source location information carried in the audio image control bit stream may exceed, or may not exactly match, the resolution of the HRTF filter set available in the binaural decoder. As a result, the decoder may omit the audio image control due to an incompatible HRTF filter set, whereby the perceived audio image may differ significantly from what was intended by the content creator.
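The resolution mismatch can be made concrete with a short sketch. It assumes, purely for illustration, a decoder whose HRTF set covers the horizontal plane in 30-degree steps and a control stream that requests arbitrary integer azimuths; both the grid and the requested values are made-up examples, and the function names are hypothetical.

```python
def angular_gap(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def unmatched_requests(requested_deg, available_deg):
    """Map each requested azimuth without an exact HRTF pair to its gap from the closest one."""
    available = set(available_deg)
    return {
        az: min(angular_gap(az, have) for have in available)
        for az in requested_deg
        if az not in available
    }


AVAILABLE = range(0, 360, 30)      # assumed decoder HRTF grid
REQUESTED = [0, 10, 45, 90, 350]   # assumed locations from the control stream

gaps = unmatched_requests(REQUESTED, AVAILABLE)
```

In this example three of the five requested locations have no exactly matching filter pair, with gaps of 10 to 15 degrees to the closest available position, which is the situation in which a decoder might disregard the audio image control altogether.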