In spatial audio coding, a two/multi-channel audio signal is processed such that the audio signals to be reproduced on different audio channels differ from one another, thereby providing the listeners with an impression of a spatial effect around the audio source. The spatial effect can be created by recording the audio directly into suitable formats for multi-channel or binaural reproduction, or the spatial effect can be created artificially in any two/multi-channel audio signal, which is known as spatialization.
It is generally known that for headphones reproduction artificial spatialization can be performed by HRTF (Head Related Transfer Function) filtering, which produces binaural signals for the listener's left and right ear. Sound source signals are filtered with filters derived from the HRTFs corresponding to their direction of origin. A HRTF is the transfer function measured from a sound source in free field to the ear of a human or an artificial head, divided by the transfer function to a microphone replacing the head and placed in the middle of the head. Artificial room effect (e.g. early reflections and/or late reverberation) can be added to the spatialized signals to improve source externalization and naturalness.
As the variety of audio listening and interaction devices increases, compatibility becomes more important. Amongst spatial audio formats the compatibility is striven through upmix and downmix techniques. It is generally known that there are algorithms for converting multi-channel audio signal into stereo format, such as Dolby Digital® and Dolby Surround®, and for further converting stereo signal into binaural signal. However, in this kind of processing the spatial image of the original multi-channel audio signal cannot be fully reproduced. A better way of converting multi-channel audio signal for headphone listening is to replace the original loudspeakers with virtual loudspeakers by employing HRTF filtering and to play the loudspeaker channel signals through those (e.g. Dolby Headphone®). However, this process has the disadvantage that, for generating a binaural signal, a multi-channel mix is always first needed. That is, the multi-channel (e.g. 5+1 channels) signals are first decoded and synthesized, and HRTFs are then applied to each signal for forming a binaural signal. This is computationally a heavy approach compared to decoding directly from the compressed multi-channel format into binaural format.
Binaural Cue Coding (BCC) is a highly developed parametric spatial audio coding method. BCC represents a spatial multi-channel signal as a single (or several) downmixed audio channel and a set of perceptually relevant inter-channel differences estimated as a function of frequency and time from the original signal. The method allows for a spatial audio signal mixed for an arbitrary loudspeaker layout to be converted for any other loudspeaker layout, consisting of either same or different number of loudspeakers.
Accordingly, the BCC is designed for multi-channel loudspeaker systems. The original loudspeaker layout determines the content of the encoder output, i.e. the BCC processed mono signal and its side information, and the loudspeaker layout of the decoder unit determines how this information is converted for reproduction. When reproduced for spatial headphones playback, the original loudspeaker layout dictates the sound source locations of the binaural signal to be generated. Thus, even though a spatial binaural signal, as such, would allow for a flexible alternation of sound source locations, the loudspeaker layout of a binaural signal generated from the conventionally encoded BCC signal is fixed to the sound source locations of the original multi-channel signal. This limits the application of enhanced spatial effects.