The present invention relates to audio signal processing, and, in particular, to format conversion of multi-channel audio signals.
Format conversion describes the process of mapping a certain number of audio channels into another representation suitable for playback via a different number of audio channels.
A common use case for format conversion is downmixing of audio channels. Ref. [1] gives an example in which downmixing allows end-users to replay a version of the 5.1 source material even when a full ‘home-theatre’ 5.1 monitoring system is unavailable. Equipment designed to accept Dolby Digital material, but which provides only mono or stereo outputs (e.g. portable DVD players, set-top boxes and so forth), incorporates facilities to downmix the original 5.1 channels to the one or two output channels as standard.
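A downmix of this kind can be pictured as a static mixing matrix applied to each frame of input samples. The following is only a minimal sketch and is not taken from Ref. [1]: the channel order (L, R, C, LFE, Ls, Rs), the gain of 1/sqrt(2) for the centre and surround channels, the omission of the LFE channel, and the names `convert_format`, `DOWNMIX_5_1_TO_STEREO` and `DOWNMIX_STEREO_TO_MONO` are all assumptions made for illustration. Practical equipment may use different gains, for example values signalled in the bitstream.

```python
import math

# Assumed input channel order for this sketch: L, R, C, LFE, Ls, Rs.
# Assumed gain of 1/sqrt(2) (about -3 dB) for centre and surround channels;
# the LFE channel is discarded. These are illustrative choices only.
G = 1.0 / math.sqrt(2.0)

# Rows are output channels, columns are input channels.
DOWNMIX_5_1_TO_STEREO = [
    [1.0, 0.0, G, 0.0, G, 0.0],  # left output
    [0.0, 1.0, G, 0.0, 0.0, G],  # right output
]

DOWNMIX_STEREO_TO_MONO = [
    [0.5, 0.5],  # single mono output
]


def convert_format(frame, matrix):
    """Apply a static mixing matrix to one frame (one sample per input channel)."""
    for row in matrix:
        if len(row) != len(frame):
            raise ValueError("matrix width must equal the number of input channels")
    return [sum(gain * sample for gain, sample in zip(row, frame)) for row in matrix]


# Usage: a signal present only in the centre channel ends up, attenuated,
# in both stereo outputs and subsequently in the mono output.
centre_only = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
stereo = convert_format(centre_only, DOWNMIX_5_1_TO_STEREO)
mono = convert_format(stereo, DOWNMIX_STEREO_TO_MONO)
```

The same helper covers any conversion that can be written as a fixed matrix. Signal-adaptive processing, such as binaural rendering with filters, would need more than a per-frame matrix multiplication.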
Format conversion can also describe an upmix process, e.g. upmixing stereo material to form a 5.1-compatible version. Binaural rendering can likewise be considered a form of format conversion.
In the following, the implications of format conversion for the decoding process of compressed audio signals are discussed. Here, the compressed representation of the audio signal (e.g. an mp4 file) represents a fixed number of audio channels intended for playback by a fixed loudspeaker setup.
The interaction between an audio decoder and subsequent format conversion into a desired playback format can be distinguished into three categories:
1. The decoding process is agnostic of the final playback scenario. Thus the full audio representation is retrieved and conversion processing is subsequently applied.
2. The audio decoding process is limited in its capabilities and will output a fixed format only. Examples are mono radios receiving stereo FM programs, or a mono HE-AAC decoder receiving an HE-AAC v2 bitstream.
3. The audio decoding process is aware of the final playback setup and adapts its processing accordingly. An example is the “Scalable Channel Decoding for Reduced Speaker Configurations” as defined for MPEG Surround in Ref. [2]. Here, the decoder reduces the number of output channels.
The disadvantages of these methods are unnecessarily high complexity and potential artefacts caused by subsequent processing of the decoded material, such as comb filtering in the case of a downmix and unmasking in the case of an upmix (category 1), and limited flexibility concerning the final output format (categories 2 and 3).
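The comb filtering artefact can be illustrated with a simplified model, assuming that a downmix sums two channels carrying the same signal with a relative delay of d samples. The sum then has the magnitude response |1 + exp(-j*omega*d)|, which shows regularly spaced notches. The function name `comb_magnitude` and the chosen delay and sample rate are hypothetical values for this sketch only.

```python
import cmath
import math


def comb_magnitude(freq_hz, delay_samples, sample_rate_hz):
    """Magnitude response at freq_hz of y[n] = x[n] + x[n - delay_samples]."""
    omega = 2.0 * math.pi * freq_hz / sample_rate_hz
    return abs(1.0 + cmath.exp(-1j * omega * delay_samples))


# Assumed example: 1 ms delay (48 samples at 48 kHz) between the two channels.
DELAY_SAMPLES = 48
SAMPLE_RATE_HZ = 48000

for freq_hz in (0, 250, 500, 750, 1000, 1500):
    gain = comb_magnitude(freq_hz, DELAY_SAMPLES, SAMPLE_RATE_HZ)
    print(f"{freq_hz:5d} Hz: gain {gain:.3f}")
```

With these assumed values, the summed signal is cancelled near 500 Hz and 1500 Hz and doubled in amplitude near 0 Hz and 1000 Hz, which is the characteristic comb-shaped colouration.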