The present invention relates to synthesizing a rendered output signal such as a stereo output signal or an output signal having more audio channel signals based on an available multichannel downmix and additional control data. Specifically, the multichannel downmix is a downmix of a plurality of audio object signals.
Recent developments in audio coding facilitate the recreation of a multichannel representation of an audio signal based on a stereo (or mono) signal and corresponding control data. A parametric multichannel audio decoder (e.g. the MPEG Surround decoder defined in ISO/IEC 23003-1 [1], [2]) reconstructs M channels based on K transmitted channels, where M>K, by use of the additional control data. The control data consists of a parameterisation of the multichannel signal based on IID (Inter-channel Intensity Difference) and ICC (Inter-Channel Coherence). These parameters are normally extracted in the encoding stage and describe the power ratio and correlation between the channel pairs used in the upmix process. Such a coding scheme allows for coding at a significantly lower data rate than transmitting all M channels, making the coding very efficient while at the same time ensuring compatibility with both K channel devices and M channel devices.
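As an illustration of the two parameters named above, the following minimal sketch computes a power ratio in dB (IID) and a normalised cross-correlation (ICC) for one pair of channel segments. It is a simplification under stated assumptions: the function name `iid_icc`, the broadband (rather than per time/frequency tile) analysis, and the small regularisation constant `eps` are illustrative choices and are not taken from ISO/IEC 23003-1.

```python
import math

def iid_icc(ch1, ch2, eps=1e-12):
    """Return (IID in dB, ICC) for two equal-length, real-valued channel segments.

    Hypothetical helper for illustration only; a real coder would evaluate
    these quantities per time/frequency tile on subband signals.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have equal length")
    p1 = sum(a * a for a in ch1)                  # power of channel 1
    p2 = sum(b * b for b in ch2)                  # power of channel 2
    cross = sum(a * b for a, b in zip(ch1, ch2))  # cross-correlation at lag 0
    # eps (an assumption) avoids division by zero for silent segments.
    iid_db = 10.0 * math.log10((p1 + eps) / (p2 + eps))
    icc = cross / math.sqrt((p1 + eps) * (p2 + eps))
    return iid_db, icc

# Channel 2 is channel 1 scaled by 0.5: fully coherent, about 6 dB quieter.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
iid_db, icc = iid_icc(left, right)
```

For this input the power ratio is 4, so the IID is roughly 6 dB, and the ICC is close to 1 because the two channels differ only by a gain.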
A closely related coding system is the corresponding audio object coder [3], [4], where several audio objects are downmixed at the encoder and later upmixed, guided by control data. The process of upmixing can also be seen as a separation of the objects that are mixed in the downmix. The resulting upmixed signal can be rendered into one or more playback channels. More precisely, [3], [4] present a method to synthesize audio channels from a downmix (referred to as the sum signal), statistical information about the source objects, and data that describes the desired output format. In case several downmix signals are used, these downmix signals consist of different subsets of the objects, and the upmixing is performed for each downmix channel individually.
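The relation between the objects, the downmix and the desired output format can be sketched with two matrices. The example below is only an assumption-laden illustration: the names `D` (downmix matrix), `A` (rendering matrix) and `matmul` are hypothetical, the signals are toy data, and the code computes the encoder-side downmix and the ideal reference rendering only; it does not implement the upmixing method of [3], [4].

```python
def matmul(m, s):
    """Multiply a rows-by-N matrix with an N-by-T signal block (lists of lists)."""
    n_samples = len(s[0])
    return [
        [sum(row[i] * s[i][t] for i in range(len(s))) for t in range(n_samples)]
        for row in m
    ]

# Three hypothetical object signals, four samples each.
objects = [
    [1.0, 0.0, -1.0, 0.0],
    [0.0, 2.0, 0.0, -2.0],
    [0.5, 0.5, 0.5, 0.5],
]

# Assumed stereo downmix matrix D (2 x 3):
# object 1 to the left, object 2 to the right, object 3 to both channels.
D = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]

# Assumed rendering matrix A (2 x 3) describing the desired output format:
# here the listener swaps the positions of objects 1 and 2.
A = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.5]]

downmix = matmul(D, objects)    # what is transmitted
reference = matmul(A, objects)  # what an ideal decoder output would equal
```

A decoder has access only to `downmix` and the control data, not to `objects`, so its task is to approximate `reference` from those inputs. In this toy case the desired rendering happens to be the downmix with its two channels exchanged.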
In the case of a stereo object downmix and object rendering to stereo, or generation of a stereo signal suitable for further processing by, for instance, an MPEG Surround decoder, it is known that a significant performance advantage is achieved by joint processing of the two channels with a time and frequency dependent matrixing scheme. Outside the scope of audio object coding, a related technique is applied for partially transforming one stereo audio signal into another stereo audio signal in WO2006/103584. It is also well known that, for a general audio object coding system, a decorrelation process must be added to the rendering in order to perceptually reproduce the desired reference scene. However, a description of a jointly optimized combination of matrixing and decorrelation is not known. A simple combination of the conventional methods leads either to inefficient and inflexible use of the capabilities offered by a multichannel object downmix or to poor stereo image quality in the resulting object decoder renderings.