Embodiments according to the invention are related to an apparatus, a method, and a computer program for upmixing a downmix audio signal.
Some embodiments according to the invention are related to an adaptive phase parameter smoothing for parametric multi-channel audio coding.
In the following, the context of the invention will be described. Recent developments in parametric audio coding have delivered techniques for jointly coding a multi-channel audio signal (e.g. 5.1) into one (or more) downmix channels plus a side information stream. These techniques include Binaural Cue Coding, Parametric Stereo, and MPEG Surround.
A number of publications describe the so-called “Binaural Cue Coding” parametric multi-channel coding approach, see for example references [1][2][3][4][5].
“Parametric Stereo” is a related technique for the parametric coding of a two-channel stereo signal based on a transmitted mono signal plus parameter side information, see, for example, references [6][7].
“MPEG Surround” is an ISO standard for parametric multi-channel coding, see, for example, reference [8].
The above-mentioned techniques are based on transmitting the perceptual cues relevant to human spatial hearing in a compact form to the receiver, together with the associated mono or stereo downmix signal. Typical cues include inter-channel level differences (ILD), inter-channel correlation or coherence (ICC), inter-channel time differences (ITD), inter-channel phase differences (IPD), and overall phase differences (OPD).
In some cases, these parameters are transmitted with a frequency and time resolution adapted to the resolution of the human auditory system.
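To make the cues above more concrete, the following minimal sketch estimates ILD, ICC, and IPD for one time/frequency tile of two channels. It assumes complex subband samples and uses common textbook definitions (ILD as an energy ratio in dB, ICC as the normalized magnitude of the cross-spectrum, IPD as its angle); the function name `inter_channel_cues` is hypothetical, and the exact definitions used by Binaural Cue Coding, Parametric Stereo, or MPEG Surround may differ.

```python
import cmath
import math


def inter_channel_cues(x1, x2):
    """Estimate ILD (dB), ICC and IPD (radians) for one time/frequency tile.

    x1, x2: equal-length sequences of complex subband samples of two channels.
    Assumed textbook definitions, not those of a particular standard.
    """
    e1 = sum(abs(a) ** 2 for a in x1)  # energy of channel 1
    e2 = sum(abs(b) ** 2 for b in x2)  # energy of channel 2
    if e1 == 0 or e2 == 0:
        raise ValueError("cues are undefined for a silent channel")
    cross = sum(a * b.conjugate() for a, b in zip(x1, x2))  # cross-spectrum
    ild_db = 10.0 * math.log10(e1 / e2)    # level difference in dB
    icc = abs(cross) / math.sqrt(e1 * e2)  # normalized coherence, 0..1
    ipd = cmath.phase(cross)               # phase difference in [-pi, pi]
    return ild_db, icc, ipd


# Channel 2 is channel 1 scaled by 0.5 and delayed in phase by pi/4.
x1 = [1 + 0j, 1j]
x2 = [0.5 * cmath.exp(-1j * math.pi / 4) * a for a in x1]
ild_db, icc, ipd = inter_channel_cues(x1, x2)  # about 6.02 dB, 1.0, pi/4
```

Because channel 2 is an exact scaled and phase-rotated copy of channel 1 in this example, the coherence is 1 and the IPD equals the applied phase rotation.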
For transmission, the parameters are typically quantized (and in some cases must be quantized); often, especially in low-bit-rate scenarios, a rather coarse quantization is used.
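The effect of coarse quantization on a phase parameter can be illustrated with a uniform quantizer over the full circle. This is a minimal sketch assuming a hypothetical resolution of 8 levels (a step of pi/4, i.e. 45 degrees); the quantizers actually specified for IPD/OPD in the standards are not reproduced here. With rounding to the nearest level, the reconstruction error can reach half a step, pi/8 or 22.5 degrees in this example.

```python
import math


def quantize_phase(phi, levels=8):
    """Uniformly quantize an angle (radians) to `levels` steps over a full circle.

    Hypothetical quantizer: returns the transmitted index and the
    reconstructed angle in [0, 2*pi).
    """
    step = 2.0 * math.pi / levels
    index = round(phi / step) % levels  # nearest step, wrapped around the circle
    return index, index * step


def wrapped_error(a, b):
    """Smallest signed difference a - b between two angles, in [-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))


index, phi_hat = quantize_phase(1.0)        # index 1, phi_hat = pi/4
error = wrapped_error(1.0, phi_hat)         # about 0.215 rad
```

Comparing angles through `wrapped_error` rather than plain subtraction avoids reporting a spurious error of about 2*pi when the original and reconstructed angles lie on opposite sides of the wrap-around point.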
The update interval in time is determined by the encoder, depending on the signal characteristics. Consequently, parameters are not transmitted for every sample of the downmix signal. In other words, in some cases the transmission rate (or transmission frequency, or update rate) of the parameters describing the above-mentioned cues may be lower than the transmission rate (or transmission frequency, or update rate) of the audio samples (or groups of audio samples).
Instead of transmitting both inter-channel phase differences (IPDs) and overall phase differences (OPDs), it is also possible to only transmit inter-channel phase differences (IPDs) and estimate the overall phase differences (OPDs) in the decoder.
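One way in which a decoder could estimate the OPD from transmitted IPD and ILD values can be derived under simplifying assumptions. Assume a two-channel case in which the downmix is the plain sum m = x1 + x2 and each channel is represented by a single complex value per tile. Then OPD = arg(x1 · conj(m)) = arg(|x1|² + |x1||x2|·e^(j·IPD)), and dividing by |x1||x2| > 0 gives OPD = arg(c + e^(j·IPD)) with c = |x1|/|x2|. The sketch below implements only this derivation; the function name is hypothetical, and it is not presented as the estimator used by any particular decoder.

```python
import cmath
import math


def estimate_opd(ipd, ild_db):
    """Estimate the OPD of channel 1 relative to the downmix from IPD and ILD.

    Assumes m = x1 + x2 and one complex value per channel and tile, so that
    OPD = arg(c + exp(1j * ipd)) with c = |x1| / |x2|.
    Undefined when c == 1 and ipd == pi (the downmix cancels).
    """
    c = 10.0 ** (ild_db / 20.0)  # amplitude ratio from a 10*log10 energy ILD
    return cmath.phase(c + cmath.exp(1j * ipd))


# Equal levels, channel 1 leading by pi/2: the downmix lies halfway between,
# so channel 1 leads the downmix by pi/4.
opd = estimate_opd(math.pi / 2, 0.0)
```

The result agrees with computing the phase of x1 relative to x1 + x2 directly, which is what the test below checks for an arbitrary pair of channel values.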
Since the decoder may, in some cases, have to apply the parameters continuously over time without gaps, e.g. to every audio sample, intermediate parameters may need to be derived at the decoder side, typically by interpolation between past and current parameter sets.
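Per-sample interpolation between a past and a current parameter value can be sketched as follows. This is a minimal illustration assuming plain linear interpolation over a hypothetical block of `num_samples` samples; it is not the interpolation of the invention. For phase parameters, the sketch also shows a point that follows directly from angle arithmetic: interpolating angles as plain numbers across the ±pi boundary takes the long way around (from 170 degrees to -170 degrees it passes through 0 degrees), whereas interpolating along the shorter arc passes through 180 degrees.

```python
import math


def interpolate_linear(prev, curr, num_samples):
    """Per-sample linear interpolation from the previous to the current value.

    Sample n (0-based) receives prev + (curr - prev) * (n + 1) / num_samples,
    so the last sample reaches curr (up to floating-point rounding).
    """
    return [prev + (curr - prev) * (n + 1) / num_samples
            for n in range(num_samples)]


def interpolate_phase(prev, curr, num_samples):
    """Per-sample interpolation of an angle (radians) along the shorter arc."""
    delta = math.atan2(math.sin(curr - prev), math.cos(curr - prev))
    out = []
    for n in range(num_samples):
        p = prev + delta * (n + 1) / num_samples
        out.append(math.atan2(math.sin(p), math.cos(p)))  # wrap to [-pi, pi]
    return out


prev, curr = math.radians(170), math.radians(-170)
naive = interpolate_linear(prev, curr, 2)     # midpoint at 0 rad
wrapped = interpolate_phase(prev, curr, 2)    # midpoint at +/- pi
```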
Some conventional interpolation approaches, however, result in poor audio quality.
In the following, a generic binaural cue coding scheme will be described with reference to FIG. 7, which shows a block schematic diagram of a binaural cue coding transmission system 800 comprising a binaural cue coding encoder 810 and a binaural cue coding decoder 820. The binaural cue coding encoder 810 may, for example, receive a plurality of audio input signals 812a, 812b, and 812c. The encoder 810 is configured to downmix the audio input signals 812a-812c using a downmixer 814 to obtain a downmix signal 816, which may, for example, be a sum signal and which may be designated “AS” or “X”. The encoder 810 is further configured to analyze the audio input signals 812a-812c using an analyzer 818 to obtain a side information signal 819 (“SI”). The sum signal 816 and the side information signal 819 are transmitted from the encoder 810 to the binaural cue coding decoder 820.
The binaural cue coding decoder 820 may be configured to synthesize a multi-channel audio output signal comprising, for example, audio channels y1, y2, …, yN on the basis of the sum signal 816 and inter-channel cues 824. For this purpose, the decoder 820 may comprise a binaural cue coding synthesizer 822, which receives the sum signal 816 and the inter-channel cues 824 and provides the audio signals y1, y2, …, yN.
The binaural cue coding decoder 820 further comprises a side information processor 826, which is configured to receive the side information 819 and, optionally, a user input 827. The side information processor 826 is configured to provide the inter-channel cues 824 on the basis of the side information 819 and the optional user input 827.
To summarize, the audio input signals are analyzed and downmixed. The sum signal plus the side information is transmitted to the decoder. The inter-channel cues are generated from the side information and local user input. The binaural cue coding synthesis generates the multi-channel audio output signal.
For details, reference is made to the article “Binaural Cue Coding Part II: Schemes and applications” by C. Faller and F. Baumgarte (IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, November 2003).
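The encode/transmit/synthesize chain summarized above can be illustrated with a deliberately reduced toy model: two channels, a single frequency band, a sum downmix, and an ILD as the only side parameter. The function names and the power-preserving gain rule (g1² + g2² = 1, so the total output energy equals the energy of the sum signal) are assumptions made for this sketch. A real binaural cue coding synthesizer also has to reproduce time or phase differences and coherence, which this sketch omits, so the outputs reproduce the level difference but not the original waveforms.

```python
import math


def bcc_encode(x1, x2):
    """Toy encoder for one band: sum downmix plus an ILD side parameter."""
    s = [a + b for a, b in zip(x1, x2)]   # downmix (sum signal)
    e1 = sum(abs(a) ** 2 for a in x1)
    e2 = sum(abs(b) ** 2 for b in x2)
    ild_db = 10.0 * math.log10(e1 / e2)   # side information
    return s, ild_db


def bcc_decode(s, ild_db):
    """Toy synthesis: scale the sum signal so the outputs have the given ILD.

    Assumed power-preserving gains: g1**2 + g2**2 == 1.
    Phase and coherence cues are ignored in this sketch.
    """
    c = 10.0 ** (ild_db / 20.0)           # amplitude ratio g1 / g2
    g2 = 1.0 / math.sqrt(1.0 + c * c)
    g1 = c * g2
    return [g1 * v for v in s], [g2 * v for v in s]


s, ild_db = bcc_encode([2.0, -2.0, 1.0], [1.0, -1.0, 0.5])
y1, y2 = bcc_decode(s, ild_db)
```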
However, it has been found that many conventional binaural cue coding decoders provide multi-channel output audio signals with degraded quality if the side information is quantized coarsely or with insufficient resolution.
In view of this problem, there is a need for an improved concept for upmixing a downmix audio signal into an upmixed audio signal that reduces the degradation of the hearing impression when the side information describing a phase relationship between different channels of the upmix signal is quantized with comparatively low resolution.