Parametric coding of multi-channel audio signals is an ongoing topic of research. Generally, two approaches to encoding multi-channel audio signals can be distinguished. The Moving Picture Experts Group (MPEG), a subgroup of the International Organization for Standardization (ISO), is currently working on the standardization of technology for the reconstruction of multi-channel audio content from stereo or even mono down-mix signals by adding only a small amount of helper information to the down-mix signals.
In parallel, stereo-to-multi-channel up-mix methods are being developed which, in order to reconstruct the spatial image of the original multi-channel audio signal, do not need any side-information beyond what is already (implicitly) contained in the down-mix signal.
Existing methods for stereo-compatible multi-channel transmission without additional side-information that gained practical relevance can mostly be characterized as matrixed-surround methods, such as Dolby Pro Logic (Dolby Pro Logic II) and Logic-7, as described in more detail in “Dolby Surround Pro Logic II Decoder—Principles of Operation”, http://www.dolby.com/assets/pdf/tech_library/209_Dolby_Surround_Pro_Logic_II_Decoder_Principles_of_Operation.pdf and in “Multichannel Matrix Surround Decoders for Two-Eared Listeners”, Griesinger, D., 101st AES Convention, Los Angeles, USA, 1996, Preprint 4402. The common principle of these methods is that they make use of dedicated ways of multi-channel or stereo down-mixing where the encoder applies phase shifts to the surround channels prior to mixing them together with front and centre channels to form a stereo down-mix signal. The generation of the down-mix signal (Lt, Rt) is depicted in the following equation:
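\[
\begin{bmatrix} Lt \\ Rt \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & q & a \cdot j & b \cdot j \\
0 & 1 & q & -b \cdot j & -a \cdot j
\end{bmatrix}
\begin{bmatrix} Lf \\ Rf \\ C \\ Ls \\ Rs \end{bmatrix}
\qquad (1)
\]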
The left down-mix signal (Lt) consists of the left-front signal (Lf), the centre signal (C) multiplied by a factor q, the left-surround signal (Ls) phase rotated by 90 degrees ('j') and scaled by a factor a, and the right-surround signal (Rs), which is also phase rotated by 90 degrees and scaled by a factor b. The right down-mix signal (Rt) is generated similarly. Typical down-mix factors are 0.707 for q and a, and 0.408 for b. The rationale for the different signs of the surround channels in the right down-mix signal (Rt) and the left down-mix signal (Lt) is that it is advantageous to mix the surround channels in anti-phase in the down-mix pair (Lt, Rt). This property helps the decoder to discriminate between front and rear channels in the down-mix signal pair. Hence the down-mix matrix allows for a partial reconstruction of a multi-channel output signal from the stereo down-mix within the decoder by applying a de-matrixing operation. How closely the re-created multi-channel signal resembles the original encoder input signal, however, depends on the specific properties of the multi-channel audio content.
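The down-mix of Equation (1) can be sketched for a single complex frequency-domain value per channel. This is only an illustration under stated assumptions: the function name `matrix_downmix` is hypothetical, the coefficients are the typical values quoted above, and the 90 degree phase rotation is modeled as multiplication by the imaginary unit.

```python
# Illustrative sketch of Equation (1); names are hypothetical, not from the source.
Q, A, B = 0.707, 0.707, 0.408  # typical down-mix factors quoted in the text


def matrix_downmix(lf, rf, c, ls, rs, q=Q, a=A, b=B):
    """Return (Lt, Rt) for one set of complex channel values."""
    j = 1j  # multiplying by j rotates the phase by 90 degrees
    lt = lf + q * c + a * j * ls + b * j * rs
    rt = rf + q * c - b * j * ls - a * j * rs
    return lt, rt


# A centre-only signal appears in phase in both down-mix channels.
centre = matrix_downmix(0, 0, 1, 0, 0)

# A signal fed equally to both surround channels appears in anti-phase.
surround = matrix_downmix(0, 0, 0, 1, 1)
```

In this sketch the centre-only input yields Lt = Rt, while the surround-only input yields Lt = -Rt, which is the anti-phase cue a matrix decoder can use to separate front from rear content.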
An example for a coding method adding helper information, also called side information, is MPEG Surround audio coding. This efficient way for parametric multi-channel audio coding is for example described in “The Reference Model Architecture for MPEG Spatial Audio Coding”, Herre, J., Purnhagen, H., Breebaart, J., Faller, C., Disch, S., Kjoerling, K., Schuijers, E., Hilpert, J., Myburg, F., Proc. 118th AES Convention, Barcelona, Spain, 2005 and in “Text of Working Draft for Spatial Audio Coding (SAC)”, ISO/IEC JTC1/SC29/WG11 (MPEG), Document N7136, Busan, Korea, 2005.
A schematic overview of an encoder used in spatial audio coding is shown in FIG. 6. The encoder splits incoming signals 10 (input 1, . . . input N) into separate time-frequency tiles by means of Quadrature Mirror Filters 12 (QMF). Groups of the resulting frequency tiles (bands) are referred to as “parameter bands”. For every parameter band, a parameter estimator 16 determines a number of spatial parameters 14 that describe the properties of the spatial image, e.g. level differences between pairs of channels (CLD), cross correlation between pairs of channels (ICC) or information on signal envelopes (CPC). These parameters are subsequently quantized, encoded and compiled jointly into a bit-stream of spatial data. Depending on the operation mode, this bit-stream can cover a wide range of bit-rates, starting from a few kBit/s for good quality multi-channel audio up to tens of kBit/s for near-transparent quality.
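How a level difference and a cross correlation might be estimated for one parameter band can be sketched as follows. The formulas are common textbook definitions (energy ratio in dB, normalized real cross correlation) and are an assumption here; the source does not state the exact estimators, and the function names `cld_db` and `icc` are hypothetical. The sketch assumes both channels have non-zero energy in the band.

```python
import math


def cld_db(x1, x2):
    """Level difference in dB between two lists of subband samples (assumed definition)."""
    e1 = sum(abs(v) ** 2 for v in x1)
    e2 = sum(abs(v) ** 2 for v in x2)
    return 10.0 * math.log10(e1 / e2)


def icc(x1, x2):
    """Normalized real cross correlation of two lists of subband samples (assumed definition)."""
    e1 = sum(abs(v) ** 2 for v in x1)
    e2 = sum(abs(v) ** 2 for v in x2)
    cross = sum(u * v.conjugate() for u, v in zip(x1, x2))
    return cross.real / math.sqrt(e1 * e2)


# Hypothetical samples of one parameter band: the right channel is a
# half-amplitude copy of the left channel.
left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]

level_difference = cld_db(left, right)  # 10 * log10(4 / 1)
correlation = icc(left, right)          # fully correlated channels
```

For these samples the level difference is about 6.02 dB and the correlation is 1, since the two channels differ only by a gain factor.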
Besides the extraction of parameters, the encoder also generates a mono or stereo down-mix from the multi-channel input signal. Moreover, in case of a stereo down-mix, the user has the choice of a conventional (ITU-style) stereo down-mix or of a down-mix that is compatible with matrixed-surround systems. Finally, the stereo down-mix is transferred to the time-domain by means of QMF synthesis banks 18. The resulting down-mix can be transmitted to a decoder, accompanied by the spatial parameters or the spatial parameter bit-stream 14. Preferably, the down-mix is also encoded before transmission (using a conventional mono or stereo core coder), while the bit-streams of the core coder and the spatial parameters might additionally be combined (multiplexed) to form a single output bit-stream.
A decoder, as sketched in FIG. 7, in principle performs the reverse process of the encoder. An input-stream is split into a core coder bit-stream and a parameter bit-stream. This is not shown in FIG. 7. Subsequently, the decoded down-mix 20 is processed by a QMF analysis bank 22 to derive parameter bands that are the same as those applied in the encoder. A spatial synthesis stage 24 reconstructs the multi-channel signal by means of control data 26 (i.e., the transmitted spatial parameters). Finally, the QMF-domain signals are transferred to the time domain by means of a QMF synthesis bank 27 that derives the final multi-channel output signals 28.
FIG. 8 shows a simple example of a QMF analysis, as it is performed within the prior art encoder in FIG. 6 and the prior art decoder in FIG. 7. An audio sample 30, sampled in the time domain and having four sample values, is input into a filter bank 32. The filter bank 32 derives three output samples 34a, 34b and 34c having four sample values each. In an ideal case, the filter bank 32 derives the output samples 34a to 34c such that each output signal comprises information only on a discrete frequency range of the underlying audio signal 30. In the case shown in FIG. 8, the sample 34a has information on the frequency interval [f0, f1], the sample 34b has information on the frequency interval [f1, f2] and the sample 34c has information on the frequency interval [f2, f3]. Although the frequency intervals in FIG. 8 do not overlap, in a more general case the frequency intervals of the output samples coming out of a filter bank may very well overlap.
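The idealized band split described for FIG. 8 can be imitated with a small sketch. Note the assumption: this is not a QMF bank but a brick-wall split based on a 4-point DFT, chosen only because it reproduces the situation in the text (four input values, three non-overlapping bands, four values per band). All function names are hypothetical.

```python
import cmath


def dft(x):
    """Plain discrete Fourier transform of a short list."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def split_bands(x, bands):
    """Return one time signal per band; each band is a set of DFT bin indices."""
    spectrum = dft(x)
    return [idft([spectrum[k] if k in band else 0 for k in range(len(x))])
            for band in bands]


signal = [1.0, 2.0, 0.0, -1.0]  # four time-domain sample values
# Three disjoint bands of a 4-point DFT: DC, the mirrored bin pair, and Nyquist.
bands = [{0}, {1, 3}, {2}]
outputs = split_bands(signal, bands)

# Because the bands are disjoint and cover all bins, they sum back to the input.
reconstructed = [sum(band[t] for band in outputs).real for t in range(len(signal))]
```

A real QMF bank uses overlapping prototype filters and usually decimates each band, so the frequency intervals are not perfectly separated as they are here.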
A prior art encoder can, as already described above, deliver either an ITU-style down-mix or a matrixed-surround compatible down-mix, when a two-channel down-mix is desired. In the case of a matrixed-surround compatible down-mix (using for example the matrixing approach given in Equation 1), one possibility would be that the encoder generates a matrixed-surround compatible down-mix directly.
FIG. 9 shows an alternative approach to generate a matrixed-surround compatible down-mix using a down-mix post processing unit 30 working on a regular stereo down-mix 32. The matrixed-surround processor 30 (MTX encoder) modifies the regular stereo down-mix 32 to make it matrixed-surround compatible guided by the spatial parameters 14 extracted by the parameter extraction stage 16. For transmission, a matrixed-surround compatible down-mix 34 is transferred to the time domain by a QMF synthesis using the QMF synthesis bank 18.
Deriving the matrixed-surround compatible signal by post-processing a regular stereo down-mix has the advantage that the matrixed-surround compatibility processing can be fully reversed at a decoder side if the spatial parameters are available.
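The reversibility argument can be illustrated with a sketch. It assumes, purely for illustration, that the post-processing acts per parameter band as an invertible 2x2 complex matrix derived from the spatial parameters; the source does not specify the form of the processing, and the matrix `H`, its values, and the function names below are hypothetical.

```python
def apply_2x2(m, left, right):
    """Apply a 2x2 matrix to a stereo pair of complex values."""
    (m11, m12), (m21, m22) = m
    return m11 * left + m12 * right, m21 * left + m22 * right


def invert_2x2(m):
    """Invert a 2x2 matrix; fails if the processing is not reversible."""
    (m11, m12), (m21, m22) = m
    det = m11 * m22 - m12 * m21
    if abs(det) < 1e-12:
        raise ValueError("post-processing matrix is not invertible")
    return ((m22 / det, -m12 / det), (-m21 / det, m11 / det))


# Hypothetical post-processing matrix for one parameter band.
H = ((1.0, 0.3j), (-0.3j, 1.0))

stereo = (0.5 + 0.2j, -0.1 + 0.4j)          # regular stereo down-mix values
compatible = apply_2x2(H, *stereo)           # encoder-side post-processing
restored = apply_2x2(invert_2x2(H), *compatible)  # decoder-side reversal
```

A decoder that receives the spatial parameters could rebuild the same matrix and undo the processing in this way; a decoder without the parameters cannot, because it does not know the matrix.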
Although both approaches are suited to transmitting a multi-channel signal, state-of-the-art systems have specific drawbacks. Matrixed-surround methods are very efficient (since no additional parameters are required), but at the price of a very limited multi-channel reconstruction quality.
Parametric multi-channel approaches on the other hand require a higher bit-rate due to the side information, which becomes a problem when a limit is set as a maximum acceptable bit-rate for the parametric representation. When the encoded parameters require a comparatively high amount of bit-rate, the only possible way to stay within such a bit-rate limit is to decrease the quality of an encoded down-mix channel by increasing the compression of the channel. Hence, the result is a general loss in audio quality, which may be unacceptably high. In other words, for parametric multi-channel approaches, there is often a hard limit of the minimum bit-rate that is required for the spatial parameter layer, which may in some cases be unacceptably high.
Although, in principle, backwards compatibility between matrixed-surround methods and spatial audio methods can be achieved by a prior art encoder as illustrated in FIG. 9, no bit-rate can be saved with this approach when only matrix-based decoding is required. Even then the full set of spatial parameters has to be transmitted, wasting transmission bandwidth.
Whereas the bit-rate that has to be spent when applying the parametric method may be too high in case of certain application scenarios, the audio quality delivered by the methods without transmission of side-information might not be sufficient.
US Patent Application 2005157883 shows an apparatus for constructing a multi-channel audio signal using an input signal and parametric side information, the input signal including a first input channel and a second input channel derived from an original multi-channel signal, and the parametric side information describing interrelations between channels of the original multi-channel signal.