Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication have progressively replaced analogue representation and communication. For example, distribution of media content, such as video and music, is increasingly based on digital content encoding.
Encoding of multi-channel signals may be performed by down-mixing the multi-channel signal to fewer channels and then encoding and transmitting these. For example, a stereo signal may be down-mixed to a mono signal which is then encoded. In parametric multi-channel encoding, parametric data is furthermore generated which supports an up-mixing of the down-mix to recreate (approximations of) the original multi-channel signal. Examples of multi-channel systems that use down-mixing/up-mixing and associated parametric data include the technique known as Parametric Stereo (PS) and its extension to multi-channel parametric coding (e.g., MPEG Surround, MPS).
In its simplest form, the down-mixing of a stereo signal to a mono signal may be performed by generating the average of the two stereo channels, i.e. by generating the mid or sum signal. This mono signal may then be distributed and may further be used directly as a mono signal. In encoding approaches such as that used by Parametric Stereo, stereo cues are provided in addition to the down-mix signal. Specifically, inter-channel level differences, time or phase differences, and coherence or correlation parameters are determined per time-frequency tile (which typically corresponds to a Bark or ERB band division of the frequency axis and a fixed uniform segmentation of the time axis). This data is typically distributed together with the down-mix signal and allows an accurate recreation of the original stereo signal by an up-mixing which is dependent on the parameters.
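As an illustration of the cues described above, the following is a minimal sketch for a single time-frequency tile. It assumes the tile is given as complex subband samples of the left and right channels and that both channels have non-zero energy. The function names and the particular estimators (energy ratio in dB, phase of the cross-correlation, normalised cross-correlation magnitude) are illustrative choices and are not taken from the Parametric Stereo specification.

```python
import cmath
import math


def passive_downmix(left, right):
    """Mid signal: sample-wise average of the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def stereo_cues(left, right):
    """Return (level difference in dB, phase difference in rad, coherence)
    for one tile. Assumes both channels have non-zero energy."""
    e_left = sum(abs(l) ** 2 for l in left)
    e_right = sum(abs(r) ** 2 for r in right)
    # Complex cross-correlation between the channels within the tile.
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    level_diff_db = 10 * math.log10(e_left / e_right)
    phase_diff = cmath.phase(cross)  # phase of left relative to right
    coherence = abs(cross) / math.sqrt(e_left * e_right)
    return level_diff_db, phase_diff, coherence


# Hypothetical tile: right is left at half amplitude, lagging by 45 degrees.
left = [cmath.rect(1.0, 0.3 * n) for n in range(8)]
right = [0.5 * l * cmath.exp(-1j * math.pi / 4) for l in left]

mono = passive_downmix(left, right)
print(stereo_cues(left, right))
```

For this hypothetical tile the level difference is about 6.02 dB, the phase difference is π/4 and the coherence is 1, since the right channel is a scaled and phase-rotated copy of the left channel.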
However, it is well known that creating the mid signal typically results in somewhat dull signals, i.e., signals with reduced brightness/high-frequency content. The reason is that for typical audio signals, the different channels tend to be fairly correlated at low frequencies but not at higher frequencies. Direct summation of the two stereo channels effectively suppresses the non-aligned signal components. Indeed, for frequency subbands wherein the left and right signals are completely out of phase, the resulting mid signal is zero.
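The suppression of non-aligned components can be checked numerically. The sketch below assumes two equal-amplitude sinusoids that differ only by a phase offset and computes the energy of the mid signal relative to the energy of one channel; for this idealised case the ratio equals cos²(φ/2) for a phase offset φ. The function name and the test-signal parameters are hypothetical.

```python
import math


def mid_energy_ratio(phase_offset, n=1000, cycles=10):
    """Energy of the mid signal relative to the left channel for two
    unit-amplitude sinusoids differing by phase_offset (radians)."""
    left = [math.sin(2 * math.pi * cycles * k / n) for k in range(n)]
    right = [math.sin(2 * math.pi * cycles * k / n + phase_offset)
             for k in range(n)]
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    return sum(m * m for m in mid) / sum(l * l for l in left)


for degrees in (0, 90, 180):
    ratio = mid_energy_ratio(math.radians(degrees))
    print(f"{degrees:3d} deg: mid/left energy ratio = {ratio:.3f}")
```

In phase, the mid signal retains the full energy; at 90 degrees half of the energy is lost; at 180 degrees the mid signal vanishes.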
A solution which has been proposed is to use phase alignment of the channels before the summation is performed. Thus, ideally the left and right signals are compensated for any phase difference in the frequency domain (corresponding to a time difference in the time domain) before being added together. However, such an approach tends to be complex and may introduce an algorithmic delay. Also, in practice, the approach tends not to provide optimal quality. For example, if the inter-channel phase difference is measured, there is an ambiguity as to whether to align the phase of the left channel to the right channel or vice versa. Trying to shift the phase of both channels equally also leads to ambiguity. Further, the phase difference is numerically ill-conditioned when the correlation is low, thereby resulting in a less accurate and less robust system. Overall, these issues tend to lead to perceptible artifacts when creating a down-mix by phase alignment. Typically, the approach results in modulations on tonal components.
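A phase-aligned down-mix for one tile might be sketched as follows. The sketch assumes complex subband samples and arbitrarily rotates the right channel onto the left channel, which is only one of the ambiguous choices mentioned above. The function name is hypothetical and the sketch is not a description of any particular codec.

```python
import cmath


def phase_aligned_downmix(left, right):
    """Rotate the right channel by the estimated inter-channel phase
    difference so that it aligns with the left channel, then average."""
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    # Phase of left relative to right. When abs(cross) is small (low
    # correlation) this estimate is dominated by noise, which is the
    # ill-conditioning referred to in the text.
    phase_diff = cmath.phase(cross)
    rotation = cmath.exp(1j * phase_diff)
    return [(l + r * rotation) / 2 for l, r in zip(left, right)]


# Hypothetical tile in which the channels are completely out of phase.
left = [cmath.rect(1.0, 0.3 * n) for n in range(8)]
right = [-l for l in left]

aligned = phase_aligned_downmix(left, right)
print(sum(abs(m) ** 2 for m in aligned))  # energy is retained, not cancelled
```

For this out-of-phase tile a plain average would be zero, whereas the aligned down-mix reproduces the left channel up to rounding error.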
As a consequence, most practical systems tend to use a so-called passive down-mix generated simply as the mean of the left and right signals. Unfortunately, passive down-mixing also has some associated disadvantages. One of these is that the acoustic energy can be substantially reduced, and even completely lost, for out-of-phase signals. A proposed method for addressing this is to use a so-called active down-mixing in which the down-mix is rescaled to have the same energy as the original signals. Another proposed solution is to provide a decoder-side energy compensation. However, such compensations tend to operate at a rather global level and do not discriminate between tonal components (where compensation is necessary) and noise (where it is not). Furthermore, in both passive and active down-mix approaches, problems occur for signals that approach being out of phase. Indeed, out-of-phase components are completely absent in the down-mix signal.
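The energy rescaling of an active down-mix can be sketched as below. The text does not specify which reference energy is matched; this sketch assumes the mean of the left and right energies within the tile. The function name and the `eps` threshold are hypothetical.

```python
import math


def active_downmix(left, right, eps=1e-12):
    """Passive mid signal rescaled so that its energy equals the mean
    energy of the two input channels (assumed target)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    e_mid = sum(abs(m) ** 2 for m in mid)
    e_left = sum(abs(l) ** 2 for l in left)
    e_right = sum(abs(r) ** 2 for r in right)
    target = (e_left + e_right) / 2
    if e_mid < eps:
        # Fully cancelled: no gain can restore components that are absent.
        return mid
    gain = math.sqrt(target / e_mid)
    return [gain * m for m in mid]


# Channels 90 degrees apart: the passive mid loses half of the energy.
left = [1.0, 0.0, -1.0, 0.0]
right = [0.0, 1.0, 0.0, -1.0]
print(sum(m * m for m in active_downmix(left, right)))

# Channels completely out of phase: the down-mix stays at zero.
print(active_downmix(left, [-x for x in left]))
```

The second call illustrates the limitation noted above: when the channels are completely out of phase the mid signal is zero, so rescaling has nothing to act on.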
Hence, an improved system for multi-channel parametric encoding/decoding would be advantageous, and in particular a system allowing increased flexibility, facilitated operation, facilitated implementation, reduced complexity, improved robustness, improved encoding of out-of-phase signal components, reduced data rate versus quality ratio and/or improved performance would be advantageous.