Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication has progressively replaced analogue representation and communication. For example, distribution of media content, such as video and music, is increasingly based on digital content encoding.
Furthermore, in the last decade there has been a trend towards multi-channel audio and specifically towards spatial audio extending beyond conventional stereo signals. For example, traditional stereo recordings comprise only two channels whereas modern advanced audio systems typically use five or six channels, as in the popular 5.1 surround sound systems. This provides a more immersive listening experience in which the user may be surrounded by sound sources.
Various techniques and standards have been developed for communication of such multi-channel signals. For example, six discrete channels representing a 5.1 surround system may be transmitted in accordance with standards such as the Advanced Audio Coding (AAC) or Dolby Digital standards.
However, in order to provide backwards compatibility, it is known to down-mix the higher number of channels to a lower number. Specifically, a 5.1 surround sound signal is frequently down-mixed to a stereo signal, allowing the stereo signal to be reproduced by legacy (stereo) decoders and the 5.1 signal by surround sound decoders.
One example is the MPEG2 backwards compatible coding method. A multi-channel signal is down-mixed into a stereo signal. Additional signals are encoded in the ancillary data portion, allowing an MPEG2 multi-channel decoder to generate a representation of the multi-channel signal. An MPEG1 decoder will disregard the ancillary data and thus only decode the stereo down-mix. The main disadvantage of the coding method applied in MPEG2 is that the additional data rate required for the additional signals is of the same order of magnitude as the data rate required for coding the stereo signal. The additional bit rate for extending stereo to multi-channel audio is therefore significant.
Other existing methods for backwards-compatible multi-channel transmission without additional multi-channel information can typically be characterized as matrixed-surround methods. Examples of matrix surround sound encoding include methods such as Dolby Pro Logic II and Logic-7. The common principle of these methods is that they matrix-multiply the multiple channels of the input signal by a suitable non-square matrix, thereby generating an output signal with a lower number of channels. Specifically, a matrix encoder typically applies phase shifts to the surround channels prior to mixing them with the front and center channels.
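The matrix multiplication described above can be illustrated with a minimal sketch. It assumes a 5.1 input in the channel order L, R, C, LFE, Ls, Rs and a 2 x 6 matrix whose gains are purely illustrative and are not taken from Dolby Pro Logic II, Logic-7 or any other standard. The phase shifts applied by real matrix encoders are replaced here by a plain sign inversion of one surround channel, which is a simplification made only to keep the example short. The names DOWNMIX, downmix_frame and downmix are hypothetical.

```python
import math

# Assumed channel order for this sketch: L, R, C, LFE, Ls, Rs.
G = 1.0 / math.sqrt(2.0)  # illustrative -3 dB gain, not from any standard

# 2 x 6 (non-square) down-mix matrix: first row is Lt, second row is Rt.
DOWNMIX = [
    [1.0, 0.0, G, 0.0, -G, 0.0],  # Lt: left surround enters sign-inverted
    [0.0, 1.0, G, 0.0, 0.0, G],   # Rt: right surround enters in phase
]


def downmix_frame(frame):
    """Map one 6-channel sample frame to a 2-channel (Lt, Rt) frame."""
    if len(frame) != 6:
        raise ValueError("expected 6 input channels")
    return [sum(g * x for g, x in zip(row, frame)) for row in DOWNMIX]


def downmix(frames):
    """Apply the matrix to a sequence of 6-channel sample frames."""
    return [downmix_frame(f) for f in frames]
```

In this sketch a signal present only in the center channel appears with equal gain in both outputs, whereas a signal present only in the left surround channel appears sign-inverted in Lt alone. A matrix decoder can exploit such sign or phase relationships to steer the signal back towards the surround loudspeakers.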
Another reason for a channel conversion is coding efficiency. It has been found that e.g. surround sound audio signals can be encoded as stereo channel audio signals combined with a parameter bit stream describing the spatial properties of the audio signal. The decoder can reproduce the stereo audio signals with a very satisfactory degree of accuracy. In this way, substantial bit rate savings may be obtained.
Thus, in (parametric) spatial audio (en)coders, parameters are extracted from the original audio signal so as to produce an audio signal having a reduced number of channels, for example only a single channel, plus a set of parameters describing the spatial properties of the original audio signal. In (parametric) spatial audio decoders, the spatial properties described by the transmitted spatial parameters are used to recreate the original spatial multi-channel signal. There are several parameters which may be used to describe the spatial properties of audio signals. One such parameter is the inter-channel cross-correlation, such as the cross-correlation between the left channel and the right channel for stereo signals. Another parameter is the power ratio of the channels.
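As an illustration of the two parameters mentioned above, the following sketch computes a power ratio and a normalized cross-correlation for one block of stereo samples. The function name spatial_parameters, the expression of the power ratio in decibels, the evaluation of the correlation at zero lag only, and the small constant eps that guards against division by zero are all assumptions of this example. Practical coders typically compute such parameters per time and frequency tile rather than over the full-band signal.

```python
import math


def spatial_parameters(left, right, eps=1e-12):
    """Return (power ratio in dB, normalized cross-correlation at zero lag)."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    p_left = sum(x * x for x in left)    # power (energy) of the left channel
    p_right = sum(x * x for x in right)  # power (energy) of the right channel
    cross = sum(l * r for l, r in zip(left, right))
    # eps keeps both expressions finite for silent input (illustrative choice).
    ratio_db = 10.0 * math.log10((p_left + eps) / (p_right + eps))
    correlation = cross / math.sqrt(p_left * p_right + eps)
    return ratio_db, correlation
```

For two channels that differ only by a positive gain, the correlation is close to 1 and the power ratio reflects the gain; for channels in anti-phase the correlation is close to -1.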
A specific example of such a technique is the MPEG Surround approach for efficiently coding multi-channel audio signals.
An MPEG Surround encoder down-mixes an M-channel input signal to an N-channel down-mix signal, where N<M, and extracts the spatial parameters. The down-mix signal is typically encoded using a legacy encoder, such as an MP3 or AAC encoder. The spatial parameters are encoded and embedded into the bit-stream in a backward compatible way such that legacy decoders can still decode the underlying down-mix signal.
In the MPEG Surround decoder, the down-mix signal is first decoded using a legacy decoder. The multi-channel signal is then reconstructed by means of the spatial parameters that are extracted from the bit-stream.
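The general principle of a down-mix accompanied by spatial parameters can be sketched with a toy example for M=2 and N=1. This is not the MPEG Surround algorithm: it uses a single full-band parameter (the share of the total power carried by the left channel), omits the legacy codec and bit-stream embedding, and ignores correlation. The function names encode and decode and the gain rule are assumptions of this sketch.

```python
import math


def encode(left, right):
    """Toy encoder: two channels in, one down-mix channel and one parameter out."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    p_left = sum(x * x for x in left)
    p_right = sum(x * x for x in right)
    total = p_left + p_right
    # Share of the total power in the left channel (0.5 for silent input).
    left_share = p_left / total if total > 0.0 else 0.5
    return mono, left_share


def decode(mono, left_share):
    """Toy decoder: rescale the down-mix so the output power ratio matches."""
    gain_left = math.sqrt(2.0 * left_share)
    gain_right = math.sqrt(2.0 * (1.0 - left_share))
    return [gain_left * m for m in mono], [gain_right * m for m in mono]
```

The decoded channels reproduce the power ratio of the input channels but not their individual waveforms, which is why practical parametric coders transmit further parameters, such as the inter-channel correlation, and operate on separate time and frequency tiles.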
Apart from the typical multi-channel coding as described above, MPEG Surround offers a rich set of additional features, e.g.:
Non-guided decoding—the MPEG Surround decoder is able to create a multi-channel up-mix of stereo signals when the spatial side information described above is not available. In this mode, the decoder calculates the power ratio and correlation of the stereo signal and these characteristics are used to obtain the required spatial parameters by table lookup.
Matrix Compatibility—the MPEG Surround encoder is able to generate a down-mix that can be decoded using existing matrix decoding schemes. The matrix surround down-mix is created such that it can be inverted by an MPEG Surround decoder without perceptual concessions to the decoder performance. Furthermore, matrix surround down-mixes improve the performance of the non-guided mode.
Binaural decoding—the MPEG Surround decoder is able to transform a mono or stereo down-mix signal directly into a 3D binaural stereo signal using the spatial parameters instead of calculating a multi-channel signal as an intermediate step.
Artistic down-mix—MPEG Surround allows transmission of a manually created down-mix instead of the automated MPEG Surround down-mix.
Arbitrary trees—the MPEG Surround bitstream supports definition of arbitrary up-mix structures allowing an arbitrary number of output channels.
The MPEG Surround coder aims at representing the original multi-channel signal as accurately as possible for a predefined speaker setup, such as a 5.1 setup. However, it does not allow any flexibility with regard to different listening positions and environments, such as those typically found at home or in a vehicle.
Reproduction for alternative listening positions and environments can be improved by manipulation of the sweet-spot (e.g. moving and/or widening). However, although sweet-spot manipulation is known, conventional approaches tend to be suboptimal and are generally applied as a post-processing step requiring high complexity processing of the individual output channels.
Hence, an improved system for manipulating a sweet-spot would be advantageous, in particular a system allowing increased flexibility, improved quality, improved listening experiences, reduced complexity, facilitated processing and/or improved performance.