Digital encoding of various audio signals has become increasingly important over the last decades as digital signal representation and communication have progressively replaced analogue representation and communication.
In the last decade there has been a trend towards multi-channel audio and specifically towards spatial audio extending beyond conventional stereo signals. For example, traditional stereo recordings only comprise two channels whereas modern advanced audio systems typically use five or six channels, as in the popular 5.1 surround sound systems. This provides for a more involved listening experience where the user may be surrounded by sound sources.
Various techniques and standards have been developed for communication of such multi-channel signals. For example, six discrete channels representing a 5.1 surround system may be transmitted in accordance with standards such as the Advanced Audio Coding (AAC) or Dolby Digital standards.
However, in order to provide backwards compatibility, it is known to down-mix the higher number of channels to a lower number. Specifically, a 5.1 surround sound signal is frequently down-mixed to a stereo signal, allowing the stereo signal to be reproduced by legacy (stereo) decoders and the 5.1 signal by surround sound decoders.
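As an illustration of such a down-mix, the following minimal sketch folds one 5.1 sample frame into a stereo pair. The gain of 1/sqrt(2) for the centre and surround channels and the omission of the LFE channel are assumptions chosen for illustration (they are commonly used values); the text above does not specify coefficients, and the function and channel names are hypothetical.

```python
import math

# Assumed gain of 1/sqrt(2) (about -3 dB) for centre and surround channels.
# The text does not specify down-mix coefficients.
G = 1.0 / math.sqrt(2.0)


def downmix_51_to_stereo(frame):
    """Down-mix one 5.1 sample frame to a (left, right) stereo pair.

    `frame` is a dict with the hypothetical channel keys
    "L", "R", "C", "LFE", "Ls" and "Rs".
    """
    left = frame["L"] + G * frame["C"] + G * frame["Ls"]
    right = frame["R"] + G * frame["C"] + G * frame["Rs"]
    # The LFE channel is dropped in this sketch (an assumption).
    return left, right


# Usage: a centre-only frame contributes equally to both stereo channels.
silent = {"L": 0.0, "R": 0.0, "C": 0.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
centre_only = dict(silent, C=1.0)
stereo_frame = downmix_51_to_stereo(centre_only)
```

A legacy stereo decoder would simply reproduce the resulting pair, whereas a surround decoder would need additional information to recover the six channels.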
One example is the MPEG Surround backwards compatible coding method standardized by the Moving Picture Experts Group (MPEG). In such a system, a multi-channel signal is down-mixed into either a mono or a stereo signal and the additional signals are encoded by parametric data in the ancillary data portion, allowing an MPEG Surround multi-channel decoder to generate a representation of the multi-channel signal. A legacy mono or stereo decoder will disregard the ancillary data and thus only decode the mono or stereo down-mix.
Thus, in (parametric) spatial audio (en)coders, parameters are extracted from the original audio signal so as to produce an audio signal having a reduced number of channels, for example only a single channel, plus a set of parameters describing the spatial properties of the original audio signal. In (parametric) spatial audio decoders, the spatial properties described by the transmitted spatial parameters are used to recreate the original spatial multi-channel signal.
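The principle can be sketched for the simplest case of a stereo signal reduced to a single channel plus one spatial parameter. The sketch below is an assumption-laden illustration, not the method of any particular standard: it uses a single broadband level difference per block (practical coders typically use parameters per time/frequency tile and further cues), and the function names `encode` and `decode` are hypothetical.

```python
import math


def encode(left, right):
    """Reduce a stereo block to a mono down-mix plus one level-difference parameter."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    # Small offset avoids division by zero for silent blocks (an assumption).
    energy_left = sum(x * x for x in left) + 1e-12
    energy_right = sum(x * x for x in right) + 1e-12
    level_difference_db = 10.0 * math.log10(energy_left / energy_right)
    return mono, level_difference_db


def decode(mono, level_difference_db):
    """Recreate a stereo block whose channel energy ratio matches the parameter."""
    ratio = 10.0 ** (level_difference_db / 10.0)  # energy ratio left/right
    # Gains chosen so that g_left^2 / g_right^2 = ratio and g_left^2 + g_right^2 = 2.
    gain_left = math.sqrt(2.0 * ratio / (1.0 + ratio))
    gain_right = math.sqrt(2.0 / (1.0 + ratio))
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


# Usage: the right channel is the left channel at half amplitude.
left_in = [1.0, -1.0, 0.5]
right_in = [0.5, -0.5, 0.25]
mono_out, ild_db = encode(left_in, right_in)
left_out, right_out = decode(mono_out, ild_db)
```

The decoded channels reproduce the level relationship of the original but are not sample-exact copies, which reflects the approximate nature of parametric reconstruction.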
Recently, techniques for distribution of individual audio objects which can be processed and manipulated at the receiving end have attracted significant interest. For example, within the MPEG framework, a work item has been started on object-based spatial audio coding. The aim of this work item is to explore new technology and reuse of current MPEG Surround components and technologies for the bit rate efficient coding of multiple sound sources or objects into a number of down-mix channels and corresponding spatial parameters. Thus, the intention is to use techniques similar to those used for down-mixing spatial (surround) channels to fewer channels in order to down-mix independent audio objects into a smaller number of channels.
In object oriented audio systems, the decoder can provide discrete positioning of these sources/objects and adaptation to various loudspeaker setups as well as binaural rendering. Additionally, user interaction can be used to control repositioning/panning of the individual sources on the reproduction side.
In other words, the aim of the research is to encode multiple audio objects in a limited set of down-mix channels accompanied by parameters. At the decoder side, users can interact with the content for example by repositioning the individual objects. As a specific example, a number of individual instruments may be encoded and distributed as audio objects thereby allowing a user receiving the encoded data to independently position the individual instruments in the sound image.
FIG. 1 illustrates an example of an object-oriented audio encoder and decoder in accordance with the prior art. In the example, a set of audio objects (O1 to O4) is encoded in an object-oriented encoder 101 which generates a down-mix signal and object parameters. These are transmitted to the object-oriented decoder 103 which generates approximate copies of the audio object signals using the transmitted object parameters.
Subsequently, a rendering element 105 generates the output signal having the desired characteristics. For example, the rendering element 105 can position the objects at sound source positions indicated by the user, for example using a panning law. The output signal configuration is flexible. For example, if the output signal is mono, the user can still manipulate the relative loudness/volume of each object. In a stereo output signal configuration, a simple panning law can be applied in order to position each object at a desired position. Obviously, for a multi-channel output configuration, the flexibility is even larger.
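The rendering step for a stereo output can be sketched as follows. The constant-power sine/cosine panning law used here is an assumption chosen for illustration, since the text does not name a specific panning law, and the function names and the position convention (0 for full left, 1 for full right) are hypothetical.

```python
import math


def pan_gains(position):
    """Constant-power pan gains for a position in [0, 1] (0 = left, 1 = right)."""
    theta = position * math.pi / 2.0
    return math.cos(theta), math.sin(theta)


def render_stereo(objects, positions, volumes):
    """Mix decoded object signals into a stereo pair.

    `objects` is a list of equal-length sample lists; `positions` and
    `volumes` hold one user-selected value per object.
    """
    length = len(objects[0])
    left = [0.0] * length
    right = [0.0] * length
    for signal, position, volume in zip(objects, positions, volumes):
        gain_left, gain_right = pan_gains(position)
        for i, sample in enumerate(signal):
            left[i] += volume * gain_left * sample
            right[i] += volume * gain_right * sample
    return left, right


# Usage: one object panned fully left, a second placed in the centre at half volume.
rendered_left, rendered_right = render_stereo(
    [[1.0, 2.0], [1.0, 1.0]], positions=[0.0, 0.5], volumes=[1.0, 0.5]
)
```

For a mono output the panning gains would be dropped and only the per-object volume would remain under user control.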
However, although the system can provide advantageous performance, it also has a number of disadvantages. For example, in many cases the reproduced quality is suboptimal and a completely free and independent manipulation of the individual audio objects is not possible. Specifically, the down-mix of the encoder is generally not completely reversible at the decoder, which accordingly can only generate approximations of the original audio objects. Thus, the decoder is not able to fully reconstruct the individual object signals but can only estimate these according to perceptual criteria. This specifically results in cross-interference (crosstalk) between audio objects, so that the audio objects are no longer completely independent. As a result, manipulations of one audio object affect the characteristics and perception of another object.
For example, one of the most important parameters that users typically would like to adjust is the relative volume of each audio object. However, large volume adjustments result in considerable artifacts and undesirable crosstalk, causing noticeable quality degradation.
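The effect can be illustrated with a deliberately simplified sketch in which two objects are down-mixed to a single channel and each object is estimated at the decoder by scaling the down-mix with its relative energy share. This single broadband parameter per object is an assumption made for brevity (practical systems use parameters per time/frequency tile), and the function names are hypothetical.

```python
def encode_objects(objects):
    """Down-mix objects to one channel and compute each object's energy share."""
    downmix = [sum(samples) for samples in zip(*objects)]
    energies = [sum(x * x for x in obj) for obj in objects]
    total = sum(energies) or 1.0
    shares = [energy / total for energy in energies]
    return downmix, shares


def decode_objects(downmix, shares):
    """Estimate each object as a scaled copy of the down-mix (approximation only)."""
    return [[share * sample for sample in downmix] for share in shares]


def apply_volumes(signals, volumes):
    """Sum the signals after applying one user-selected volume per signal."""
    return [
        sum(volume * signal[i] for signal, volume in zip(signals, volumes))
        for i in range(len(signals[0]))
    ]


# Two objects that never sound at the same time.
object_a = [1.0, 0.0, 1.0, 0.0]
object_b = [0.0, 1.0, 0.0, 1.0]
downmix, shares = encode_objects([object_a, object_b])
estimates = decode_objects(downmix, shares)

# The user raises the volume of object A by a factor of four.
ideal = apply_volumes([object_a, object_b], [4.0, 1.0])
actual = apply_volumes(estimates, [4.0, 1.0])
```

In this sketch the estimate of object A contains the samples of object B, so raising the volume of A also raises B: the ideal output alternates between 4.0 and 1.0, whereas the output obtained from the estimates is 2.5 throughout.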
Hence, an improved system for audio object encoding/decoding would be advantageous, and in particular a system allowing increased flexibility, improved quality, facilitated implementation and/or improved performance.